This post is about learning Xcode’s frameworks and using them in
external programs for fun.
Background and Motivation:
Programming language tooling needs a Compilation Database as input in order to set up the compiler stack.
The canonical consumers of Compilation Databases are LibTooling and clang-c.
Importantly, they are required for Swift support in Vim. An HTTP service calls
the Swift compiler toolchain to intelligently generate code completions based on
the source code and dependencies. You can learn more about that at
SwiftySwiftVim’s GitHub.
Typically a Compilation Database is derived from the compiler invocations in a
build system, but Xcode doesn’t expose them. We need access to the arguments
that Xcode invokes swift with. These arguments are generated by, and consumed
in, Xcode. I thought it’d be possible to extract this data by writing a program
that uses Xcode’s infrastructure and data structures.
Note: There are existing solutions for generating Compilation Databases, but they rely on parsing the build logs of xcodebuild, and they don’t work with Swift.
Ok, let’s get to the fun part.
First, let’s inspect the dpgh file.
The file contains juicy data about Xcode’s build process. It lives on the file system in a derived data folder.
/path/to/__DERIVED_DATA__/MyProj-UID/Build/Intermediates/MyProj.build/Debug/MyTarget.build/dpgh
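If you aren’t sure where a target’s dpgh file ended up, a find over the derived data folder lists the candidates. This is only a minimal sketch: find_dpgh is a hypothetical helper name, and the fallback to ~/Library/Developer/Xcode/DerivedData assumes Xcode’s usual default location, which your setup may override.

```shell
# List every dpgh file under a derived data folder.
# find_dpgh is a hypothetical helper name, not an Xcode tool.
find_dpgh() {
    # $1: derived data root. The fallback path is an assumption about
    # Xcode's default derived data location.
    root="${1:-$HOME/Library/Developer/Xcode/DerivedData}"
    find "$root" -type f -name dpgh 2>/dev/null
}
```

For example, find_dpgh /path/to/__DERIVED_DATA__ prints one path per target that has been built.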
Let’s take a look at a dpgh file for a project:
$ cat /path/to/__DERIVED_DATA__/MyProj-UID/Build/Intermediates/MyProj.build/Debug/MyTarget.build/dpgh
REDACTED...SLF07#21%IDEActivityLogSection1@2#32...REDACTED
Notice that within the file is a reference to IDEActivityLogSection. This
reference is needed by a deserialization implementation in order to bind the
raw data back to the object representation of the class. You can learn more
about deserialization protocols at the Apple developer site.
A quick inspection of the class IDEActivityLogSection (in
IDEFoundation.framework) shows that it does implement a serialization protocol,
including the method dvt_writeToSerializer:. (More on inspecting classes later.)
This is important, because serialization is used to write structures into a
different format, and for persisting data like a dpgh.
If our speculation is correct, doing a build operation in Xcode will trigger
this code path, since it needs to write out a dpgh file after building is
done. Let’s set a breakpoint and see what happens. Test this theory out by
building any program in Xcode.
Attach lldb to Xcode and break
(lldb) break set -S dvt_writeToSerializer:
Let’s grab a whisky while our test program is building.
Oh, nice! LLDB stopped when Xcode wrote the XCDependencyGraph to the file system.
* thread #30, queue = '<IDEBuildOperation:0x7f98237aa3e0:REfc>-builder-queue :: NSOperation 0x7f9823767230 (QOS: DEFAULT)', stop reason = breakpoint 2.6
* frame #0: 0x000000010bc1979a IDEFoundation`-[IDEActivityLogSection dvt_writeToSerializer:]
frame #1: 0x000000010ac50f09 DVTFoundation`-[DVTSimplePlainTextSerializer encodeObject:] + 366
frame #2: 0x000000010bc19c74 IDEFoundation`-[IDEActivityLogSection serializedData] + 146
frame #3: 0x000000011acd529f DevToolsCore`-[XCDependencyGraph writeToByteStream:error:] + 4871
frame #4: 0x000000011acd6881 DevToolsCore`-[XCDependencyGraph _writeToBuildDirectory:forceWrite:error:] + 894
frame #5: 0x000000011acd64fe DevToolsCore`-[XCDependencyGraph writeToBuildDirectory:error:] + 25
frame #6: 0x000000011aaee2ae DevToolsCore`-[PBXTargetBuildContext writeDependencyGraphState] + 180
frame #7: 0x000000011aa8fae0 DevToolsCore`-[PBXTarget(XCBuildables) buildDidFinishWithForBuilder:buildLogRecorder:] + 692
frame #8: 0x000000011acb508d DevToolsCore`-[Xcode3TargetBuildableSnapshot buildDidFinishForBuilder:buildPlan:] + 446
frame #9: 0x000000010b89ce15 IDEFoundation`-[IDEBuildableSnapshot performBuildForBuilder:buildCommand:buildOnlyTheseFiles:] + 4134
frame #10: 0x000000010b8ce776 IDEFoundation`-[IDEBuilder primitiveMain] + 1253
frame #11: 0x000000010b8ce1b9 IDEFoundation`-[IDEBuilder main] + 264
frame #12: 0x00007fffc0f95d84 Foundation`-[__NSOperationInternal _start:] + 672
frame #13: 0x00007fffc0f91c3b Foundation`__NSOQSchedule_f + 201
frame #14: 0x00007fffd47a0128 libdispatch.dylib`_dispatch_client_callout + 8
frame #15: 0x00007fffd47b6b97 libdispatch.dylib`_dispatch_queue_serial_drain + 896
frame #16: 0x00007fffd47a8d41 libdispatch.dylib`_dispatch_queue_invoke + 1046
frame #17: 0x00007fffd47a1ee0 libdispatch.dylib`_dispatch_root_queue_drain + 476
frame #18: 0x00007fffd47a1cb7 libdispatch.dylib`_dispatch_worker_thread3 + 99
frame #19: 0x00007fffd49ed736 libsystem_pthread.dylib`_pthread_wqthread + 1299
frame #20: 0x00007fffd49ed211 libsystem_pthread.dylib`start_wqthread + 13
It looks like the caller we’re interested in is in DevToolsCore.framework,
which is bundled with Xcode at /Applications/Xcode.app/Contents/PlugIns/Xcode3Core.ideplugin/Contents/Frameworks/DevToolsCore.framework
Today, we’re interested in XCDependencyGraph, so let’s dump DevToolsCore’s symbols to learn more about it.
Exploring the binary
There are a lot of tools out there to analyze binaries. otool can dump
information about a binary, while Hopper.app, IDA, and other disassemblers can
disassemble it and show code paths.
I start with simple techniques, like running strings to see what strings are in a binary:
strings PlugIns/Xcode3Core.ideplugin/Contents/Frameworks/DevToolsCore.framework/DevToolsCore | less
strings mixed with grep and other common command line tools can unveil a lot, and fast.
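As a rough illustration of combining the two, the sketch below filters the output of strings down to lines that mention a pattern. grep_strings is a hypothetical helper name, and the pattern you search for (a class name, a selector fragment) is up to you.

```shell
# Print the unique strings in a binary that match a pattern.
# grep_strings is a hypothetical helper name.
grep_strings() {
    # $1: path to the binary, $2: grep pattern
    strings "$1" | grep -- "$2" | sort -u
}
```

For example, grep_strings PlugIns/Xcode3Core.ideplugin/Contents/Frameworks/DevToolsCore.framework/DevToolsCore XCDependencyGraph | less narrows the dump to the lines that mention the class we saw in the backtrace.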
LLDB is an amazing reversing aid, because you can set breakpoints like above and dump registers.
I’ve had luck using class dump to create nice headers for these frameworks, but it’s not necessary.
A complementary technique to strings is to link the framework to a basic
program and traverse the ObjC runtime with the runtime API. This is incredibly
powerful, because it is easy to search for interesting classes or methods using
a Turing-complete language. We’ll be doing that next.
Cool, let’s use this to write a program
First, link these frameworks into a program: let’s link in DevToolsCore.
That’s easy since it’s simply a dynamic framework.
clang or gcc needs the framework as a linker flag, plus runtime search paths
that include the framework’s location on disk (in Xcode, where we found the
frameworks in the first place). Additionally, set up include paths for
DevToolsCore’s dependencies.
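The flags described above might look roughly like the sketch below. build_main is a hypothetical helper name. The PlugIns path is the one from the backtrace section, while the two extra rpath entries for Contents/Frameworks and Contents/SharedFrameworks are an assumption about where DevToolsCore’s dependencies live, so adjust them to whatever your Xcode actually contains.

```shell
# Compile main.m against DevToolsCore.
# build_main is a hypothetical helper name.
build_main() {
    # $1: Xcode's Contents directory (defaults to the standard install location).
    contents="${1:-/Applications/Xcode.app/Contents}"
    frameworks="$contents/PlugIns/Xcode3Core.ideplugin/Contents/Frameworks"

    # -F adds a framework search path, -framework links the framework,
    # and -Wl,-rpath records where dyld should look at run time.
    # The last two rpath entries are assumptions about where the
    # dependencies of DevToolsCore are located.
    clang main.m -o main \
        -framework Foundation \
        -F "$frameworks" -framework DevToolsCore \
        -Wl,-rpath,"$frameworks" \
        -Wl,-rpath,"$contents/Frameworks" \
        -Wl,-rpath,"$contents/SharedFrameworks"
}
```

Calling build_main with no arguments uses the default Xcode location; pass another Contents directory to build against a different Xcode install.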
// main.m
#import <Foundation/Foundation.h>

// class-dump will give us a nice printout of the API, and we can derive interfaces from it:
@class PBXTargetBuildContext;

@interface XCDependencyGraph : NSObject
+ (id)readFromBuildDirectory:(NSString *)buildRoot withTargetBuildContext:(PBXTargetBuildContext *)arg2 error:(NSError **)arg3;
- (void)printNodes;
@end

// Only the accessors used below are declared; their return types are left as id.
@interface XCDependencyCommandInvocationRecord : NSObject
- (id)executionDescription;
- (id)commandLineArguments;
@end

int main(int argc, const char * argv[]) {
    // Create a build graph from the output in
    // __DERIVED_DATA__/MyProj-UID/Build/Intermediates/MyProj.build/Debug/MyTarget.build/
    NSString *buildRoot = @"/Path/To/Build/Root";
    NSError *e = nil;

    // Look the classes up at runtime so we don't need their symbols at link time
    PBXTargetBuildContext *ctx = [NSClassFromString(@"PBXTargetBuildContext") new];
    XCDependencyGraph *graph = [NSClassFromString(@"XCDependencyGraph") readFromBuildDirectory:buildRoot withTargetBuildContext:ctx error:&e];
    assert(graph);
    // The error must still be nil if the graph was read successfully
    assert(e == nil && "Can't create graph");

    NSLog(@"Graph: %@", graph.description);
    [graph printNodes];

    // The records live in a private ivar, so read them with KVC
    NSDictionary *records = [graph valueForKey:@"_commandInvocRecordsByIdent"];
    // Dump out all of the records
    for (NSString *key in records) {
        XCDependencyCommandInvocationRecord *record = records[key];
        NSLog(@"\nrecord: %@ args: %@", record.executionDescription, record.commandLineArguments);
    }
    return 0;
}
Conclusion
Simple reversing techniques can yield amazing results. We didn’t dive into
using LLDB to read registers or using a disassembler.
I thought this post could be useful for tool development for Xcode. It is nice to code with Xcode’s primitives and infrastructure to work with the data types within Xcode. Using this binds your program to an undocumented API, so it probably isn’t practical.
I used DevToolsCore and this research to implement a compilation database for Swift over on
GitHub.
While reading about dpgh, I found that yiding wrote a parser for this file
format.