Reversing the Xcode Dependency Graph for Fun

16 May 2017


This post is about learning how Xcode’s frameworks work and using them in external programs for fun.

Background and Motivation:

Programming language tooling needs a Compilation Database as input in order to set up the compiler stack.

The canonical consumers of Compilation Databases are LibTooling and clang-c. Importantly, one is also required for Swift support in Vim: an HTTP service calls the Swift compiler toolchain to intelligently generate code completions based on the source code and its dependencies. You can learn more about that at SwiftySwiftVim’s GitHub.

Typically, a Compilation Database is derived from the compiler invocations recorded by a build system, but Xcode doesn’t hand these over. We need access to the arguments that Xcode invokes the Swift compiler with. These arguments are generated by, and consumed in, Xcode. I thought it’d be possible to extract this data by writing a program that uses Xcode’s own infrastructure and data structures.
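For reference, here is roughly the shape of the data we are after. Clang's JSON Compilation Database format is an array of entries, each with a "directory", a "file", and either a "command" string or an "arguments" list. The sketch below builds one such entry and prints it as JSON. MakeEntry, the paths, and the swiftc flags are made up for illustration; they are not what Xcode actually passes.

```objc
#import <Foundation/Foundation.h>
#include <stdio.h>

// Builds one compilation database entry with the "directory", "file",
// and "arguments" keys. MakeEntry is a hypothetical helper name.
static NSDictionary<NSString *, id> *MakeEntry(NSString *directory,
                                               NSString *file,
                                               NSArray<NSString *> *arguments) {
    return @{ @"directory": directory, @"file": file, @"arguments": arguments };
}

int main(void) {
    @autoreleasepool {
        // Placeholder values: a real entry would carry the full argument
        // list recorded for the build.
        NSDictionary *entry = MakeEntry(@"/path/to/MyProj",
                                        @"Sources/main.swift",
                                        @[@"swiftc", @"-module-name", @"MyTarget",
                                          @"Sources/main.swift"]);

        // A compilation database is a JSON array of such entries.
        NSError *error = nil;
        NSData *json = [NSJSONSerialization dataWithJSONObject:@[entry]
                                                       options:NSJSONWritingPrettyPrinted
                                                         error:&error];
        if (json == nil) {
            NSLog(@"Could not encode entry: %@", error);
            return 1;
        }
        NSString *text = [[NSString alloc] initWithData:json
                                               encoding:NSUTF8StringEncoding];
        printf("%s\n", text.UTF8String);
    }
    return 0;
}
```

The "arguments" form is used here rather than "command" because it avoids shell-quoting questions: each argument stays a separate string.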

Note: There are existing solutions for generating Compilation Databases, but they rely on parsing the build logs of xcodebuild, and they don’t work with Swift.

OK, let’s get to the fun part.

First, let’s inspect the dpgh file.

The file contains juicy data about Xcode’s build process. It lives on the file system in a derived data folder.

 /path/to/__DERIVED_DATA__/MyProj-UID/Build/Intermediates/MyProj.build/Debug/MyTarget.build/dpgh


Let’s take a look at a dpgh file for a project:

 $ cat  /path/to/__DERIVED_DATA__/MyProj-UID/Build/Intermediates/MyProj.build/Debug/MyTarget.build/dpgh
 REDACTED...SLF07#21%IDEActivityLogSection1@2#32...REDACTED


Notice that within the file is a reference to IDEActivityLogSection. This reference is needed by a deserialization implementation in order to bind the raw data back to the object representation of the class. You can learn more about deserialization protocols at the Apple developer site.
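To make that fragment a little less opaque, here is a rough sketch of a tokenizer for it. The token rules are an assumption, pieced together from this one sample and from public write-ups of the SLF0 log format: each token is a decimal number followed by a type character, where # is an integer, % is a class name of that many characters, @ refers to a class by index, and " is a string of that many characters. TokenizeSLF0 is a hypothetical helper, not part of any Xcode framework, and it only handles the ASCII fragment shown here.

```objc
#import <Foundation/Foundation.h>
#include <stdio.h>

// Tokenizes the start of an SLF0 stream under the ASSUMED rules:
//   <n>#        integer n
//   <n>%<text>  class name, <text> is n characters long
//   <n>@        instance of the class with index n
//   <n>"<text>  string, <text> is n characters long
// Scanning stops at the first thing it does not recognize.
static NSArray<NSString *> *TokenizeSLF0(NSString *input) {
    NSMutableArray<NSString *> *tokens = [NSMutableArray array];
    if (![input hasPrefix:@"SLF0"]) {
        return tokens;
    }
    NSUInteger i = 4; // skip the "SLF0" header
    NSUInteger length = input.length;
    while (i < length) {
        // Read the decimal prefix.
        NSUInteger start = i;
        while (i < length) {
            unichar c = [input characterAtIndex:i];
            if (c < '0' || c > '9') { break; }
            i++;
        }
        // No digits, or digits with no type character after them: stop.
        if (i == start || i == length) { break; }
        NSString *digits = [input substringWithRange:NSMakeRange(start, i - start)];
        NSUInteger n = (NSUInteger)digits.integerValue;
        unichar kind = [input characterAtIndex:i++];
        if (kind == '#') {
            [tokens addObject:[NSString stringWithFormat:@"int(%lu)", (unsigned long)n]];
        } else if (kind == '@') {
            [tokens addObject:[NSString stringWithFormat:@"instance(%lu)", (unsigned long)n]];
        } else if (kind == '%' || kind == '"') {
            if (i + n > length) { break; } // payload is cut off
            NSString *text = [input substringWithRange:NSMakeRange(i, n)];
            i += n;
            NSString *label = (kind == '%') ? @"class" : @"string";
            [tokens addObject:[NSString stringWithFormat:@"%@(%@)", label, text]];
        } else {
            break;
        }
    }
    return tokens;
}

int main(void) {
    @autoreleasepool {
        // The readable fragment from the dump above.
        NSString *sample = @"SLF07#21%IDEActivityLogSection1@2#32";
        for (NSString *token in TokenizeSLF0(sample)) {
            printf("%s\n", token.UTF8String);
        }
    }
    return 0;
}
```

On the fragment above this prints int(7), class(IDEActivityLogSection), instance(1), and int(2), one per line. The trailing 32 is dropped because the sample is cut off before its type character. Note that 21 is exactly the length of "IDEActivityLogSection", which is what suggests the length-prefix reading in the first place.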

A quick inspection of the class IDEActivityLogSection (in IDEFoundation.framework) shows that it does implement a serialization protocol, including the method dvt_writeToSerializer:. (More on inspecting classes later.) This is important, because serialization is used to write structures into a different format and to persist data, as in a dpgh file.

If our speculation is correct, doing a build in Xcode will trigger this code path, since Xcode needs to write out a dpgh file once the build is done. Let’s test this theory by setting a breakpoint and building any program in Xcode.

Attach lldb to Xcode and set the breakpoint:

    (lldb) break set -S dvt_writeToSerializer:

Let’s grab a whisky while our test program is building.

Oh, nice! LLDB stopped when Xcode wrote the XCDependencyGraph to the file system.

 * thread #30, queue = '<IDEBuildOperation:0x7f98237aa3e0:REfc>-builder-queue :: NSOperation 0x7f9823767230 (QOS: DEFAULT)', stop reason = breakpoint 2.6
  * frame #0: 0x000000010bc1979a IDEFoundation`-[IDEActivityLogSection dvt_writeToSerializer:]
    frame #1: 0x000000010ac50f09 DVTFoundation`-[DVTSimplePlainTextSerializer encodeObject:] + 366
    frame #2: 0x000000010bc19c74 IDEFoundation`-[IDEActivityLogSection serializedData] + 146
    frame #3: 0x000000011acd529f DevToolsCore`-[XCDependencyGraph writeToByteStream:error:] + 4871
    frame #4: 0x000000011acd6881 DevToolsCore`-[XCDependencyGraph _writeToBuildDirectory:forceWrite:error:] + 894
    frame #5: 0x000000011acd64fe DevToolsCore`-[XCDependencyGraph writeToBuildDirectory:error:] + 25
    frame #6: 0x000000011aaee2ae DevToolsCore`-[PBXTargetBuildContext writeDependencyGraphState] + 180
    frame #7: 0x000000011aa8fae0 DevToolsCore`-[PBXTarget(XCBuildables) buildDidFinishWithForBuilder:buildLogRecorder:] + 692
    frame #8: 0x000000011acb508d DevToolsCore`-[Xcode3TargetBuildableSnapshot buildDidFinishForBuilder:buildPlan:] + 446
    frame #9: 0x000000010b89ce15 IDEFoundation`-[IDEBuildableSnapshot performBuildForBuilder:buildCommand:buildOnlyTheseFiles:] + 4134
    frame #10: 0x000000010b8ce776 IDEFoundation`-[IDEBuilder primitiveMain] + 1253
    frame #11: 0x000000010b8ce1b9 IDEFoundation`-[IDEBuilder main] + 264
    frame #12: 0x00007fffc0f95d84 Foundation`-[__NSOperationInternal _start:] + 672
    frame #13: 0x00007fffc0f91c3b Foundation`__NSOQSchedule_f + 201
    frame #14: 0x00007fffd47a0128 libdispatch.dylib`_dispatch_client_callout + 8
    frame #15: 0x00007fffd47b6b97 libdispatch.dylib`_dispatch_queue_serial_drain + 896
    frame #16: 0x00007fffd47a8d41 libdispatch.dylib`_dispatch_queue_invoke + 1046
    frame #17: 0x00007fffd47a1ee0 libdispatch.dylib`_dispatch_root_queue_drain + 476
    frame #18: 0x00007fffd47a1cb7 libdispatch.dylib`_dispatch_worker_thread3 + 99
    frame #19: 0x00007fffd49ed736 libsystem_pthread.dylib`_pthread_wqthread + 1299
    frame #20: 0x00007fffd49ed211 libsystem_pthread.dylib`start_wqthread + 13


It looks like the caller we’re interested in is in DevToolsCore.framework, which is bundled with Xcode at /Applications/Xcode.app/Contents/PlugIns/Xcode3Core.ideplugin/Contents/Frameworks/DevToolsCore.framework

Today, we’re interested in XCDependencyGraph, so let’s dump DevToolsCore’s symbols to learn more about it.

Exploring the binary

There are a lot of tools out there for analyzing binaries. otool can dump information about a binary, while Hopper.app, IDA, and other disassemblers can disassemble it and show code paths.

I start with simple techniques, like running strings to see what strings are in a binary:

 strings PlugIns/Xcode3Core.ideplugin/Contents/Frameworks/DevToolsCore.framework/DevToolsCore | less


strings combined with grep and other common command-line tools can reveal a lot, and fast.

LLDB is an amazing reversing aid, because you can set breakpoints like above and dump registers.

I’ve had luck using class-dump to create nice headers for these frameworks, but it’s not necessary.

A complementary technique to strings is to link the framework into a basic program and traverse the ObjC runtime with the runtime API. This is incredibly powerful, because it is easy to search for interesting classes or methods using a Turing-complete language. We’ll be doing that next.
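As a small taste of the runtime approach, here is a sketch that lists every loaded class whose name contains a substring, then dumps the instance methods of each match. It uses objc_copyClassList and class_copyMethodList from <objc/runtime.h>; ClassNamesContaining and InstanceMethodNames are hypothetical helper names. Built on its own it only sees Foundation and friends, so the XCDependency classes will only show up once DevToolsCore is linked in.

```objc
#import <Foundation/Foundation.h>
#import <objc/runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Returns the sorted names of all registered classes whose name contains `needle`.
static NSArray<NSString *> *ClassNamesContaining(NSString *needle) {
    NSMutableArray<NSString *> *matches = [NSMutableArray array];
    unsigned int count = 0;
    // The runtime hands back a malloc'd array that we must free.
    __unsafe_unretained Class *classes = objc_copyClassList(&count);
    for (unsigned int i = 0; i < count; i++) {
        NSString *name = [NSString stringWithUTF8String:class_getName(classes[i])];
        if (name != nil && [name containsString:needle]) {
            [matches addObject:name];
        }
    }
    free(classes);
    return [matches sortedArrayUsingSelector:@selector(compare:)];
}

// Returns the selectors of the instance methods implemented directly by `cls`
// (superclass methods are not included).
static NSArray<NSString *> *InstanceMethodNames(Class cls) {
    NSMutableArray<NSString *> *names = [NSMutableArray array];
    unsigned int count = 0;
    Method *methods = class_copyMethodList(cls, &count);
    for (unsigned int i = 0; i < count; i++) {
        [names addObject:NSStringFromSelector(method_getName(methods[i]))];
    }
    free(methods);
    return names;
}

int main(int argc, const char *argv[]) {
    @autoreleasepool {
        // Defaults to the classes this post cares about; pass another
        // substring as the first argument to explore something else.
        NSString *needle = argc > 1 ? [NSString stringWithUTF8String:argv[1]]
                                    : @"XCDependency";
        if (needle == nil) {
            return 1;
        }
        for (NSString *name in ClassNamesContaining(needle)) {
            printf("%s\n", name.UTF8String);
            for (NSString *method in InstanceMethodNames(NSClassFromString(name))) {
                printf("    -%s\n", method.UTF8String);
            }
        }
    }
    return 0;
}
```

Running it with an argument such as NSJSON is a quick way to check that it works before any Xcode framework is involved.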

Cool, let’s use this to write a program

First, link the framework into a program. Let’s link in DevToolsCore, which is easy since it’s simply a dynamic framework.

clang or gcc needs the framework as a linker flag (-F <dir> -framework DevToolsCore), plus runtime search paths (-Wl,-rpath,<dir>) that include the framework’s location on disk (inside Xcode, where we found the frameworks in the first place). Additionally, set up include paths for DevToolsCore’s dependencies.

// main.m

#import <Foundation/Foundation.h>

// class-dump will give us a nice printout of the API, and we can derive
// just enough of an interface from it to keep the compiler happy:

@class PBXTargetBuildContext;

// The property types are not known here, so they are declared as id.
@interface XCDependencyCommandInvocationRecord : NSObject
@property (readonly) id executionDescription;
@property (readonly) id commandLineArguments;
@end

@interface XCDependencyGraph : NSObject
// The error is an out-parameter, so it has to be NSError **.
+ (id)readFromBuildDirectory:(NSString *)buildRoot withTargetBuildContext:(PBXTargetBuildContext *)arg2 error:(NSError **)arg3;
- (void)printNodes;
@end

int main(int argc, const char * argv[]) {
    NSString *buildRoot = @"/Path/To/Build/Root";

    // Create a build graph from output
    //__DERIVED_DATA__/MyProj-UID/Build/Intermediates/MyProj.build/Debug/MyTarget.build/
    NSError *e = nil;
    // Look the classes up by name, so the declarations above only need to exist at compile time.
    PBXTargetBuildContext *ctx = [NSClassFromString(@"PBXTargetBuildContext") new];
    XCDependencyGraph *graph = [NSClassFromString(@"XCDependencyGraph") readFromBuildDirectory:buildRoot withTargetBuildContext:ctx error:&e];
    if (graph == nil) {
        NSLog(@"Can't create graph: %@", e);
        return 1;
    }

    NSLog(@"Graph: %@", graph.description);
    [graph printNodes];

    // Read the private ivar through KVC.
    NSDictionary *records = [graph valueForKey:@"_commandInvocRecordsByIdent"];

    // Dump out all of the records
    for (NSString *key in records) {
        XCDependencyCommandInvocationRecord *record = records[key];
        NSLog(@"\nrecord: %@ args: %@", record.executionDescription, record.commandLineArguments);
    }
    return 0;
}

Conclusion

Simple reversing techniques can yield amazing results. We didn’t even need to dive into using LLDB to read registers or use a disassembler.

I thought this post could be useful for tool development for Xcode. It is nice to code with Xcode’s primitives and infrastructure to work with the data types within Xcode. Using this binds your program to an undocumented API, so it probably isn’t practical.

I used DevToolsCore and this research to implement a compilation database for Swift over on GitHub.

While reading about dpgh, I found that yiding wrote a parser for this file format.
