Skip to content

Latest commit

 

History

History
399 lines (360 loc) · 20 KB

README.md

File metadata and controls

399 lines (360 loc) · 20 KB

Mach-O Parser

To learn the Mach-O format, no way is better than building a parser from scratch. This helps me understand, byte by byte, how Mach-O format is laid out. This parser actually turns out to be a super light version of the combination of otool, nm, strings, dyldinfo, codesign etc.

Usage

There are two ways to build the sources code:

  1. Use Bazel, bazel build //:macho_parser. (Preferred)
  2. Run ./build.sh --openssl. (OpenSSL is not required if not parsing code signature.)
$ ./macho_parser --help
Usage: macho_parser [options] macho_file
    -c, --command LOAD_COMMAND           show specific load command
    -v, --verbose                        can be used multiple times to increase verbose level
        --arch                           specify an architecture, arm64 or x86_64
        --no-truncate                    do not truncate even the content is long
    -h, --help                           show this help message

    --segments                           equivalent to '--command LC_SEGMENT_64
    --section INDEX                      show the section at INDEX
    --dylibs                             show dylib related commands
    --build-version                      equivalent to '--command LC_BUILD_VERSION --command LC_VERSION_MIN_*'

Code Signature Options:
    --cs,  --code-signature              equivalent to '--command LC_CODE_SIGNATURE'
    --cd,  --code-directory              show Code Directory
    --ent, --entitlement                 show the embedded entitlement
           --blob-wrapper                show the blob wrapper (signature blob)

Dynamic Symbol Table Options:
    --dysymtab                           equivalent to '--command LC_DYSYMTAB'
    --local                              show local symbols
    --extdef                             show externally (public) defined symbols
    --undef                              show undefined symbols
    --indirect                           show indirect symbol table

Dyld Info Options:
    --dyld-info                          equivalent to '--command LC_DYLD_INFO(_ONLY)'
    --rebase                             show rebase info
    --bind                               show binding info
    --weak-bind                          show weak binding info
    --lazy-bind                          show lazy binding info
    --export                             show export trie
    --opcode                             show the raw opcode instead of a table

Sample

This directory also includes a sample that demonstrates how source code ends up in the different parts of a binary file.

$ ./build_sample.sh
$ ./macho_parser sample.out
LC_SEGMENT_64        cmdsize: 72     segname: __PAGEZERO       fileoff: 0x00000000 filesize: 0            (fileend: 0x00000000)
LC_SEGMENT_64        cmdsize: 712    segname: __TEXT           fileoff: 0x00000000 filesize: 16384        (fileend: 0x00004000)
LC_SEGMENT_64        cmdsize: 552    segname: __DATA_CONST     fileoff: 0x00004000 filesize: 16384        (fileend: 0x00008000)
LC_SEGMENT_64        cmdsize: 392    segname: __DATA           fileoff: 0x00008000 filesize: 16384        (fileend: 0x0000c000)
LC_SEGMENT_64        cmdsize: 72     segname: __LINKEDIT       fileoff: 0x0000c000 filesize: 1328         (fileend: 0x0000c530)
LC_SYMTAB            cmdsize: 24     symoff: 0x8c360   nsyms: 21   (symsize: 336)   stroff: 0x08c360   strsize: 464
LC_DYSYMTAB          cmdsize: 80     nlocalsym: 5  nextdefsym: 7   nundefsym: 9   nindirectsyms: 10
LC_LOAD_DYLIB        cmdsize: 48     @rpath/my_dylib.dylib
LC_LOAD_DYLIB        cmdsize: 56     /usr/lib/libSystem.B.dylib
LC_LOAD_DYLIB        cmdsize: 104    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
LC_LOAD_DYLIB        cmdsize: 96     /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
LC_LOAD_DYLIB        cmdsize: 56     /usr/lib/libobjc.A.dylib
LC_RPATH             cmdsize: 24     build
$ ./macho_parser --segments sample.out
LC_SEGMENT_64        cmdsize: 72     segname: __PAGEZERO     file: 0x00000000-0x00000000 0B         vm: 0x000000000-0x100000000 4.00GB    prot: 0/0
LC_SEGMENT_64        cmdsize: 712    segname: __TEXT         file: 0x00000000-0x00004000 16.00KB    vm: 0x100000000-0x100004000 16.00KB   prot: 5/5
   0: 0x100003e70-0x100003f27 183B        (__TEXT,__text)                   type: S_REGULAR  offset: 15984
   1: 0x100003f28-0x100003f40 24B         (__TEXT,__stubs)                  type: S_SYMBOL_STUBS  offset: 16168   reserved1:  6
   2: 0x100003f40-0x100003f7a 58B         (__TEXT,__stub_helper)            type: S_REGULAR  offset: 16192
   3: 0x100003f7a-0x100003f9b 33B         (__TEXT,__cstring)                type: S_CSTRING_LITERALS  offset: 16250
   4: 0x100003f9b-0x100003fa7 12B         (__TEXT,__objc_classname)         type: S_CSTRING_LITERALS  offset: 16283
   5: 0x100003fa7-0x100003fac 5B          (__TEXT,__objc_methname)          type: S_CSTRING_LITERALS  offset: 16295
   6: 0x100003fac-0x100003fb4 8B          (__TEXT,__objc_methtype)          type: S_CSTRING_LITERALS  offset: 16300
   7: 0x100003fb4-0x100003ffc 72B         (__TEXT,__unwind_info)            type: S_REGULAR  offset: 16308
LC_SEGMENT_64        cmdsize: 552    segname: __DATA_CONST   file: 0x00004000-0x00008000 16.00KB    vm: 0x100004000-0x100008000 16.00KB   prot: 3/3
   8: 0x100004000-0x100004010 16B         (__DATA_CONST,__got)              type: S_NON_LAZY_SYMBOL_POINTERS  offset: 16384   reserved1:  4
   9: 0x100004010-0x100004018 8B          (__DATA_CONST,__mod_init_func)    type: S_MOD_INIT_FUNC_POINTERS  offset: 16400
  10: 0x100004018-0x100004038 32B         (__DATA_CONST,__cfstring)         type: S_REGULAR  offset: 16408
  11: 0x100004038-0x100004040 8B          (__DATA_CONST,__objc_classlist)   type: S_REGULAR  offset: 16440
  12: 0x100004040-0x100004048 8B          (__DATA_CONST,__objc_nlclslist)   type: S_REGULAR  offset: 16448
  13: 0x100004048-0x100004050 8B          (__DATA_CONST,__objc_imageinfo)   type: S_REGULAR  offset: 16456
LC_SEGMENT_64        cmdsize: 392    segname: __DATA         file: 0x00008000-0x0000c000 16.00KB    vm: 0x100008000-0x10000c000 16.00KB   prot: 3/3
  14: 0x100008000-0x100008020 32B         (__DATA,__la_symbol_ptr)          type: S_LAZY_SYMBOL_POINTERS  offset: 32768   reserved1:  6
  15: 0x100008020-0x1000080d0 176B        (__DATA,__objc_const)             type: S_REGULAR  offset: 32800
  16: 0x1000080d0-0x100008120 80B         (__DATA,__objc_data)              type: S_REGULAR  offset: 32976
  17: 0x100008120-0x100008128 8B          (__DATA,__data)                   type: S_REGULAR  offset: 33056
LC_SEGMENT_64        cmdsize: 72     segname: __LINKEDIT     file: 0x0000c000-0x0000c728 1.79KB     vm: 0x10000c000-0x100010000 16.00KB   prot: 1/1
$ ./macho_parser --dyld-info sample.out
LC_DYLD_INFO_ONLY    cmdsize: 48     export_size: 192
  rebase_off   : 49152        rebase_size   : 24
  bind_off     : 49176        bind_size     : 184
  weak_bind_off: 0            weak_bind_size: 0
  lazy_bind_off: 49360        lazy_bind_size: 80
  export_off   : 49440        export_size   : 192

Rebase

$ ./macho_parser --rebase sample.out
  Rebase Table:
__DATA_CONST,__mod_init_func      0x100004010  pointer  value(0x100003E70)
__DATA_CONST,__cfstring           0x100004028  pointer  value(0x100003F89)
__DATA_CONST,__objc_classlist     0x100004038  pointer  value(0x1000080F8)
__DATA_CONST,__objc_nlclslist     0x100004040  pointer  value(0x1000080F8)
__DATA,__la_symbol_ptr            0x100008000  pointer  value(0x100003F70)
__DATA,__la_symbol_ptr            0x100008008  pointer  value(0x100003F40)
__DATA,__la_symbol_ptr            0x100008010  pointer  value(0x100003F5C)
__DATA,__la_symbol_ptr            0x100008018  pointer  value(0x100003F66)
...
$ ./macho_parser --rebase --opcode sample.out
  Rebase Opcodes:
0x0000 REBASE_OPCODE_SET_TYPE_IMM (REBASE_TYPE_POINTER)
0x0001 REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB (2, 0x00000010) -- __DATA_CONST
0x0003 REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB (0x00000010)
0x0005 REBASE_OPCODE_DO_REBASE_ADD_ADDR_ULEB (0x00000008)
0x0007 REBASE_OPCODE_DO_REBASE_IMM_TIMES (2)
0x0008 REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB (3, 0x00000000) -- __DATA
0x000A REBASE_OPCODE_DO_REBASE_IMM_TIMES (4)
0x000B REBASE_OPCODE_ADD_ADDR_IMM_SCALED (1)
...
$ ./macho_parser --bind [--lazy-bind] [--weak-bind] sample.out
  Binding Table:
__DATA_CONST,__got        0x100004000  pointer  flat-namespace        addend(0)  _c_extern_weak_function (weak import)
__DATA_CONST,__got        0x100004008  pointer  libSystem.B.dylib     addend(0)  dyld_stub_binder
__DATA_CONST,__cfstring   0x100004018  pointer  CoreFoundation        addend(0)  ___CFConstantStringClassReference
__DATA,__objc_data        0x100008100  pointer  libobjc.A.dylib       addend(0)  _OBJC_CLASS_$_NSObject
__DATA,__objc_data        0x1000080D0  pointer  libobjc.A.dylib       addend(0)  _OBJC_METACLASS_$_NSObject
__DATA,__objc_data        0x1000080D8  pointer  libobjc.A.dylib       addend(0)  _OBJC_METACLASS_$_NSObject
__DATA,__objc_data        0x1000080E0  pointer  libobjc.A.dylib       addend(0)  __objc_empty_cache
__DATA,__objc_data        0x100008108  pointer  libobjc.A.dylib       addend(0)  __objc_empty_cache
$ ./macho_parser --bind --opcode sample.out
  Binding Opcodes:
0x0000 BIND_OPCODE_SET_DYLIB_SPECIAL_IMM (BIND_SPECIAL_DYLIB_FLAT_LOOKUP)
0x0001 BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM (BIND_SYMBOL_FLAGS_WEAK_IMPORT, _c_extern_weak_function)
0x001A BIND_OPCODE_SET_TYPE_IMM (BIND_TYPE_POINTER)
0x001B BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB (2, 0x00000000) -- __DATA_CONST
0x001D BIND_OPCODE_DO_BIND ()
0x001E BIND_OPCODE_SET_DYLIB_ORDINAL_IMM (2) -- libSystem.B.dylib
0x001F BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM (0, dyld_stub_binder)
0x0031 BIND_OPCODE_DO_BIND ()
...
$ ./macho_parser --export sample.out
  Exported Symbols (Trie):
  _
    _mh_execute_header (data: 0000)
    c_
      constructor_function (data: 00f07c)
      used_function (data: 00807d)
      weak_import_function (data: 00907d)
    main (data: 00a07d)
    OBJC_
      METACLASS_$_SimpleClass (data: 00d08102)
      CLASS_$_SimpleClass (data: 00f88102)
./macho_parser -c LC_DYLD_CHAINED_FIXUPS sample.out
LC_DYLD_CHAINED_FIXUPS cmdsize: 16     dataoff: 0xc000 (49152)   datasize: 296
  CHAINED FIXUPS HEADER
    fixups_version : 0
    starts_offset  : 0x20 (32)
    imports_offset : 0x68 (104)
    symbols_offset : 0x88 (136)
    imports_count  : 8
    imports_format : 1 (DYLD_CHAINED_IMPORT)
    symbols_format : 0 (UNCOMPRESSED)

  IMPORTS
    [0] lib_ordinal: 4 (Foundation)        weak_import: 0   name_offset: 1 (_NSLog)
    [1] lib_ordinal: 254 (flat lookup)     weak_import: 1   name_offset: 8 (_c_extern_weak_function)
    [2] lib_ordinal: 1 (my_dylib.dylib)    weak_import: 0   name_offset: 32 (_my_dylib_func)
    [3] lib_ordinal: 2 (libSystem.B.dylib) weak_import: 0   name_offset: 47 (_printf)
    [4] lib_ordinal: 3 (CoreFoundation)    weak_import: 0   name_offset: 55 (___CFConstantStringClassReference)
    [5] lib_ordinal: 5 (libobjc.A.dylib)   weak_import: 0   name_offset: 89 (__objc_empty_cache)
    [6] lib_ordinal: 5 (libobjc.A.dylib)   weak_import: 0   name_offset: 108 (_OBJC_METACLASS_$_NSObject)
    [7] lib_ordinal: 5 (libobjc.A.dylib)   weak_import: 0   name_offset: 135 (_OBJC_CLASS_$_NSObject)

  SEGMENT __PAGEZERO (offset: 0)

  SEGMENT __TEXT (offset: 0)

  SEGMENT __DATA_CONST (offset: 24)
    size: 24
    page_size: 0x4000
    pointer_format: 6 (DYLD_CHAINED_PTR_64_OFFSET)
    segment_offset: 0x4000
    max_valid_pointer: 0
    page_count: 1
    page_start: 0
      PAGE 0 (offset: 0)
        0x00004000 BIND     ordinal: 0   addend: 0    reserved: 0   (_NSLog)
        0x00004008 BIND     ordinal: 1   addend: 0    reserved: 0   (_c_extern_weak_function)
        0x00004010 BIND     ordinal: 2   addend: 0    reserved: 0   (_my_dylib_func)
        0x00004018 BIND     ordinal: 3   addend: 0    reserved: 0   (_printf)
        0x00004020 BIND     ordinal: 4   addend: 0    reserved: 0   (___CFConstantStringClassReference)
        0x00004030 REBASE   target: 0x00003f83   high8: 0
        0x00004040 REBASE   target: 0x000080d8   high8: 0
        0x00004048 REBASE   target: 0x000080d8   high8: 0
    ...

LC_DYLD_EXPORTS_TRIE

See Export in LC_DYLD_INFO.

$ ./macho_parser --command LC_DYLD_ENVIRONMENT /Applications/Xcode-13.2.1.app/Contents/MacOS/Xcode
LC_DYLD_ENVIRONMENT  cmdsize: 120    DYLD_VERSIONED_FRAMEWORK_PATH=@executable_path/../SystemFrameworks:@executable_path/../InternalFrameworks
LC_DYLD_ENVIRONMENT  cmdsize: 120    DYLD_VERSIONED_LIBRARY_PATH=@executable_path/../SystemLibraries:@executable_path/../InternalLibraries
$ ./macho_parser --command LC_SYMTAB --no-truncate sample.out
LC_SYMTAB            cmdsize: 24     symoff: 49640   nsyms: 41   (symsize: 656)   stroff: 50336   strsize: 648
    0   : 0000000100003f00  [N_SECT]  +[SimpleClass load]
    1   : 0000000100008020  [N_SECT]  __OBJC_$_CLASS_METHODS_SimpleClass
    2   : 0000000100008040  [N_SECT]  __OBJC_METACLASS_RO_$_SimpleClass
    3   : 0000000100008088  [N_SECT]  __OBJC_CLASS_RO_$_SimpleClass
    4   : 0000000100008120  [N_SECT]  __dyld_private
    5   : 0000000000000000  [N_STAB(0x60)]  /Users/qingyang/Projects/llios/macho_parser/sample/
    ...
    36  :                   [N_EXT N_UNDF]  __objc_empty_cache  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(5)]
    37  :                   [N_EXT N_UNDF]  _c_extern_weak_function  [UNDEFINED_NON_LAZY N_WEAK_REF LIBRARY_ORDINAL(254)]
    38  :                   [N_EXT N_UNDF]  _my_dylib_func  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(1)]
    39  :                   [N_EXT N_UNDF]  _printf  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(2)]
    40  :                   [N_EXT N_UNDF]  dyld_stub_binder  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(2)]
$ ./macho_parser --dysymtab --local --extdef --undef --indirect sample.out
LC_DYSYMTAB          cmdsize: 80     nlocalsym: 25  nextdefsym: 7   nundefsym: 9   nindirectsyms: 10
  ilocalsym     : 0           nlocalsym    : 25
  iextdefsym    : 25          nextdefsym   : 7
  iundefsym     : 32          nundefsym    : 9
  tocoff        : 0x0         ntoc         : 0
  modtaboff     : 0x0         nmodtab      : 0
  extrefsymoff  : 0x0         nextrefsyms  : 0
  indirectsymoff: 0x0000c478  nindirectsyms: 10
  extreloff     : 0x0         nextrel      : 0
  locreloff     : 0x0         nlocrel      : 0

  Local symbols (ilocalsym 0, nlocalsym:25)
    0   : 0000000100003f00  [N_SECT]  +[SimpleClass load]
    1   : 0000000100008020  [N_SECT]  __OBJC_$_CLASS_METHODS_SimpleClass
    2   : 0000000100008040  [N_SECT]  __OBJC_METACLASS_RO_$_SimpleClass
    ...

  Externally defined symbols (iextdefsym: 25, nextdefsym:7)
    25  : 00000001000080f8  [N_EXT N_SECT]  _OBJC_CLASS_$_SimpleClass
    26  : 00000001000080d0  [N_EXT N_SECT]  _OBJC_METACLASS_$_SimpleClass
    27  : 0000000100000000  [N_EXT N_SECT]  __mh_execute_header  [REFERENCED_DYNAMICALLY]
    ...

  Undefined symbols (iundefsym: 32, nundefsym:9)
    32  : [N_EXT N_UNDF]  _NSLog  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(4)]
    33  : [N_EXT N_UNDF]  _OBJC_CLASS_$_NSObject  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(5)]
    34  : [N_EXT N_UNDF]  _OBJC_METACLASS_$_NSObject  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(5)]
    ...

  Indirect symbol table (indirectsymoff: 0xc478, nindirectsyms: 10)
    0  -> 32  : [N_EXT N_UNDF]  _NSLog  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(4)]
    1  -> 37  : [N_EXT N_UNDF]  _c_extern_weak_function  [UNDEFINED_NON_LAZY N_WEAK_REF LIBRARY_ORDINAL(254)]
    2  -> 38  : [N_EXT N_UNDF]  _my_dylib_func  [UNDEFINED_NON_LAZY LIBRARY_ORDINAL(1)]
    ...
$ ./macho_parser --command LC_FUNCTION_STARTS sample.out
LC_FUNCTION_STARTS   cmdsize: 16     dataoff: 0xc1e0 (49632)   datasize: 8
  0x100003e70  _c_constructor_function
  0x100003e80  _c_used_function
  0x100003e90  _c_weak_import_function
  0x100003ea0  _main
  0x100003f00  +[SimpleClass load]
$ ./macho_parser --command LC_MAIN sample.out
LC_MAIN              cmdsize: 24     entryoff: 16052 (0x3eb4)  stacksize: 0
$ ./macho_parser --command LC_LINKER_OPTION sample.o
LC_LINKER_OPTION     cmdsize: 32     count: 1   -lswift_Concurrency
LC_LINKER_OPTION     cmdsize: 24     count: 1   -lswiftCore
LC_LINKER_OPTION     cmdsize: 32     count: 1   -lswiftWebKit
LC_LINKER_OPTION     cmdsize: 32     count: 2   -framework WebKit
...
$ ./macho_parser --dylibs sample.out
LC_LOAD_DYLIB        cmdsize: 48     @rpath/my_dylib.dylib
LC_LOAD_DYLIB        cmdsize: 56     /usr/lib/libSystem.B.dylib
LC_LOAD_DYLIB        cmdsize: 104    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
LC_LOAD_DYLIB        cmdsize: 96     /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation
LC_LOAD_DYLIB        cmdsize: 56     /usr/lib/libobjc.A.dylib
$ ./macho_parser --command LC_RPATH sample.out
LC_RPATH             cmdsize: 24     build
$ ./macho_parser --build-version sample.out
LC_BUILD_VERSION     cmdsize: 32     platform: MACOS   minos: 12.0.0   sdk: 12.0.0
    tool:  LD   version: 711.0.0
$ ./macho_parser --command LC_ENCRYPTION_INFO_64 sample.out
LC_ENCRYPTION_INFO_64 cmdsize: 24    cryptoff: 16384  cryptsize: 143081472  (range: 0x4000-0x8878000)  cryptid: 1   pad: 0
$ ./macho_parser --code-signature --code-directory --entitlement --blob-wrapper -v SampleApp.app/SampleApp
LC_CODE_SIGNATURE    cmdsize: 16     dataoff: 0x214e0 (136416)   datasize: 20384
SuperBlob: magic: CSMAGIC_EMBEDDED_SIGNATURE, length: 7162, count: 5
  Blob 0: type: 0000000, offset: 52, magic: CSMAGIC_CODEDIRECTORY, length: 1430
    version      : 0x20400
    flags        : 0
    hashOffset   : 342
    identOffset  : 88
    nSpecialSlots: 7
    nCodeSlots   : 34
    codeLimit    : 136416
    hashSize     : 32
    hashType     : SHA256
    platform     : 0
    pageSize     : 4096
    identity     : me.qyang.SampleApp
    CDHash       : 353b6287288167ddfab1b0dff6d628d774045e941be05d6960662da3d1878ee3

    Slot[ -7] : e7f1d4635ba1128f12935ec8659106d194ef48e907e3750a5263bdfc62b268f0
    Slot[ -6] : 0000000000000000000000000000000000000000000000000000000000000000
    Slot[ -5] : 5c605b825812dc227e5162acba5f8af22b61b330c703397b4586f266e4534fe9
    Slot[ -4] : 0000000000000000000000000000000000000000000000000000000000000000
    ...

  Blob 1: type: 0x00002, offset: 1482, magic: CSMAGIC_REQUIREMENTS, length: 188
    Requirement[0]: offset: 20, length: 168
      identifier "me.qyang.SampleApp" and anchor apple generic and certificate leaf[subject.CN] = "Apple Development: yangq.nj@gmail.com (J5K827UFE4)" and certificate 1[field.1.2.840.113635.100.6.2.1] /* exists */

  Blob 2: type: 0x00005, offset: 1670, magic: CSMAGIC_EMBEDDED_ENTITLEMENTS, length: 495
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
        <key>application-identifier</key>
        <string>YK72VSF9J6.me.qyang.SampleApp</string>
        ...
</dict>
</plist>

  Blob 3: type: 0x00007, offset: 2157, magic: 0xfade7172, length: 205  (likely DER entitlements)
  Blob 4: type: 0x10000, offset: 2370, magic: CSMAGIC_BLOBWRAPPER, length: 4792
    PKCS7:
      type: pkcs7-signedData (1.2.840.113549.1.7.2)
      d.sign:
        version: 1
        md_algs:
            algorithm: sha256 (2.16.840.1.101.3.4.2.1)
            parameter: NULL
   ...