Introduction

There are plenty of articles that explain what the Mach-O file format looks like, but not many of them explain how the raw bytes can be transformed into useful information about the binary itself.

Apple provides a command-line tool for reading Mach-O files called otool. For example, you can read the Mach-O header of a binary by invoking the following command:

$ otool -h <path_to_binary>
Mach header
      magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
 0xfeedfacf 16777228          0  0x00           2    82       9032 0x00218085

You can also list the shared libraries that a binary links against by invoking the following:

$ otool -L <path_to_binary>
/System/Library/Frameworks/AdSupport.framework/AdSupport (compatibility version 1.0.0, current version 1.0.0)
@rpath/Resources.framework/Resources (compatibility version 1.0.0, current version 1.0.0)
@rpath/Settings.framework/Settings (compatibility version 1.0.0, current version 1.0.0, weak)
/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 902.0.0)
...

otool is a very powerful tool that lets you inspect various parts of Mach-O files. Here’s the documentation.

I’ve decided to take a deeper look at how it all works and thought that I could write a command-line tool in Swift for reading Mach-O files. There are a few goals that I want to achieve by writing such a tool:

In this article I want to focus only on reading the Mach-O header, but first let’s recall what the Mach-O file format is and how it is structured.

Mach-O file format

Mach-O is a file format for executables, object code, and shared libraries. It is used by most systems based on the Mach kernel, such as macOS and iOS.

It consists of three main parts:

Mach Header

Every Mach-O file starts with a header. It contains general information about the binary, such as the target architecture the executable can run on, the number of load commands, and the file type.

Load Commands

Load commands describe the segments that are stored in the Data region, the libraries that are dynamically linked to the executable, and the application entry point.

Data

The Data region contains the actual data of a program, such as assembly instructions, variables, and symbols. It is divided into sections that are described by load commands.
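To make the layout a bit more concrete, here is a minimal sketch of where the load commands sit in a file. It assumes the header sizes that follow from the C structs (28 bytes for the 32-bit header, 32 bytes for the 64-bit one) and that the load commands begin right after the header. The helper name loadCommandsRange is hypothetical, and the size 9032 is the sizeofcmds value from the otool output above.

```swift
// Header sizes in bytes: seven 4-byte fields for the 32-bit header,
// plus one extra 4-byte `reserved` field for the 64-bit header.
let machHeaderSize32: UInt64 = 28
let machHeaderSize64: UInt64 = 32

/// Hypothetical helper: the byte range occupied by the load commands,
/// assuming they start immediately after the header.
func loadCommandsRange(is64Bit: Bool, sizeOfCommands: UInt32) -> Range<UInt64> {
    let start = is64Bit ? machHeaderSize64 : machHeaderSize32
    return start..<(start + UInt64(sizeOfCommands))
}

let range = loadCommandsRange(is64Bit: true, sizeOfCommands: 9032)
print(range) // 32..<9064
```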

The core of Mach-O reader

Now that we know a little bit about the Mach-O file format we can try to parse a header from a Mach-O binary. To do that we will need to implement the logic for reading bytes from the given binary. Let’s use Foundation’s FileHandle API and define a simple Swift class:

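import Foundation

/// Reads fixed-size values from a file through a FileHandle.
internal final class FileReader {
    private let fileHandle: FileHandle

    /// Current read position in the file, in bytes.
    internal var fileOffset: UInt64 {
        get {
            return fileHandle.offsetInFile
        }
        set {
            fileHandle.seek(toFileOffset: newValue)
        }
    }

    internal init(fileHandle: FileHandle) {
        self.fileHandle = fileHandle
    }

    /// Seeks to `fileOffset` (the start of the file by default) and reads
    /// `MemoryLayout<T>.size` bytes, leaving the position just past them.
    internal func read<T>(dataType: T.Type, fileOffset: UInt64 = 0) -> T {
        fileHandle.seek(toFileOffset: fileOffset)

        let dataTypeSize = MemoryLayout<T>.size
        let data = fileHandle.readData(ofLength: dataTypeSize)

        // convert(to:) is a Data extension that is not shown in this article.
        return data.convert(to: dataType)
    }
}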

The FileReader class simply uses the file handle to read a given number of bytes from a file. The number of bytes to read is the size of the data type that is passed as an argument, which we obtain from MemoryLayout.
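The convert(to:) call used by FileReader is not part of Foundation. One possible way to write such a helper is sketched below; this is an assumption about its shape, not the project's actual implementation. It relies on loadUnaligned, which needs Swift 5.7 or later, and it only makes sense for trivial types such as integers and imported C structs.

```swift
import Foundation

extension Data {
    /// Hypothetical stand-in for the `convert(to:)` helper.
    /// Copies the first `MemoryLayout<T>.size` bytes into a value of type T.
    func convert<T>(to type: T.Type) -> T {
        // A short read (for example at the end of a file) would be a bug here;
        // a real tool would more likely throw an error than trap.
        precondition(count >= MemoryLayout<T>.size, "not enough bytes for \(T.self)")
        // loadUnaligned copies the bytes, so the buffer needs no special alignment.
        return withUnsafeBytes { $0.loadUnaligned(as: T.self) }
    }
}

// The first four bytes of a little-endian 64-bit Mach-O file.
let bytes = Data([0xcf, 0xfa, 0xed, 0xfe])
// Interpret the loaded value as little-endian so the result is host-independent.
let magic = UInt32(littleEndian: bytes.convert(to: UInt32.self))
print(String(magic, radix: 16)) // feedfacf
```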

Parsing Mach-O header

After we have implemented the logic for reading bytes, we can try to read the Mach-O header. The process of reading the header can be described in three steps:

The Mach header struct is defined in the MachO.loader module. There are two versions of the header - 32-bit and 64-bit. They look as follows:

public struct mach_header {
    public var magic: UInt32 /* mach magic number identifier */
    public var cputype: cpu_type_t /* cpu specifier */
    public var cpusubtype: cpu_subtype_t /* machine specifier */
    public var filetype: UInt32 /* type of file */
    public var ncmds: UInt32 /* number of load commands */
    public var sizeofcmds: UInt32 /* the size of all the load commands */
    public var flags: UInt32 /* flags */
}

public struct mach_header_64 {
    public var magic: UInt32 /* mach magic number identifier */
    public var cputype: cpu_type_t /* cpu specifier */
    public var cpusubtype: cpu_subtype_t /* machine specifier */
    public var filetype: UInt32 /* type of file */
    public var ncmds: UInt32 /* number of load commands */
    public var sizeofcmds: UInt32 /* the size of all the load commands */
    public var flags: UInt32 /* flags */
    public var reserved: UInt32 /* reserved */
}

As you can see, the two headers are almost identical. The 64-bit header has an additional reserved field, which is 4 bytes long.
To use the correct header, we need to know whether the binary was built for a 32-bit or a 64-bit architecture. Fortunately, the magic field tells us exactly that.
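As a rough illustration of how the magic can be interpreted, the sketch below classifies the first four bytes of a file. The numeric constants are the MH_* and FAT_* magic values from the system headers; the MagicInfo type and the classify function are hypothetical names used only for this example.

```swift
/// Hypothetical result of inspecting a magic number.
struct MagicInfo: Equatable {
    enum Format { case mach32, mach64, fat32, fat64 }
    let format: Format
    /// True when the file's byte order is the opposite of the order we read it in.
    let needsByteSwap: Bool
}

/// Maps a raw magic value to its format, or nil if it is not a known magic.
func classify(magic: UInt32) -> MagicInfo? {
    switch magic {
    case 0xfeedface: return MagicInfo(format: .mach32, needsByteSwap: false) // MH_MAGIC
    case 0xcefaedfe: return MagicInfo(format: .mach32, needsByteSwap: true)  // MH_CIGAM
    case 0xfeedfacf: return MagicInfo(format: .mach64, needsByteSwap: false) // MH_MAGIC_64
    case 0xcffaedfe: return MagicInfo(format: .mach64, needsByteSwap: true)  // MH_CIGAM_64
    case 0xcafebabe: return MagicInfo(format: .fat32, needsByteSwap: false)  // FAT_MAGIC
    case 0xbebafeca: return MagicInfo(format: .fat32, needsByteSwap: true)   // FAT_CIGAM
    case 0xcafebabf: return MagicInfo(format: .fat64, needsByteSwap: false)  // FAT_MAGIC_64
    case 0xbfbafeca: return MagicInfo(format: .fat64, needsByteSwap: true)   // FAT_CIGAM_64
    default: return nil
    }
}

// The magic from the otool output at the top of the article.
let info = classify(magic: 0xfeedfacf)
```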
Let’s write the code for reading the Mach header.

func readHeader() throws -> MachHeader {
  let magic = try readMagic()

  switch magic.fileArchitecture {
  case .arch32bit:
      let header = try read32BitHeader()
      return .arch32(header)
  case .arch64bit:
      let header = try read64BitHeader()
      return .arch64(header)
  }
}

func readMagic() throws -> Magic {
  let magic = fileReader.read(dataType: UInt32.self)

  return try MagicMapper().map(input: magic)
}

func read32BitHeader() throws -> MachHeader32 {
  let header = fileReader.read(dataType: mach_header.self)

  return MachHeader32(
      magic: try MagicMapper().map(input: header.magic),
      cpu: try CPUMapper().map(input: header.cputype),
      cpuSubType: cpuSubType(value: header.cpusubtype),
      fileType: try FileTypeMapper().map(input: header.filetype),
      numberOfCommands: Int(header.ncmds),
      sizeOfCommands: Int(header.sizeofcmds),
      flags: flags(value: header.flags)
  )
}

The readHeader method reads the magic first and, based on that, decides which header to read. You may also notice the mappers, which are responsible for turning raw values into usable information.
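To give an idea of what such a mapping can look like, here is a small sketch that decodes the flags bitmask into names. The flag table is deliberately partial and the function name flagNames is hypothetical; the bit values are the MH_* constants from the system headers. The input 0x00218085 is the flags value from the otool output in the introduction.

```swift
/// A partial table of MH_* header flags: bit mask and symbolic name.
let knownFlags: [(mask: UInt32, name: String)] = [
    (0x1, "MH_NOUNDEFS"),
    (0x2, "MH_INCRLINK"),
    (0x4, "MH_DYLDLINK"),
    (0x80, "MH_TWOLEVEL"),
    (0x8000, "MH_WEAK_DEFINES"),
    (0x10000, "MH_BINDS_TO_WEAK"),
    (0x200000, "MH_PIE"),
]

/// Returns the names of all known flags whose bit is set in `rawFlags`.
func flagNames(for rawFlags: UInt32) -> [String] {
    knownFlags.filter { rawFlags & $0.mask != 0 }.map { $0.name }
}

print(flagNames(for: 0x00218085).joined(separator: " | "))
// MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_WEAK_DEFINES | MH_BINDS_TO_WEAK | MH_PIE
```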

One thing I have not mentioned yet is that we need to handle endianness when reading headers. We may need to swap bytes to get the correct information from the binary. I haven’t covered that in this article, but you can check the source code to see how it is handled - I’ll provide a link to the GitHub project at the end of the article.
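As a rough idea of what byte swapping involves (a sketch only, not the project's actual code): when the magic reads as one of the CIGAM values, every multi-byte field of the header has to be byte-swapped before use. The helper name normalized is hypothetical.

```swift
/// Hypothetical helper: returns the value in the reader's byte order.
func normalized<T: FixedWidthInteger>(_ value: T, needsByteSwap: Bool) -> T {
    needsByteSwap ? value.byteSwapped : value
}

// MH_CIGAM_64: the 64-bit magic as it appears when the byte order is reversed.
let rawMagic: UInt32 = 0xcffaedfe
// MH_CIGAM and MH_CIGAM_64 both signal that fields must be swapped.
let needsByteSwap = rawMagic == 0xcefaedfe || rawMagic == 0xcffaedfe

// The value 82 stored in the opposite byte order.
let rawCommandCount: UInt32 = 0x5200_0000
let commandCount = normalized(rawCommandCount, needsByteSwap: needsByteSwap)
print(commandCount) // 82
```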

Parsing FAT header

It may happen that a Mach-O binary contains more than one architecture. Such a binary is called a universal binary or a FAT binary. The FAT binary is structured as follows:

The FAT header is declared in the MachO.fat module and looks as follows:

public struct fat_header {
    public var magic: UInt32 /* FAT_MAGIC or FAT_MAGIC_64 */
    public var nfat_arch: UInt32 /* number of structs that follow */
}

public struct fat_arch {
    public var cputype: cpu_type_t /* cpu specifier (int) */
    public var cpusubtype: cpu_subtype_t /* machine specifier (int) */
    public var offset: UInt32 /* file offset to this object file */
    public var size: UInt32 /* size of this object file */
    public var align: UInt32 /* alignment as a power of 2 */
}

public struct fat_arch_64 {
    public var cputype: cpu_type_t /* cpu specifier (int) */
    public var cpusubtype: cpu_subtype_t /* machine specifier (int) */
    public var offset: UInt64 /* file offset to this object file */
    public var size: UInt64 /* size of this object file */
    public var align: UInt32 /* alignment as a power of 2 */
    public var reserved: UInt32 /* reserved */
}

The FAT header also starts with a magic number, followed by the number of architectures embedded in the binary. The magic tells us whether the FAT binary uses the 32-bit or the 64-bit layout. After reading the header we need to read the matching struct (fat_arch or fat_arch_64) n times, where n is the number of architectures. The Swift code for reading the 32-bit header looks as follows:

func readHeader() throws -> MachHeaderFAT {
    let magic = try readMagic()

    switch magic.fileArchitecture {
    case .fat32bit:
        let header = try read32BitFATHeader()
        return .arch32(header)
    ...
    }
}

func read32BitFATHeader() throws -> MachHeaderFAT32 {
    let header = fileReader.read(dataType: fat_header.self)
    let magic = try MagicMapper().map(input: header.magic)
    let numberOfArchitectures = Int(header.nfat_arch)

    let architectures: [MachHeaderFAT32.Architecture] = try (0..<numberOfArchitectures)
        .map { _ in
            let fatHeaderArch = try read32BitFATHeaderArchitecture(
                fileOffset: fileReader.fileOffset
            )
            return fatHeaderArch
        }

    return MachHeaderFAT32(magic: magic, architectures: architectures)
}

func read32BitFATHeaderArchitecture(fileOffset: UInt64) throws -> MachHeaderFAT32.Architecture {
    let fatArch32 = fileReader.read(dataType: fat_arch.self,
                                    fileOffset: fileOffset)

    return MachHeaderFAT32.Architecture(
        cpu: try CPUMapper().map(input: fatArch32.cputype),
        cpuSubType: cpuSubType(value: fatArch32.cpusubtype),
        offset: fatArch32.offset,
        size: fatArch32.size,
        align: fatArch32.align
    )
}

We’ve managed to read the FAT header of a binary! Each embedded Mach-O file starts at the file offset recorded in its fat_arch entry, so the first Mach-O file, built for the first architecture listed in the FAT header, begins at that entry’s offset.
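The sketch below shows one way to pull the slice offsets out of a 32-bit FAT header held in memory. It is an independent illustration rather than the project's code: FatSlice, readBigEndianUInt32 and parseFatSlices are hypothetical names, and the sample bytes are made up. It relies on two facts about the on-disk layout: the fat_header and fat_arch fields are stored big-endian, and the structs are 8 and 20 bytes long.

```swift
import Foundation

/// Hypothetical, simplified view of one 32-bit fat_arch entry.
struct FatSlice: Equatable {
    let cpuType: Int32
    let offset: UInt32
    let size: UInt32
}

/// Reads a big-endian UInt32 at `offset`, or nil if it would run past the end.
func readBigEndianUInt32(_ data: Data, at offset: Int) -> UInt32? {
    guard offset >= 0, offset + 4 <= data.count else { return nil }
    let start = data.startIndex + offset
    return data[start..<start + 4].reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

/// Parses the fat_header and its fat_arch entries; nil if the data is not
/// a well-formed 32-bit FAT header.
func parseFatSlices(_ data: Data) -> [FatSlice]? {
    guard readBigEndianUInt32(data, at: 0) == 0xcafebabe, // FAT_MAGIC
          let count = readBigEndianUInt32(data, at: 4) else { return nil }
    var slices: [FatSlice] = []
    for index in 0..<Int(count) {
        let base = 8 + index * 20 // 8-byte fat_header, then 20 bytes per fat_arch
        guard let cpu = readBigEndianUInt32(data, at: base),
              let offset = readBigEndianUInt32(data, at: base + 8),
              let size = readBigEndianUInt32(data, at: base + 12) else { return nil }
        slices.append(FatSlice(cpuType: Int32(bitPattern: cpu), offset: offset, size: size))
    }
    return slices
}

// Made-up header describing a single ARM64 slice.
let sample = Data([
    0xca, 0xfe, 0xba, 0xbe, 0x00, 0x00, 0x00, 0x01, // magic, nfat_arch = 1
    0x01, 0x00, 0x00, 0x0c, 0x00, 0x00, 0x00, 0x00, // cputype = 16777228, cpusubtype = 0
    0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x10, 0x00, // offset = 16384, size = 4096
    0x00, 0x00, 0x00, 0x0e,                         // align = 14 (2^14)
])
let slices = parseFatSlices(sample)
```

With the offset and size in hand, the Mach header of a slice can be read by seeking to that offset instead of to the start of the file.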

Further improvements - reading archive header

Besides reading Mach headers and FAT headers, otool is also able to read archive headers, which have quite a different format.
Archive files are simply static libraries generated by an archive tool such as ar. They are a collection of object files (.o) with some additional information. In case you don’t know, object files are also Mach-O files, each typically produced from a single source file. We can inspect the FirebaseCore binary in the MachOView application to see that it is a FAT binary containing archives for various architectures.

If we wanted to implement the logic for parsing archive headers, we would follow the same approach as for Mach headers and FAT headers. We would probably have to parse the magic first. For archives the magic is the 8-byte sequence 0x21 0x3C 0x61 0x72 0x63 0x68 0x3E 0x0A, which is !<arch>\n in ASCII. Then we would have to read the header itself. If you are curious what the archive header looks like, you can check it here. You can also see how otool parses archive headers by reading its code, which is open source.
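Checking for the archive magic could be sketched as follows. This only covers the signature check, not the archive header itself, and the function name isArchive is hypothetical.

```swift
import Foundation

/// The 8-byte archive signature, "!<arch>\n" in ASCII.
let archiveMagic: [UInt8] = [0x21, 0x3C, 0x61, 0x72, 0x63, 0x68, 0x3E, 0x0A]

/// Returns true when the data begins with the archive signature.
func isArchive(_ data: Data) -> Bool {
    data.prefix(archiveMagic.count).elementsEqual(archiveMagic)
}

print(isArchive(Data("!<arch>\n".utf8)))             // true
print(isArchive(Data([0xcf, 0xfa, 0xed, 0xfe])))     // false
```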

Summary

We have managed to implement the logic for reading Mach headers and FAT headers. In addition, I’ve added code for parsing command-line arguments, so we can read the headers by invoking the tool with the appropriate arguments. Let’s see the tool in action!

$ swift run machoreader -h <binary_file_path>
MachO Header 64bit
magic: 0xfeedfacf
cpu: ARM (64bit)
cpuSubType: 0x0
fileType: MH_EXECUTE
numberOfCommands: 80
sizeOfCommands: 8784
flags: MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL | MH_WEAK_DEFINES | MH_BINDS_TO_WEAK | MH_PIE
reserved: 0

$ swift run machoreader -f <fat_binary_file_path>
MachO FAT Header 32bit
magic: 0xcafebabe
architectures: [I386 (32bit), X86_64 (64bit), ARM (32bit), ARM (64bit)]

Nice! Everything works, and we have achieved the same functionality that otool offers for parsing headers. You can check out the full implementation on my GitHub page. Mach-O reader is available as a Swift package that contains the command-line executable.