Sprint 1: MH_OBJECT Header & Load Commands

Prerequisites

Sprint 0 — crate, harness, references in place.

Goals

Read a Mach-O relocatable object file: parse the header and every load command afs-as emits. End state: given any .o in afs-as/tests/corpus/, afs-ld can pretty-print its structure and round-trip-compare the result against a golden file.

Deliverables

1. Mach-O constants

afs-ld/src/macho/constants.rs: duplicate the constants afs-as uses. Numeric literals only, no imports from afs-as.

```rust
pub const MH_MAGIC_64: u32 = 0xFEEDFACF;
pub const CPU_TYPE_ARM64: u32 = 0x0100000C;
pub const MH_OBJECT: u32 = 1;
pub const MH_EXECUTE: u32 = 2;
pub const MH_DYLIB: u32 = 6;
pub const MH_SUBSECTIONS_VIA_SYMBOLS: u32 = 0x2000;

pub const LC_SEGMENT_64: u32 = 0x19;
pub const LC_SYMTAB: u32 = 0x02;
pub const LC_DYSYMTAB: u32 = 0x0B;
pub const LC_BUILD_VERSION: u32 = 0x32;
pub const LC_LINKER_OPTIMIZATION_HINT: u32 = 0x2E;
// ... plus LC_MAIN, LC_DYLD_INFO_ONLY, LC_DYLD_CHAINED_FIXUPS,
//     LC_FUNCTION_STARTS, LC_DATA_IN_CODE, LC_CODE_SIGNATURE,
//     LC_ID_DYLIB, LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB,
//     LC_REEXPORT_DYLIB, LC_RPATH, LC_UUID, LC_SOURCE_VERSION.
```

2. Header parser

afs-ld/src/macho/reader.rs:

```rust
pub struct MachHeader64 {
    pub magic: u32, pub cputype: u32, pub cpusubtype: u32,
    pub filetype: u32, pub ncmds: u32, pub sizeofcmds: u32,
    pub flags: u32, pub reserved: u32,
}

pub fn parse_header(bytes: &[u8]) -> Result<MachHeader64, ReadError>;
```

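Validate: bytes.len() >= 32 (the size of the 64-bit header), magic matches MH_MAGIC_64, cputype matches CPU_TYPE_ARM64, ncmds * 8 <= sizeofcmds (every load command is at least 8 bytes: cmd plus cmdsize), 32 + sizeofcmds <= bytes.len(). Do the two size checks in a wider integer type or with checked arithmetic so hostile field values cannot overflow. Clear, sourced diagnostics via src/diag.rs.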
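The validation order above could look roughly like the following. This is a minimal sketch, not the final reader: the `ReadError` variants and the `read_u32` helper are hypothetical names chosen for illustration, and the real diagnostics are expected to go through src/diag.rs.

```rust
pub const MH_MAGIC_64: u32 = 0xFEED_FACF;
pub const CPU_TYPE_ARM64: u32 = 0x0100_000C;
const HEADER_SIZE: usize = 32;

#[derive(Debug, PartialEq, Eq)]
pub struct MachHeader64 {
    pub magic: u32, pub cputype: u32, pub cpusubtype: u32,
    pub filetype: u32, pub ncmds: u32, pub sizeofcmds: u32,
    pub flags: u32, pub reserved: u32,
}

// Hypothetical error type; the variant names are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    Truncated { needed: usize, got: usize },
    BadMagic(u32),
    BadCpuType(u32),
    CommandsTooSmall { ncmds: u32, sizeofcmds: u32 },
    CommandsPastEof { sizeofcmds: u32, file_len: usize },
}

// Little-endian u32 at `off`; the caller guarantees off + 4 <= bytes.len().
fn read_u32(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

pub fn parse_header(bytes: &[u8]) -> Result<MachHeader64, ReadError> {
    if bytes.len() < HEADER_SIZE {
        return Err(ReadError::Truncated { needed: HEADER_SIZE, got: bytes.len() });
    }
    let h = MachHeader64 {
        magic: read_u32(bytes, 0),
        cputype: read_u32(bytes, 4),
        cpusubtype: read_u32(bytes, 8),
        filetype: read_u32(bytes, 12),
        ncmds: read_u32(bytes, 16),
        sizeofcmds: read_u32(bytes, 20),
        flags: read_u32(bytes, 24),
        reserved: read_u32(bytes, 28),
    };
    if h.magic != MH_MAGIC_64 {
        return Err(ReadError::BadMagic(h.magic));
    }
    if h.cputype != CPU_TYPE_ARM64 {
        return Err(ReadError::BadCpuType(h.cputype));
    }
    // Widen to u64 so ncmds * 8 and 32 + sizeofcmds cannot wrap.
    if u64::from(h.ncmds) * 8 > u64::from(h.sizeofcmds) {
        return Err(ReadError::CommandsTooSmall { ncmds: h.ncmds, sizeofcmds: h.sizeofcmds });
    }
    if HEADER_SIZE as u64 + u64::from(h.sizeofcmds) > bytes.len() as u64 {
        return Err(ReadError::CommandsPastEof { sizeofcmds: h.sizeofcmds, file_len: bytes.len() });
    }
    Ok(h)
}
```

Checking the length first means every later field read is in bounds, so none of the malformed-input cases can reach an indexing panic.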

3. Load-command dispatcher

LoadCommand enum with variants for each command afs-as emits:

```rust
pub enum LoadCommand {
    Segment64(Segment64),
    Symtab(SymtabCmd),
    Dysymtab(DysymtabCmd),
    BuildVersion(BuildVersionCmd),
    LinkerOptimizationHint(LohCmd),
    // placeholders for later sprints:
    DyldInfoOnly, DyldChainedFixups, Main, FunctionStarts,
    DataInCode, CodeSignature, IdDylib, LoadDylib, LoadWeakDylib,
    ReexportDylib, Rpath, Uuid, SourceVersion,
    Unknown { cmd: u32, cmdsize: u32, data: Vec<u8> },
}

pub fn parse_commands(header: &MachHeader64, bytes: &[u8]) -> Result<Vec<LoadCommand>, ReadError>;
```

Exhaustive matching. Unknown commands preserved (not erased) so round-trips survive.
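As an illustration of the dispatch loop, here is a reduced sketch that walks the command area and keeps unrecognized commands verbatim. It makes several assumptions: `walk_commands` and `WalkError` are hypothetical names, only two known commands are shown, their payloads are stood in for by raw bytes, and `data` is taken to mean the bytes after the 8-byte cmd/cmdsize prefix.

```rust
pub const LC_SYMTAB: u32 = 0x02;
pub const LC_SEGMENT_64: u32 = 0x19;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    Truncated { offset: usize },
    BadCmdSize { offset: usize, cmdsize: u32 },
}

// Reduced stand-in for the real enum: payloads are raw bytes here.
#[derive(Debug, PartialEq, Eq)]
pub enum LoadCommand {
    Segment64(Vec<u8>),
    Symtab(Vec<u8>),
    Unknown { cmd: u32, cmdsize: u32, data: Vec<u8> },
}

/// Walk `ncmds` commands in `cmds`, the bytes that follow the 32-byte header.
pub fn walk_commands(ncmds: u32, cmds: &[u8]) -> Result<Vec<LoadCommand>, WalkError> {
    let mut out = Vec::new();
    let mut off = 0usize;
    for _ in 0..ncmds {
        // Every command starts with cmd and cmdsize, both little-endian u32.
        let head = off
            .checked_add(8)
            .and_then(|end| cmds.get(off..end))
            .ok_or(WalkError::Truncated { offset: off })?;
        let cmd = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
        let cmdsize = u32::from_le_bytes([head[4], head[5], head[6], head[7]]);
        let size = cmdsize as usize;
        // 64-bit Mach-O requires cmdsize to be a multiple of 8.
        if size < 8 || size % 8 != 0 {
            return Err(WalkError::BadCmdSize { offset: off, cmdsize });
        }
        let whole = off
            .checked_add(size)
            .and_then(|end| cmds.get(off..end))
            .ok_or(WalkError::Truncated { offset: off })?;
        let data = whole[8..].to_vec();
        out.push(match cmd {
            LC_SEGMENT_64 => LoadCommand::Segment64(data),
            LC_SYMTAB => LoadCommand::Symtab(data),
            // Keep the bytes so a later serializer can echo them unchanged.
            _ => LoadCommand::Unknown { cmd, cmdsize, data },
        });
        off += size;
    }
    Ok(out)
}
```

Because the `Unknown` arm stores cmd, cmdsize, and the payload, re-emitting it reproduces the original bytes, which is what the round-trip requirement depends on.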

4. Segment + section header parsing (metadata only — contents in Sprint 2)

Decode segment_command_64 (72 bytes) + N section_64 structs (80 bytes each). Store:

  • segname (fixed 16 bytes, null-padded)
  • sectname (fixed 16 bytes, null-padded)
  • addr, size, offset, align (as log2), reloff, nreloc, flags, reserved1, reserved2, reserved3
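To make the 80-byte layout concrete, the sketch below decodes a single section_64 record. The field order follows the Mach-O `section_64` struct (sectname precedes segname on disk); the `Section64` struct shape, `parse_section64`, and `name_str` are hypothetical names for illustration only.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct Section64 {
    pub sectname: [u8; 16],
    pub segname: [u8; 16],
    pub addr: u64,
    pub size: u64,
    pub offset: u32,
    pub align: u32, // log2 of the alignment, as stored in the file
    pub reloff: u32,
    pub nreloc: u32,
    pub flags: u32,
    pub reserved1: u32,
    pub reserved2: u32,
    pub reserved3: u32,
}

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

fn u64_at(b: &[u8], off: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(raw)
}

fn name_at(b: &[u8], off: usize) -> [u8; 16] {
    let mut raw = [0u8; 16];
    raw.copy_from_slice(&b[off..off + 16]);
    raw
}

/// Decode one section_64 record; returns None if fewer than 80 bytes remain.
pub fn parse_section64(b: &[u8]) -> Option<Section64> {
    if b.len() < 80 {
        return None;
    }
    Some(Section64 {
        sectname: name_at(b, 0),
        segname: name_at(b, 16),
        addr: u64_at(b, 32),
        size: u64_at(b, 40),
        offset: u32_at(b, 48),
        align: u32_at(b, 52),
        reloff: u32_at(b, 56),
        nreloc: u32_at(b, 60),
        flags: u32_at(b, 64),
        reserved1: u32_at(b, 68),
        reserved2: u32_at(b, 72),
        reserved3: u32_at(b, 76),
    })
}

/// View a fixed 16-byte name as text. A name that fills all 16 bytes
/// has no NUL terminator, so stop at the first NUL or at the end.
pub fn name_str(raw: &[u8; 16]) -> &str {
    let len = raw.iter().position(|&c| c == 0).unwrap_or(raw.len());
    std::str::from_utf8(&raw[..len]).unwrap_or("<non-utf8>")
}
```

Storing the names as raw `[u8; 16]` rather than `String` keeps the padding bytes, so serializing the struct back out is byte-exact.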

5. LC_BUILD_VERSION + LC_LINKER_OPTIMIZATION_HINT

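Decode platform (PLATFORM_MACOS = 1), minos, sdk, ntools, and the ntools tool records (tool, version) that follow. minos and sdk are packed X.Y.Z versions: 16 bits of major, 8 of minor, 8 of patch. LC_LINKER_OPTIMIZATION_HINT is a linkedit_data_command (dataoff, datasize); keep the LOH blob it points at as raw bytes (interpretation in Sprint 25).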
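A small sketch of the LC_BUILD_VERSION decode, assuming the input slice is the whole command including its cmd/cmdsize prefix. The `BuildVersionCmd` field layout, `parse_build_version`, and `format_version` are hypothetical names; the packed version format (major in the high 16 bits, then minor and patch in 8 bits each) is the standard Mach-O encoding.

```rust
pub const PLATFORM_MACOS: u32 = 1;

// Hypothetical shape for the decoded command.
#[derive(Debug, PartialEq, Eq)]
pub struct BuildVersionCmd {
    pub platform: u32,
    pub minos: u32,
    pub sdk: u32,
    pub tools: Vec<(u32, u32)>, // (tool, version) pairs
}

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Decode a full LC_BUILD_VERSION command: a 24-byte fixed part
/// (cmd, cmdsize, platform, minos, sdk, ntools) plus 8 bytes per tool.
pub fn parse_build_version(cmd: &[u8]) -> Option<BuildVersionCmd> {
    if cmd.len() < 24 {
        return None;
    }
    let ntools = u32_at(cmd, 20) as usize;
    // Checked arithmetic: a hostile ntools must not wrap the bound.
    let end = ntools.checked_mul(8)?.checked_add(24)?;
    if end > cmd.len() {
        return None;
    }
    let tools = (0..ntools)
        .map(|i| {
            let off = 24 + i * 8;
            (u32_at(cmd, off), u32_at(cmd, off + 4))
        })
        .collect();
    Some(BuildVersionCmd {
        platform: u32_at(cmd, 8),
        minos: u32_at(cmd, 12),
        sdk: u32_at(cmd, 16),
        tools,
    })
}

/// Render a packed X.Y.Z version as dotted decimal.
pub fn format_version(v: u32) -> String {
    format!("{}.{}.{}", v >> 16, (v >> 8) & 0xFF, v & 0xFF)
}
```

For example, a minos of 0x000B0000 renders as 11.0.0.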

6. Pretty-printer

afs-ld/src/bin/dump.rs (optional subcommand afs-ld --dump <path>): otool-like output. Used by the round-trip harness.

Testing Strategy

  • Round-trip test: for every .o in afs-as/tests/corpus/, parse, serialize back into the same byte layout (no reshuffling in this sprint — just read+echo), compare.
  • Malformed-input tests: truncated header, wrong magic, wrong cputype, ncmds inconsistent with sizeofcmds, unaligned commands (cmdsize not a multiple of 8). Each must produce a specific diagnostic, never a panic.
  • Differential: otool -lV against our dumper for the full corpus. Diff must be zero after whitespace normalization.
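The differential check depends on what "whitespace normalization" means, which this plan does not pin down. One possible definition, offered as an assumption: collapse each run of whitespace to a single space, trim every line, and drop blank lines. The helper names `normalize_ws` and `first_difference` are hypothetical.

```rust
/// Normalize dumper output for comparison: collapse whitespace runs,
/// trim each line, and drop lines that end up empty.
/// (This definition is an assumption, not something the plan specifies.)
pub fn normalize_ws(text: &str) -> Vec<String> {
    text.lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|line| !line.is_empty())
        .collect()
}

/// Index of the first normalized line where the two outputs diverge,
/// or None when they agree after normalization.
pub fn first_difference(ours: &str, theirs: &str) -> Option<usize> {
    let (a, b) = (normalize_ws(ours), normalize_ws(theirs));
    if a == b {
        return None;
    }
    Some(a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count())
}
```

Reporting the first differing line rather than a bare pass/fail makes a corpus-wide failure easier to triage.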

Definition of Done

  • All afs-as corpus .o files parse cleanly.
  • Every load command afs-as emits is represented in LoadCommand.
  • Malformed-input fuzz finds no panics.
  • Round-trip byte-level equality on the full corpus.
  • otool -lV and our dumper agree after whitespace normalization.