# Sprint 1: MH_OBJECT Header & Load Commands

## Prerequisites
Sprint 0 — crate, harness, references in place.

## Goals
Read a Mach-O relocatable object file: parse the header and every load command afs-as emits. End state: given any `.o` in `afs-as/tests/corpus/`, afs-ld can pretty-print its structure and round-trip-compare it to a golden.

## Deliverables

### 1. Mach-O constants
`afs-ld/src/macho/constants.rs`: duplicate the constants afs-as uses. Numeric literals only, no imports from afs-as.

```rust
pub const MH_MAGIC_64: u32 = 0xFEEDFACF;
pub const CPU_TYPE_ARM64: u32 = 0x0100000C;
pub const MH_OBJECT: u32 = 1;
pub const MH_EXECUTE: u32 = 2;
pub const MH_DYLIB: u32 = 6;
pub const MH_SUBSECTIONS_VIA_SYMBOLS: u32 = 0x2000;

pub const LC_SEGMENT_64: u32 = 0x19;
pub const LC_SYMTAB: u32 = 0x02;
pub const LC_DYSYMTAB: u32 = 0x0B;
pub const LC_BUILD_VERSION: u32 = 0x32;
pub const LC_LINKER_OPTIMIZATION_HINT: u32 = 0x2E;
// ... plus LC_MAIN, LC_DYLD_INFO_ONLY, LC_DYLD_CHAINED_FIXUPS,
// LC_FUNCTION_STARTS, LC_DATA_IN_CODE, LC_CODE_SIGNATURE,
// LC_ID_DYLIB, LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB,
// LC_REEXPORT_DYLIB, LC_RPATH, LC_UUID, LC_SOURCE_VERSION.
```

### 2. Header parser
`afs-ld/src/macho/reader.rs`:

```rust
pub struct MachHeader64 {
    pub magic: u32, pub cputype: u32, pub cpusubtype: u32,
    pub filetype: u32, pub ncmds: u32, pub sizeofcmds: u32,
    pub flags: u32, pub reserved: u32,
}

pub fn parse_header(bytes: &[u8]) -> Result<MachHeader64, ReadError>;
```
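
Validate: magic matches `MH_MAGIC_64`, cputype matches `CPU_TYPE_ARM64`, `ncmds * 8 <= sizeofcmds` (every load command carries at least the 8-byte `cmd`/`cmdsize` prefix), and `32 + sizeofcmds <= bytes.len()` (the `mach_header_64` itself is 32 bytes). Report failures as clear, sourced diagnostics via `src/diag.rs`.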
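
The validation steps above can be sketched as follows. This is a minimal illustration, not the final reader: the `ReadError` variants and the `u32_le` helper are hypothetical (the real error type would route through `src/diag.rs`), and the sketch assumes little-endian input, which is what arm64 objects use. The `ncmds * 8` check is done in `u64` so a hostile `ncmds` cannot overflow.

```rust
pub const MH_MAGIC_64: u32 = 0xFEED_FACF;
pub const CPU_TYPE_ARM64: u32 = 0x0100_000C;
/// Size of `mach_header_64` in bytes.
const HEADER_SIZE: usize = 32;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct MachHeader64 {
    pub magic: u32, pub cputype: u32, pub cpusubtype: u32,
    pub filetype: u32, pub ncmds: u32, pub sizeofcmds: u32,
    pub flags: u32, pub reserved: u32,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    Truncated { need: usize, have: usize },
    BadMagic(u32),
    BadCpuType(u32),
    CmdCountTooLarge { ncmds: u32, sizeofcmds: u32 },
}

/// Read a little-endian u32 at `off`; the caller has already bounds-checked.
fn u32_le(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

pub fn parse_header(bytes: &[u8]) -> Result<MachHeader64, ReadError> {
    if bytes.len() < HEADER_SIZE {
        return Err(ReadError::Truncated { need: HEADER_SIZE, have: bytes.len() });
    }
    let h = MachHeader64 {
        magic: u32_le(bytes, 0),
        cputype: u32_le(bytes, 4),
        cpusubtype: u32_le(bytes, 8),
        filetype: u32_le(bytes, 12),
        ncmds: u32_le(bytes, 16),
        sizeofcmds: u32_le(bytes, 20),
        flags: u32_le(bytes, 24),
        reserved: u32_le(bytes, 28),
    };
    if h.magic != MH_MAGIC_64 {
        return Err(ReadError::BadMagic(h.magic));
    }
    if h.cputype != CPU_TYPE_ARM64 {
        return Err(ReadError::BadCpuType(h.cputype));
    }
    // Each command is at least 8 bytes (cmd + cmdsize); widen to avoid overflow.
    if u64::from(h.ncmds) * 8 > u64::from(h.sizeofcmds) {
        return Err(ReadError::CmdCountTooLarge { ncmds: h.ncmds, sizeofcmds: h.sizeofcmds });
    }
    // The load-command region must fit inside the file.
    let need = HEADER_SIZE.saturating_add(h.sizeofcmds as usize);
    if need > bytes.len() {
        return Err(ReadError::Truncated { need, have: bytes.len() });
    }
    Ok(h)
}
```

Returning a distinct variant per failure keeps the malformed-input tests specific: each bad input maps to exactly one diagnostic.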

### 3. Load-command dispatcher
`LoadCommand` enum with variants for each command afs-as emits:

```rust
pub enum LoadCommand {
    Segment64(Segment64),
    Symtab(SymtabCmd),
    Dysymtab(DysymtabCmd),
    BuildVersion(BuildVersionCmd),
    LinkerOptimizationHint(LohCmd),
    // placeholders for later sprints:
    DyldInfoOnly, DyldChainedFixups, Main, FunctionStarts,
    DataInCode, CodeSignature, IdDylib, LoadDylib, LoadWeakDylib,
    ReexportDylib, Rpath, Uuid, SourceVersion,
    Unknown { cmd: u32, cmdsize: u32, data: Vec<u8> },
}

pub fn parse_commands(header: &MachHeader64, bytes: &[u8]) -> Result<Vec<LoadCommand>, ReadError>;
```

Match exhaustively on `cmd`. Preserve unknown commands (do not erase them) so round-trips survive.
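
One way to structure the dispatcher is to first split the load-command region into raw `(cmd, cmdsize, payload)` records and only then match on `cmd`. The sketch below shows that first pass. `RawCommand`, `WalkError`, and `walk_commands` are hypothetical names for illustration, and the sketch assumes `data` holds the bytes after the 8-byte `cmd`/`cmdsize` prefix, which is what an `Unknown` variant would need to echo the command back verbatim.

```rust
/// A load command before interpretation: the two prefix fields plus the
/// payload bytes that follow them. (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub struct RawCommand<'a> {
    pub cmd: u32,
    pub cmdsize: u32,
    pub data: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    /// Fewer bytes remain than the next command needs.
    Truncated { offset: usize },
    /// cmdsize is below 8, not a multiple of 8, or overruns the region.
    BadCmdSize { offset: usize, cmdsize: u32 },
}

fn u32_le(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

pub fn walk_commands(
    bytes: &[u8],
    ncmds: u32,
    sizeofcmds: u32,
) -> Result<Vec<RawCommand<'_>>, WalkError> {
    const HEADER_SIZE: usize = 32; // mach_header_64
    // End of the load-command region; must lie inside the file.
    let end = HEADER_SIZE
        .checked_add(sizeofcmds as usize)
        .filter(|&e| e <= bytes.len())
        .ok_or(WalkError::Truncated { offset: bytes.len() })?;

    let mut off = HEADER_SIZE;
    let mut out = Vec::new();
    for _ in 0..ncmds {
        if end - off < 8 {
            return Err(WalkError::Truncated { offset: off });
        }
        let cmd = u32_le(bytes, off);
        let cmdsize = u32_le(bytes, off + 4);
        let size = cmdsize as usize;
        // 64-bit Mach-O requires cmdsize to be a multiple of 8.
        if size < 8 || size % 8 != 0 || size > end - off {
            return Err(WalkError::BadCmdSize { offset: off, cmdsize });
        }
        out.push(RawCommand { cmd, cmdsize, data: &bytes[off + 8..off + size] });
        off += size;
    }
    Ok(out)
}
```

With this split, the typed parser becomes a `match raw.cmd { ... }` whose fallback arm copies `raw.data` into `Unknown`, so no bytes are lost on the way back out.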

### 4. Segment + section header parsing (metadata only; contents in Sprint 2)
Decode `segment_command_64` (72 bytes) + N `section_64` structs (80 bytes each). Store:
- `segname` (fixed 16 bytes, null-padded)
- `sectname` (fixed 16 bytes, null-padded)
- `addr`, `size`, `offset`, `align` (as log2), `reloff`, `nreloc`, `flags`, `reserved1`, `reserved2`, `reserved3`
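
As an illustration of the 80-byte `section_64` layout, the sketch below decodes one section header. `Section64`, `parse_section`, and `name_str` are hypothetical names. The names are kept as raw 16-byte arrays so they can be written back byte-for-byte; note that a name using all 16 bytes has no NUL terminator, so the display helper must not assume one.

```rust
pub const SECTION_64_SIZE: usize = 80;

#[derive(Debug, PartialEq, Eq)]
pub struct Section64 {
    pub sectname: [u8; 16],
    pub segname: [u8; 16],
    pub addr: u64,
    pub size: u64,
    pub offset: u32,
    pub align: u32, // stored as log2 of the alignment
    pub reloff: u32,
    pub nreloc: u32,
    pub flags: u32,
    pub reserved1: u32,
    pub reserved2: u32,
    pub reserved3: u32,
}

fn u32_le(b: &[u8], o: usize) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[o..o + 4]);
    u32::from_le_bytes(a)
}

fn u64_le(b: &[u8], o: usize) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[o..o + 8]);
    u64::from_le_bytes(a)
}

/// Decode one section_64 from the start of `b`; None if `b` is too short.
pub fn parse_section(b: &[u8]) -> Option<Section64> {
    if b.len() < SECTION_64_SIZE {
        return None;
    }
    let mut sectname = [0u8; 16];
    sectname.copy_from_slice(&b[0..16]);
    let mut segname = [0u8; 16];
    segname.copy_from_slice(&b[16..32]);
    Some(Section64 {
        sectname,
        segname,
        addr: u64_le(b, 32),
        size: u64_le(b, 40),
        offset: u32_le(b, 48),
        align: u32_le(b, 52),
        reloff: u32_le(b, 56),
        nreloc: u32_le(b, 60),
        flags: u32_le(b, 64),
        reserved1: u32_le(b, 68),
        reserved2: u32_le(b, 72),
        reserved3: u32_le(b, 76),
    })
}

/// Display form of a fixed 16-byte name: stop at the first NUL, or use
/// all 16 bytes when the name fills the field.
pub fn name_str(raw: &[u8; 16]) -> &str {
    let len = raw.iter().position(|&b| b == 0).unwrap_or(16);
    std::str::from_utf8(&raw[..len]).unwrap_or("<non-utf8>")
}
```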

### 5. LC_BUILD_VERSION + LC_LINKER_OPTIMIZATION_HINT
Decode `platform` (`PLATFORM_MACOS = 1`), `minos`, `sdk`, `ntools`, and the tool records. Decode the LOH blob as raw bytes (interpretation in Sprint 25).
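
A sketch of the `LC_BUILD_VERSION` decode, under the assumption that the input slice is the whole command including its `cmd`/`cmdsize` prefix. The struct shapes and the `parse_build_version` and `format_version` names are hypothetical. `minos` and `sdk` use the packed `X.Y.Z` encoding (16 bits major, 8 bits minor, 8 bits patch), which a dumper has to unpack for display.

```rust
pub const PLATFORM_MACOS: u32 = 1;

#[derive(Debug, PartialEq, Eq)]
pub struct BuildToolVersion {
    pub tool: u32,
    pub version: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub struct BuildVersionCmd {
    pub platform: u32,
    pub minos: u32,
    pub sdk: u32,
    pub tools: Vec<BuildToolVersion>,
}

fn u32_le(b: &[u8], o: usize) -> u32 {
    u32::from_le_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}

/// `cmd_bytes` is the full command: cmd, cmdsize, platform, minos, sdk,
/// ntools (24 bytes), followed by `ntools` 8-byte tool records.
pub fn parse_build_version(cmd_bytes: &[u8]) -> Option<BuildVersionCmd> {
    const FIXED: usize = 24;
    if cmd_bytes.len() < FIXED {
        return None;
    }
    let ntools = u32_le(cmd_bytes, 20) as usize;
    // Checked arithmetic so a huge ntools cannot wrap the size computation.
    let need = FIXED.checked_add(ntools.checked_mul(8)?)?;
    if cmd_bytes.len() < need {
        return None;
    }
    let tools = (0..ntools)
        .map(|i| {
            let o = FIXED + i * 8;
            BuildToolVersion { tool: u32_le(cmd_bytes, o), version: u32_le(cmd_bytes, o + 4) }
        })
        .collect();
    Some(BuildVersionCmd {
        platform: u32_le(cmd_bytes, 8),
        minos: u32_le(cmd_bytes, 12),
        sdk: u32_le(cmd_bytes, 16),
        tools,
    })
}

/// Unpack the X.Y.Z version encoding used by minos and sdk.
pub fn format_version(v: u32) -> String {
    format!("{}.{}.{}", v >> 16, (v >> 8) & 0xff, v & 0xff)
}
```

For example, a `minos` of `0x000B0000` formats as `11.0.0`.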

### 6. Pretty-printer
`afs-ld/src/bin/dump.rs` (optional subcommand `afs-ld --dump <path>`): otool-like output. Used by the round-trip harness.

## Testing Strategy
- Round-trip test: for every `.o` in `afs-as/tests/corpus/`, parse, serialize back into the same byte layout (no reshuffling in this sprint, just read+echo), compare.
- Malformed-input tests: truncated header, wrong magic, wrong cputype, `ncmds` lying about `sizeofcmds`, unaligned commands. Each must produce a specific diagnostic, never a panic.
- Differential: `otool -lV` against our dumper for the full corpus. Diff must be zero after whitespace normalization.
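
The plan does not define "whitespace normalization", so the following is only one plausible reading: collapse each run of whitespace to a single space, trim every line, and drop blank lines before diffing. The function name `normalize_ws` is hypothetical.

```rust
/// Normalize dumper or otool output for comparison: collapse whitespace
/// runs to a single space, trim each line, and drop empty lines.
pub fn normalize_ws(text: &str) -> String {
    text.lines()
        .map(|line| line.split_whitespace().collect::<Vec<_>>().join(" "))
        .filter(|line| !line.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```

The differential test would then assert that `normalize_ws` of the `otool -lV` output equals `normalize_ws` of the dumper output for each corpus file.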

## Definition of Done
- All afs-as corpus `.o` files parse cleanly.
- Every load command afs-as emits is represented in `LoadCommand`.
- Malformed-input fuzz finds no panics.
- Round-trip byte-level equality on the full corpus.
- `otool -lV` and our dumper agree after whitespace normalization.