afs-ld
Bespoke ARM64 Mach-O linker for Apple Silicon, written in Rust, stdlib only.
Why
armfortas already owns the compiler (armfortas) and the assembler (afs-as). Every binary the toolchain produces today is still shaped by Apple's ld — which puts the same class of opaque, untouchable bugs back in our path that motivated abandoning LLVM in the first place. afs-ld closes the loop. We own every byte from .f90 source to the final Mach-O executable on disk.
This is not a toy or educational linker. The target is production parity with Apple ld for:
- Everything armfortas produces today: arm64 PIE executables statically linking libarmfortas_rt.a and dynamically linking libSystem.
- The fortsh milestone: ~57 KLoC Fortran 2018, 55 modules, iso_c_binding, allocatable strings, derived types.
- Deterministic output: -no_uuid parity, reproducible byte layout across invocations.
- All ARM64 Mach-O relocation types, static archives (.a), binary dylibs, and TAPI TBD v4 text stubs (libSystem ships as .tbd on modern SDKs).
- Both classic LC_DYLD_INFO opcodes and modern LC_DYLD_CHAINED_FIXUPS.
- Dylib output (-dylib) as a first-class feature, not an afterthought.
- Ad-hoc code signing: macOS 11+ will not execute an unsigned arm64 binary, even if every other byte is perfect.
Non-goals (current)
- ELF, COFF, PE, or any non-Mach-O format.
- x86_64, arm64_32, armv7, or any architecture other than arm64.
- Bitcode/LTO. armfortas emits assembly, not bitcode; LTO is not part of the armfortas pipeline.
- ObjC / Swift metadata merging. No armfortas code emits it. Hooks exist for later.
- Cross-compilation. We target the host Mac; no -sdk_version time-travel.
These are non-goals now; afs-ld is built to grow into them without architectural retrofit.
What afs-as hands us
afs-as emits MH_OBJECT only (hand-rolled, no external Mach-O crate). Its output defines the contract afs-ld reads:
- Load commands: LC_SEGMENT_64, LC_BUILD_VERSION (PLATFORM_MACOS), optional LC_LINKER_OPTIMIZATION_HINT, LC_SYMTAB, LC_DYSYMTAB.
- Section kinds: __TEXT,__text; __TEXT,__cstring; __TEXT,__literal16; __TEXT,__const; __TEXT,__compact_unwind; __TEXT,__eh_frame; __DATA,__data; __DATA,__bss; __DATA,__thread_data; __DATA,__thread_bss; __DATA,__thread_vars.
- Symbol flags: N_UNDF/N_SECT/N_ABS with N_EXT, N_PEXT, N_WEAK_REF, N_WEAK_DEF, N_NO_DEAD_STRIP. Common symbols live in N_UNDF with n_desc-encoded alignment.
- Relocation types: ARM64_RELOC_UNSIGNED, SUBTRACTOR, BRANCH26, PAGE21, PAGEOFF12, GOT_LOAD_PAGE21, GOT_LOAD_PAGEOFF12, POINTER_TO_GOT, TLVP_LOAD_PAGE21, TLVP_LOAD_PAGEOFF12, ADDEND (paired prefix).
- Flags: MH_SUBSECTIONS_VIA_SYMBOLS always set, so the atomization model is in play.
- LOH hints: AdrpAdd, AdrpLdr, AdrpLdrGot, AdrpLdrGotLdr. afs-ld preserves them from Sprint 0 and relaxes them in Sprint 25.
afs-as exposes no Mach-O reader. afs-ld ships its own.
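As an illustration of what reading this contract involves, here is a minimal sketch of decoding one 8-byte relocation_info record from an object file. The bit layout follows <mach-o/reloc.h> and the type numbering follows <mach-o/arm64/reloc.h>; the names RelocKind, RawReloc, and decode_reloc are hypothetical and not taken from the afs-ld source.

```rust
/// ARM64 relocation types, numbered 0..=10 as in <mach-o/arm64/reloc.h>.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelocKind {
    Unsigned,
    Subtractor,
    Branch26,
    Page21,
    Pageoff12,
    GotLoadPage21,
    GotLoadPageoff12,
    PointerToGot,
    TlvpLoadPage21,
    TlvpLoadPageoff12,
    Addend,
}

/// One decoded relocation_info entry (hypothetical struct for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawReloc {
    pub address: i32,    // offset of the fixup within its section
    pub symbolnum: u32,  // symbol index, section ordinal, or (for ADDEND) the addend
    pub pcrel: bool,
    pub length: u8,      // log2 of the fixup width: 2 = 4 bytes, 3 = 8 bytes
    pub is_extern: bool, // true when symbolnum indexes the symbol table
    pub kind: RelocKind,
}

/// Decode a little-endian relocation_info record.
/// Returns None when the 4-bit type field is outside the ARM64 range.
pub fn decode_reloc(bytes: [u8; 8]) -> Option<RawReloc> {
    let address = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let packed = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    // Packed word: symbolnum:24 | pcrel:1 | length:2 | extern:1 | type:4
    let kind = match packed >> 28 {
        0 => RelocKind::Unsigned,
        1 => RelocKind::Subtractor,
        2 => RelocKind::Branch26,
        3 => RelocKind::Page21,
        4 => RelocKind::Pageoff12,
        5 => RelocKind::GotLoadPage21,
        6 => RelocKind::GotLoadPageoff12,
        7 => RelocKind::PointerToGot,
        8 => RelocKind::TlvpLoadPage21,
        9 => RelocKind::TlvpLoadPageoff12,
        10 => RelocKind::Addend,
        _ => return None,
    };
    Some(RawReloc {
        address,
        symbolnum: packed & 0x00ff_ffff,
        pcrel: (packed >> 24) & 1 == 1,
        length: ((packed >> 25) & 0x3) as u8,
        is_extern: (packed >> 27) & 1 == 1,
        kind,
    })
}
```

A real reader would additionally pair each ADDEND or SUBTRACTOR record with the record that follows it; that pairing is left out of this sketch.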
Current driver contract
armfortas/src/driver/mod.rs:497-565 shells out to ld:
ld <obj1> <obj2> ... <libarmfortas_rt.a> \
-lSystem -no_uuid -syslibroot <SDK> -e _main -o <output>
Inputs are .o from afs-as plus libarmfortas_rt.a from the runtime/ crate. Output is an arm64 PIE executable with entry _main (a synthetic wrapper at src/driver/mod.rs:371-392 calling _afs_program_init → user PROGRAM → _afs_program_finalize). afs-ld drops into this contract unchanged; the driver swap (Sprint 20) is initially gated behind AFS_LD=1.
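The gated swap could look roughly like the sketch below. It is not the code at armfortas/src/driver/mod.rs; the function names, the choice of "afs-ld" as the program name, and the treatment of any value other than "1" as "use ld" are all assumptions made for illustration.

```rust
use std::ffi::OsString;
use std::path::Path;

/// Pick the linker program from the value of the AFS_LD environment variable.
/// A caller would pass `std::env::var("AFS_LD").ok().as_deref()`.
/// Assumption: only the exact value "1" opts in to afs-ld.
pub fn linker_program(afs_ld_env: Option<&str>) -> &'static str {
    match afs_ld_env {
        Some("1") => "afs-ld",
        _ => "ld",
    }
}

/// Build the argument vector in the same order as the driver contract above:
/// objects, runtime archive, then the fixed flags.
pub fn link_args(
    objects: &[&Path],
    runtime_archive: &Path,
    sdk: &Path,
    output: &Path,
) -> Vec<OsString> {
    let mut args: Vec<OsString> = Vec::new();
    for obj in objects {
        args.push(obj.as_os_str().to_owned());
    }
    args.push(runtime_archive.as_os_str().to_owned());
    for flag in ["-lSystem", "-no_uuid", "-syslibroot"] {
        args.push(flag.into());
    }
    args.push(sdk.as_os_str().to_owned());
    args.push("-e".into());
    args.push("_main".into());
    args.push("-o".into());
    args.push(output.as_os_str().to_owned());
    args
}
```

Because both linkers receive the identical argument vector, the only difference between the two paths is the program name handed to std::process::Command, which keeps the gate easy to remove later.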
Reference material
Already in parent .refs/:
- .refs/llvm/lld/MachO/ (~21 KLoC C++): primary architectural reference. Most relevant files: Driver.cpp (pipeline), InputFiles.cpp (object/archive/dylib parsing), SymbolTable.cpp (resolution), SyntheticSections.cpp (GOT/stubs/binding), Arch/ARM64.cpp (reloc math), Writer.cpp (layout).
- .refs/llvm/lld/docs/MachO/index.rst: design notes comparing lld and ld64.
Cloned in Sprint 0:
- .refs/ld64/: Apple's open-source ld64 (GitHub mirror of last publicly released tarball). Authoritative for byte-level parity edge cases when our diff harness disagrees with Apple's ld.
- .refs/mold/: Rui Ueyama's mold including its Darwin port. Leaner second opinion and a source of performance ideas.
Spec-level:
- Apple <mach-o/loader.h>, <mach-o/nlist.h>, <mach-o/reloc.h>, <mach-o/arm64/reloc.h>: mirrored numerically in our macho/constants.rs.
- dyld open source: bind/rebase/lazy-bind opcode semantics and chained-fixups format.
- ARM Architecture Reference Manual (ARMv8-A): encoding of relocated instructions (ADRP, ADD, LDR, B/BL).
Repo layout
afs-ld is a Cargo workspace member of armfortas and a Git submodule at armfortas/afs-ld pointing at git@github.com:FortranGoingOnForty/afs-ld.git, mirroring how afs-as is organized.
afs-ld/
├── Cargo.toml # no deps outside std
├── CLAUDE.md # mirrors afs-as/CLAUDE.md, tailored to linker
├── README.md
├── .docs/
│ ├── overview.md # this file
│ └── sprints/ # 32 sprint files
├── src/
│ ├── lib.rs # re-export Linker, LinkOptions, OutputKind
│ ├── main.rs # afs-ld binary
│ ├── args.rs # CLI parsing (hand-rolled, no clap)
│ ├── macho/
│ │ ├── mod.rs
│ │ ├── constants.rs # LC_*, MH_*, S_*, N_*, ARM64_RELOC_*
│ │ ├── reader.rs # parse MH_OBJECT
│ │ ├── writer.rs # emit MH_EXECUTE or MH_DYLIB
│ │ ├── dylib.rs # parse MH_DYLIB
│ │ └── tbd.rs # parse TAPI TBD v4
│ ├── archive.rs # ar/ranlib archive reader
│ ├── input.rs # InputFile enum, lazy member fetch
│ ├── symbol.rs # Symbol kinds, SymbolTable
│ ├── resolve.rs # name resolution pass
│ ├── atom.rs # subsections-via-symbols atom model
│ ├── section.rs # InputSection, OutputSection, OutputSegment
│ ├── layout.rs # VM addr + file offset assignment
│ ├── reloc/
│ │ ├── mod.rs
│ │ ├── arm64.rs # ARM64_RELOC_* application
│ │ └── loh.rs # LOH preservation / relaxation
│ ├── synth/ # synthetic sections
│ │ ├── mod.rs
│ │ ├── got.rs
│ │ ├── stubs.rs
│ │ ├── tlv.rs
│ │ ├── symtab.rs
│ │ ├── dyld_info.rs # classic rebase/bind/lazy/weak + export trie
│ │ ├── chained.rs # LC_DYLD_CHAINED_FIXUPS
│ │ ├── unwind.rs
│ │ ├── eh_frame.rs
│ │ ├── func_starts.rs
│ │ ├── data_in_code.rs
│ │ └── code_sig.rs # ad-hoc SHA-256 code signature
│ ├── map.rs # -map text link map
│ ├── why_live.rs # -why_live dead-strip reasons
│ ├── gc.rs # -dead_strip
│ ├── icf.rs # -icf=safe
│ ├── driver.rs # orchestrator
│ └── diag.rs # diagnostics, path/line/col parity with afs-as
└── tests/
  ├── common/harness.rs # spawn afs-ld, diff output vs system ld
  ├── reader_*.rs # round-trip object reads
  ├── reloc_*.rs # golden-file reloc application
  ├── resolve_*.rs # symbol resolution matrices
  ├── hello_world.rs # first end-to-end executable link
  ├── hello_library.rs # first end-to-end dylib link
  ├── armfortas_integration.rs
  └── corpus/ # hand-curated .o / .a / .dylib / .tbd fixtures
Architecture pipeline
args → inputs → resolve → atomize → layout → apply relocs → synth sections → write → sign
1. args: hand-rolled parser for the ld-compatible CLI surface. No clap.
2. inputs: demultiplex .o, .a, .dylib, .tbd; lazy archive-member fetching.
3. resolve: symbol-table fixed-point loop; weak/common/alias coalescing; archive-driven pulls.
4. atomize: split input sections at symbol boundaries per MH_SUBSECTIONS_VIA_SYMBOLS.
5. layout: assign VM addrs (__PAGEZERO/__TEXT/__DATA_CONST/__DATA/__LINKEDIT for executables; no __PAGEZERO for dylibs) and file offsets.
6. apply relocs: ARM64_RELOC_* patching; GOT/stubs/lazy-pointer emission; LOH honoring.
7. synth sections: __LINKEDIT payload, i.e. symbol table, string table, LC_DYLD_INFO and/or chained fixups, function starts, data-in-code, compact unwind, eh_frame passthrough.
8. write: Mach-O header + load commands + segment data; -no_uuid deterministic.
9. sign: ad-hoc SHA-256 page hashes in LC_CODE_SIGNATURE so the binary runs on bare arm64.
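To make the apply relocs step concrete, the following sketch patches three instruction forms: B/BL for BRANCH26, ADRP for PAGE21, and ADD (immediate) for PAGEOFF12. The bit positions follow the A64 encodings in the ARM Architecture Reference Manual. The function names and the use of Option for range errors are assumptions, and the real reloc/arm64.rs may be organized quite differently.

```rust
/// ARM64_RELOC_BRANCH26: write the word offset into the imm26 field of B/BL.
/// Returns None if the target is misaligned or outside the +/-128 MiB range.
pub fn patch_branch26(insn: u32, pc: u64, target: u64) -> Option<u32> {
    let delta = target.wrapping_sub(pc) as i64;
    if delta % 4 != 0 || delta < -(1 << 27) || delta >= (1 << 27) {
        return None;
    }
    let imm26 = ((delta >> 2) as u32) & 0x03ff_ffff;
    Some((insn & 0xfc00_0000) | imm26)
}

/// ARM64_RELOC_PAGE21: write the signed 4 KiB page delta into ADRP.
/// immlo occupies bits 30:29 and immhi occupies bits 23:5.
/// Returns None if the delta does not fit in 21 signed bits (+/-4 GiB).
pub fn patch_page21(insn: u32, pc: u64, target: u64) -> Option<u32> {
    let page_delta = ((target & !0xfff).wrapping_sub(pc & !0xfff) as i64) >> 12;
    if page_delta < -(1 << 20) || page_delta >= (1 << 20) {
        return None;
    }
    let imm = (page_delta as u32) & 0x001f_ffff;
    let immlo = imm & 0x3;
    let immhi = imm >> 2;
    // Keep the op bit, the fixed opcode bits 28:24, and Rd.
    Some((insn & 0x9f00_001f) | (immlo << 29) | (immhi << 5))
}

/// ARM64_RELOC_PAGEOFF12 on an ADD (immediate): write the low 12 bits of the
/// target into imm12 (bits 21:10). Load/store forms need the offset scaled by
/// the access size, which this sketch does not handle.
pub fn patch_pageoff12_add(insn: u32, target: u64) -> u32 {
    let imm12 = (target & 0xfff) as u32;
    (insn & !(0xfff << 10)) | (imm12 << 10)
}
```

Returning None on overflow instead of truncating leaves room for the thunk insertion planned in a later sprint to take over when a branch is out of range.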
Coding conventions
- Rust std only. No clap, no serde, no byteorder, no object, no goblin. Hand-roll parsers, serializers, and the tiny YAML subset we need for TBD.
- unsafe only where genuinely required. Keep blocks small and commented.
- Exhaustive pattern matching on Section, Symbol, Relocation, InputFile, Fixup; no catch-all _ arms outside tests.
- Diagnostics: path, offset, caret under source; mirror afs-as/src/diag*.rs.
- Determinism: no timestamps in output, sorted iteration, stable hashing.
- Commit discipline: terse imperative, no co-authors, per-file/per-chunk commits, never monoliths. Matches armfortas and afs-as house rules.
- No borrowed constants across crates. afs-ld duplicates MH_*, LC_*, S_*, N_*, ARM64_RELOC_* in macho/constants.rs rather than depending on afs-as at a type level. Each submodule stays independent.
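A small sketch of what "hand-rolled, std only" can mean in practice: parsing the fixed 32-byte mach_header_64 with nothing but slice indexing and u32::from_le_bytes. The constant values are those published in <mach-o/loader.h> and <mach/machine.h>; the Header and HeaderError types and the parse_header function are hypothetical names chosen for this example.

```rust
pub const MH_MAGIC_64: u32 = 0xfeed_facf;
pub const CPU_TYPE_ARM64: u32 = 0x0100_000c;
pub const MH_OBJECT: u32 = 0x1;
pub const MH_SUBSECTIONS_VIA_SYMBOLS: u32 = 0x2000;

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated,
    BadMagic(u32),
    WrongCpu(u32),
}

/// The subset of mach_header_64 that later passes need.
#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub filetype: u32,
    pub ncmds: u32,
    pub sizeofcmds: u32,
    pub flags: u32,
}

/// Bounds-checked little-endian u32 read; no byteorder crate required.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Parse the eight 32-bit words of mach_header_64.
pub fn parse_header(buf: &[u8]) -> Result<Header, HeaderError> {
    if buf.len() < 32 {
        return Err(HeaderError::Truncated);
    }
    let word = |i: usize| read_u32(buf, i * 4).ok_or(HeaderError::Truncated);
    let magic = word(0)?;
    if magic != MH_MAGIC_64 {
        return Err(HeaderError::BadMagic(magic));
    }
    let cputype = word(1)?;
    if cputype != CPU_TYPE_ARM64 {
        return Err(HeaderError::WrongCpu(cputype));
    }
    // Word 2 is cpusubtype and word 7 is reserved; both are skipped here.
    Ok(Header {
        filetype: word(3)?,
        ncmds: word(4)?,
        sizeofcmds: word(5)?,
        flags: word(6)?,
    })
}
```

Each failure mode gets its own enum variant rather than a string, which fits the exhaustive-matching rule above: a caller that matches on HeaderError must handle every case explicitly.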
Testing strategy
- Unit: every parser and encoder has a round-trip test — parse a fixture, re-emit, compare bytes.
- Corpus: tests/corpus/ collects .o, .a, .dylib, .tbd fixtures. Every new relocation type or section kind lands a corpus entry in the same sprint that implements it.
- Differential (from Sprint 1): tests/common/harness.rs links the same inputs through ld and afs-ld, diffs load commands, symbol tables, bind/rebase or chained-fixup streams, and disassembly. Tolerated-diff allowlist covers UUID, timestamp, hash-backed temp names. CI gate from sprint one.
- End-to-end (from Sprint 18): hello-world executable must run; (Sprint 18.5) hello-library dylib must dlopen. (Sprint 21) the full armfortas integration suite must pass.
- fortsh link (Sprint 29): explicit milestone. A fortsh binary linked by afs-ld must behave identically to one linked by system ld.
- Audits: post-Sprint 18 (hello), 18.5 (dylib), 22 (first signed & running on bare arm64), 27 (parity gate), 29 (fortsh), 31 (final). Brutal honesty rules from armfortas/CLAUDE.md apply.
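The tolerated-diff idea in the differential harness can be sketched as a load-command comparison that reports every byte difference except those in allowlisted commands. This is only one plausible shape, not the contents of tests/common/harness.rs. The LoadCommand and Mismatch types are hypothetical, while LC_UUID = 0x1b is the value from <mach-o/loader.h>.

```rust
pub const LC_UUID: u32 = 0x1b;

/// A load command reduced to its kind and raw payload bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LoadCommand {
    pub cmd: u32,
    pub body: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Mismatch {
    /// The two outputs carry a different number of load commands.
    Count { ours: usize, theirs: usize },
    /// Same position, different command kind.
    Kind { index: usize, ours: u32, theirs: u32 },
    /// Same kind, different bytes, and the kind is not allowlisted.
    Body { index: usize, cmd: u32 },
}

/// Compare two load-command lists position by position.
/// Body differences in commands listed in `tolerated` are ignored;
/// ordering and kind differences are always reported.
pub fn diff_load_commands(
    ours: &[LoadCommand],
    theirs: &[LoadCommand],
    tolerated: &[u32],
) -> Vec<Mismatch> {
    let mut out = Vec::new();
    if ours.len() != theirs.len() {
        out.push(Mismatch::Count { ours: ours.len(), theirs: theirs.len() });
        return out;
    }
    for (index, (a, b)) in ours.iter().zip(theirs).enumerate() {
        if a.cmd != b.cmd {
            out.push(Mismatch::Kind { index, ours: a.cmd, theirs: b.cmd });
        } else if a.body != b.body && !tolerated.contains(&a.cmd) {
            out.push(Mismatch::Body { index, cmd: a.cmd });
        }
    }
    out
}
```

Keeping the allowlist as an explicit parameter means each tolerated difference is visible at the call site, so widening it is a reviewed change and not a silent default.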
Sprint roadmap (summary)
See .docs/sprints/index.md for the full list. Eleven phases (0 through 10), 32 whole-numbered sprints (0 through 31) plus the half-sprints 15.5 and 18.5:
- Phase 0 (Scaffolding): Sprint 0.
- Phase 1 (Mach-O reading): Sprints 1–3 (header/load commands, sections/symbols, relocations).
- Phase 2 (Archives & dylibs): Sprints 4–6 (ar, binary dylib, TBD).
- Phase 3 (Symbol resolution): Sprints 7–9 (model, resolution pass, atomization).
- Phase 4 (Output construction): Sprints 10–14 (layout, reloc application, GOT/stubs, TLV, symtab/strtab). MH_EXECUTE and MH_DYLIB both first-class.
- Phase 5 (Dyld metadata): Sprints 15 (classic LC_DYLD_INFO), 15.5 (chained fixups), 16 (function starts/data-in-code), 17 (unwind info).
- Phase 6 (End-to-end): Sprint 18 (hello-world executable), 18.5 (hello-library dylib).
- Phase 7 (CLI & driver): Sprints 19 (CLI + -map/-why_live diagnostics), 20 (driver swap).
- Phase 8 (Runtime compatibility): Sprints 21 (runtime archive + integration tests), 22 (ad-hoc code signature).
- Phase 9 (Advanced): Sprints 23 (-dead_strip), 24 (-icf=safe), 25 (LOH relaxation), 26 (thunks).
- Phase 10 (Hardening): Sprints 27 (differential gate), 28 (performance), 29 (fortsh link audit), 30 (polish), 31 (final audit).
Scope decisions (confirmed)
- Dylib output is in scope from Phase 4; the writer is dylib-aware from Sprint 10. Dylib milestone at Sprint 18.5.
- Both classic LC_DYLD_INFO (Sprint 15) and chained fixups (Sprint 15.5) are in scope. Chained becomes default on macOS 12+ after Sprint 27 parity gate.
- .refs/ gains ld64 and mold alongside lld. lld is the architectural reference, ld64 is authoritative for Apple-parity edge cases, mold informs performance.
- -map and -why_live land in Sprint 19 with the core CLI. They are the debugging surface during driver adoption, not a polish item.
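The fixup-format default described above could be expressed as a small decision function. In this sketch the minimum OS version is taken as the packed word stored in LC_BUILD_VERSION (major in the upper 16 bits, then minor and patch in one byte each, per <mach-o/loader.h>). The function and enum names are hypothetical, and treating classic LC_DYLD_INFO as the default before the parity gate is an assumption inferred from the wording of the scope decision.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixupFormat {
    /// Classic rebase/bind/lazy-bind opcode streams (LC_DYLD_INFO).
    DyldInfo,
    /// LC_DYLD_CHAINED_FIXUPS.
    ChainedFixups,
}

/// Split a packed X.Y.Z version word (xxxx.yy.zz) into its components.
pub fn decode_version(v: u32) -> (u16, u8, u8) {
    ((v >> 16) as u16, (v >> 8) as u8, v as u8)
}

/// Choose the default fixup format for a given deployment target.
/// `parity_gate_passed` stands in for "Sprint 27 has landed"; until then
/// this sketch always falls back to the classic format.
pub fn default_fixup_format(minos: u32, parity_gate_passed: bool) -> FixupFormat {
    let (major, _minor, _patch) = decode_version(minos);
    if parity_gate_passed && major >= 12 {
        FixupFormat::ChainedFixups
    } else {
        FixupFormat::DyldInfo
    }
}
```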