fortrangoingonforty/afs-ld / b13dcdc

add overview and sprint plan

Authored by espadonne
SHA
b13dcdc9e353cbcd1acb4f9f55a5632927465000
Parents
9c5dcc1
Tree
418f4a8

36 changed files

Status File + -
A .docs/overview.md 191 0
A .docs/sprints/index.md 59 0
A .docs/sprints/sprint00.md 119 0
A .docs/sprints/sprint01.md 92 0
A .docs/sprints/sprint02.md 104 0
A .docs/sprints/sprint03.md 91 0
A .docs/sprints/sprint04.md 92 0
A .docs/sprints/sprint05.md 78 0
A .docs/sprints/sprint06.md 80 0
A .docs/sprints/sprint07.md 85 0
A .docs/sprints/sprint08.md 92 0
A .docs/sprints/sprint09.md 74 0
A .docs/sprints/sprint10.md 98 0
A .docs/sprints/sprint11.md 71 0
A .docs/sprints/sprint12.md 96 0
A .docs/sprints/sprint13.md 64 0
A .docs/sprints/sprint14.md 77 0
A .docs/sprints/sprint15.md 101 0
A .docs/sprints/sprint15_5.md 100 0
A .docs/sprints/sprint16.md 51 0
A .docs/sprints/sprint17.md 90 0
A .docs/sprints/sprint18.md 71 0
A .docs/sprints/sprint18_5.md 55 0
A .docs/sprints/sprint19.md 146 0
A .docs/sprints/sprint20.md 72 0
A .docs/sprints/sprint21.md 70 0
A .docs/sprints/sprint22.md 107 0
A .docs/sprints/sprint23.md 74 0
A .docs/sprints/sprint24.md 65 0
A .docs/sprints/sprint25.md 74 0
A .docs/sprints/sprint26.md 87 0
A .docs/sprints/sprint27.md 113 0
A .docs/sprints/sprint28.md 86 0
A .docs/sprints/sprint29.md 91 0
A .docs/sprints/sprint30.md 89 0
A .docs/sprints/sprint31.md 101 0
.docs/overview.md (added)
@@ -0,0 +1,191 @@
1
+# afs-ld
2
+
3
+**Bespoke ARM64 Mach-O linker for Apple Silicon, written in Rust, stdlib only.**
4
+
5
+## Why
6
+
7
+armfortas already owns the compiler (`armfortas`) and the assembler (`afs-as`). Every binary the toolchain produces today is still shaped by Apple's `ld`, which reintroduces the same class of opaque, untouchable bugs that motivated abandoning LLVM in the first place. `afs-ld` closes the loop: we own every byte from `.f90` source to the final Mach-O executable on disk.
8
+
9
+This is **not** a toy or educational linker. The target is production parity with Apple `ld` for:
10
+
11
+- Everything armfortas produces today: arm64 PIE executables statically linking `libarmfortas_rt.a` and dynamically linking `libSystem`.
12
+- The fortsh milestone: ~57 KLoC Fortran 2018, 55 modules, `iso_c_binding`, allocatable strings, derived types.
13
+- Deterministic output: `-no_uuid` parity, reproducible byte layout across invocations.
14
+- All ARM64 Mach-O relocation types, static archives (`.a`), binary dylibs, and TAPI TBD v4 text stubs (libSystem ships as `.tbd` on modern SDKs).
15
+- Both classic `LC_DYLD_INFO` opcodes **and** modern `LC_DYLD_CHAINED_FIXUPS`.
16
+- Dylib output (`-dylib`) as a first-class feature, not an afterthought.
17
+- Ad-hoc code signing — macOS 11+ will not execute an unsigned arm64 binary, even if every other byte is perfect.
18
+
19
+## Non-goals (current)
20
+
21
+- ELF, COFF, PE, or any non-Mach-O format.
22
+- x86_64, arm64_32, armv7, or any architecture other than arm64.
23
+- Bitcode/LTO. armfortas emits assembly, not bitcode; LTO is not part of the armfortas pipeline.
24
+- ObjC / Swift metadata merging. No armfortas code emits it. Hooks exist for later.
25
+- Cross-compilation. We target the host Mac; no `-sdk_version` time-travel.
26
+
27
+These are non-goals **now**; afs-ld is built to grow into them without architectural retrofit.
28
+
29
+## What afs-as hands us
30
+
31
+afs-as emits `MH_OBJECT` only (hand-rolled, no external Mach-O crate). Its output defines the contract afs-ld reads:
32
+
33
+- Load commands: `LC_SEGMENT_64`, `LC_BUILD_VERSION` (PLATFORM_MACOS), optional `LC_LINKER_OPTIMIZATION_HINT`, `LC_SYMTAB`, `LC_DYSYMTAB`.
34
+- Section kinds: `__TEXT,__text`, `__TEXT,__cstring`, `__TEXT,__literal16`, `__TEXT,__const`, `__TEXT,__compact_unwind`, `__TEXT,__eh_frame`, `__DATA,__data`, `__DATA,__bss`, `__DATA,__thread_data`, `__DATA,__thread_bss`, `__DATA,__thread_vars`.
35
+- Symbol flags: `N_UNDF`/`N_SECT`/`N_ABS` with `N_EXT`, `N_PEXT`, `N_WEAK_REF`, `N_WEAK_DEF`, `N_NO_DEAD_STRIP`. Common symbols live in `N_UNDF` with `n_desc`-encoded alignment.
36
+- Relocation types: `ARM64_RELOC_UNSIGNED`, `SUBTRACTOR`, `BRANCH26`, `PAGE21`, `PAGEOFF12`, `GOT_LOAD_PAGE21`, `GOT_LOAD_PAGEOFF12`, `POINTER_TO_GOT`, `TLVP_LOAD_PAGE21`, `TLVP_LOAD_PAGEOFF12`, `ADDEND` (paired prefix).
37
+- Flags: `MH_SUBSECTIONS_VIA_SYMBOLS` always set — atomization model is in play.
38
+- LOH hints: `AdrpAdd`, `AdrpLdr`, `AdrpLdrGot`, `AdrpLdrGotLdr` — afs-ld preserves them from Sprint 0 and relaxes them in Sprint 25.
39
+
40
+afs-as exposes no Mach-O reader. afs-ld ships its own.
41
+
42
+## Current driver contract
43
+
44
+`armfortas/src/driver/mod.rs:497-565` shells out to `ld`:
45
+
46
+```
47
+ld <obj1> <obj2> ... <libarmfortas_rt.a> \
48
+   -lSystem -no_uuid -syslibroot <SDK> -e _main -o <output>
49
+```
50
+
51
+Inputs are `.o` from afs-as plus `libarmfortas_rt.a` from the `runtime/` crate. Output is an arm64 PIE executable with entry `_main` (a synthetic wrapper at `src/driver/mod.rs:371-392` calling `_afs_program_init` → user PROGRAM → `_afs_program_finalize`). afs-ld drops into this contract unchanged; the driver swap (Sprint 20) is initially gated behind `AFS_LD=1`.
52
+
53
+## Reference material
54
+
55
+Already in parent `.refs/`:
56
+
57
+- `.refs/llvm/lld/MachO/` (~21 KLoC C++) — primary architectural reference. Most relevant files: `Driver.cpp` (pipeline), `InputFiles.cpp` (object/archive/dylib parsing), `SymbolTable.cpp` (resolution), `SyntheticSections.cpp` (GOT/stubs/binding), `Arch/ARM64.cpp` (reloc math), `Writer.cpp` (layout).
58
+- `.refs/llvm/lld/docs/MachO/index.rst` — design notes comparing lld and ld64.
59
+
60
+Cloned in Sprint 0:
61
+
62
+- `.refs/ld64/` — Apple's open-source `ld64` (GitHub mirror of last publicly released tarball). Authoritative for byte-level parity edge cases when our diff harness disagrees with Apple's `ld`.
63
+- `.refs/mold/` — Rui Ueyama's `mold` including its Darwin port. Leaner second opinion and a source of performance ideas.
64
+
65
+Spec-level:
66
+
67
+- Apple `<mach-o/loader.h>`, `<mach-o/nlist.h>`, `<mach-o/reloc.h>`, `<mach-o/arm64/reloc.h>` — mirrored numerically in our `macho/constants.rs`.
68
+- `dyld` open source — bind/rebase/lazy-bind opcode semantics and chained-fixups format.
69
+- ARM Architecture Reference Manual (ARMv8-A) — encoding of relocated instructions (ADRP, ADD, LDR, B/BL).
70
+
71
+## Repo layout
72
+
73
+afs-ld is a Cargo workspace member of `armfortas` and a Git submodule at `armfortas/afs-ld` pointing at `git@github.com:FortranGoingOnForty/afs-ld.git`, mirroring how `afs-as` is organized.
74
+
75
+```
76
+afs-ld/
77
+├── Cargo.toml                 # no deps outside std
78
+├── CLAUDE.md                  # mirrors afs-as/CLAUDE.md, tailored to linker
79
+├── README.md
80
+├── .docs/
81
+│   ├── overview.md            # this file
82
+│   └── sprints/               # index.md + 34 sprint files (0–31, 15.5, 18.5)
83
+├── src/
84
+│   ├── lib.rs                 # re-export Linker, LinkOptions, OutputKind
85
+│   ├── main.rs                # afs-ld binary
86
+│   ├── args.rs                # CLI parsing (hand-rolled, no clap)
87
+│   ├── macho/
88
+│   │   ├── mod.rs
89
+│   │   ├── constants.rs       # LC_*, MH_*, S_*, N_*, ARM64_RELOC_*
90
+│   │   ├── reader.rs          # parse MH_OBJECT
91
+│   │   ├── writer.rs          # emit MH_EXECUTE or MH_DYLIB
92
+│   │   ├── dylib.rs           # parse MH_DYLIB
93
+│   │   └── tbd.rs             # parse TAPI TBD v4
94
+│   ├── archive.rs             # ar/ranlib archive reader
95
+│   ├── input.rs               # InputFile enum, lazy member fetch
96
+│   ├── symbol.rs              # Symbol kinds, SymbolTable
97
+│   ├── resolve.rs             # name resolution pass
98
+│   ├── atom.rs                # subsections-via-symbols atom model
99
+│   ├── section.rs             # InputSection, OutputSection, OutputSegment
100
+│   ├── layout.rs              # VM addr + file offset assignment
101
+│   ├── reloc/
102
+│   │   ├── mod.rs
103
+│   │   ├── arm64.rs           # ARM64_RELOC_* application
104
+│   │   └── loh.rs             # LOH preservation / relaxation
105
+│   ├── synth/                 # synthetic sections
106
+│   │   ├── mod.rs
107
+│   │   ├── got.rs
108
+│   │   ├── stubs.rs
109
+│   │   ├── tlv.rs
110
+│   │   ├── symtab.rs
111
+│   │   ├── dyld_info.rs       # classic rebase/bind/lazy/weak + export trie
112
+│   │   ├── chained.rs         # LC_DYLD_CHAINED_FIXUPS
113
+│   │   ├── unwind.rs
114
+│   │   ├── eh_frame.rs
115
+│   │   ├── func_starts.rs
116
+│   │   ├── data_in_code.rs
117
+│   │   └── code_sig.rs        # ad-hoc SHA-256 code signature
118
+│   ├── map.rs                 # -map text link map
119
+│   ├── why_live.rs            # -why_live dead-strip reasons
120
+│   ├── gc.rs                  # -dead_strip
121
+│   ├── icf.rs                 # -icf=safe
122
+│   ├── driver.rs              # orchestrator
123
+│   └── diag.rs                # diagnostics, path/line/col parity with afs-as
124
+└── tests/
125
+    ├── common/harness.rs      # spawn afs-ld, diff output vs system ld
126
+    ├── reader_*.rs            # round-trip object reads
127
+    ├── reloc_*.rs             # golden-file reloc application
128
+    ├── resolve_*.rs           # symbol resolution matrices
129
+    ├── hello_world.rs         # first end-to-end executable link
130
+    ├── hello_library.rs       # first end-to-end dylib link
131
+    ├── armfortas_integration.rs
132
+    └── corpus/                # hand-curated .o / .a / .dylib / .tbd fixtures
133
+```
134
+
135
+## Architecture pipeline
136
+
137
+```
138
+args → inputs → resolve → atomize → layout → apply relocs → synth sections → write → sign
139
+```
140
+
141
+1. **args**: hand-rolled parser for the `ld`-compatible CLI surface. No clap.
142
+2. **inputs**: demultiplex `.o`, `.a`, `.dylib`, `.tbd`; lazy archive-member fetching.
143
+3. **resolve**: symbol-table fixed-point loop; weak/common/alias coalescing; archive-driven pulls.
144
+4. **atomize**: split input sections at symbol boundaries per `MH_SUBSECTIONS_VIA_SYMBOLS`.
145
+5. **layout**: assign VM addrs (`__PAGEZERO`/`__TEXT`/`__DATA_CONST`/`__DATA`/`__LINKEDIT` for executables; no `__PAGEZERO` for dylibs) and file offsets.
146
+6. **apply relocs**: ARM64_RELOC_* patching; GOT/stubs/lazy-pointer emission; LOH honoring.
147
+7. **synth sections**: `__LINKEDIT` payload — symbol table, string table, `LC_DYLD_INFO` and/or chained fixups, function starts, data-in-code, compact unwind, eh_frame passthrough.
148
+8. **write**: Mach-O header + load commands + segment data; `-no_uuid` deterministic.
149
+9. **sign**: ad-hoc SHA-256 page hashes in `LC_CODE_SIGNATURE` so the binary runs on bare arm64.
150
+
151
+## Coding conventions
152
+
153
+- **Rust std only.** No `clap`, no `serde`, no `byteorder`, no `object`, no `goblin`. Hand-roll parsers, serializers, and the tiny YAML subset we need for TBD.
154
+- **`unsafe` only where genuinely required.** Keep blocks small and commented.
155
+- **Exhaustive pattern matching** on `Section`, `Symbol`, `Relocation`, `InputFile`, `Fixup` — no catch-all `_` arms outside tests.
156
+- **Diagnostics**: path, offset, caret under source — mirror `afs-as/src/diag*.rs`.
157
+- **Determinism**: no timestamps in output, sorted iteration, stable hashing.
158
+- **Commit discipline**: terse imperative, no co-authors, per-file/per-chunk commits, never monoliths. Matches armfortas and afs-as house rules.
159
+- **No borrowed constants across crates.** afs-ld duplicates `MH_*`, `LC_*`, `S_*`, `N_*`, `ARM64_RELOC_*` in `macho/constants.rs` rather than depending on afs-as at a type level. Each submodule stays independent.
160
+
161
+## Testing strategy
162
+
163
+- **Unit**: every parser and encoder has a round-trip test — parse a fixture, re-emit, compare bytes.
164
+- **Corpus**: `tests/corpus/` collects `.o`, `.a`, `.dylib`, `.tbd` fixtures. Every new relocation type or section kind lands a corpus entry in the same sprint that implements it.
165
+- **Differential** (from Sprint 1): `tests/common/harness.rs` links the same inputs through `ld` and `afs-ld`, diffs load commands, symbol tables, bind/rebase or chained-fixup streams, and disassembly. Tolerated-diff allowlist covers UUID, timestamp, hash-backed temp names. CI gate from sprint one.
166
+- **End-to-end** (from Sprint 18): hello-world executable must run; (Sprint 18.5) hello-library dylib must `dlopen`. (Sprint 21) the full armfortas integration suite must pass.
167
+- **fortsh link** (Sprint 29): explicit milestone. A fortsh binary linked by afs-ld must behave identically to one linked by system `ld`.
168
+- **Audits**: post-Sprint 18 (hello), 18.5 (dylib), 22 (first signed & running on bare arm64), 27 (parity gate), 29 (fortsh), 31 (final). Brutal honesty rules from armfortas/CLAUDE.md apply.
169
+
170
+## Sprint roadmap (summary)
171
+
172
+See `.docs/sprints/index.md` for the full list. Eleven phases (0–10), 34 sprints (0–31 plus 15.5 and 18.5):
173
+
174
+- **Phase 0 — Scaffolding**: Sprint 0.
175
+- **Phase 1 — Mach-O reading**: Sprints 1–3 (header/load commands, sections/symbols, relocations).
176
+- **Phase 2 — Archives & dylibs**: Sprints 4–6 (`ar`, binary dylib, TBD).
177
+- **Phase 3 — Symbol resolution**: Sprints 7–9 (model, resolution pass, atomization).
178
+- **Phase 4 — Output construction**: Sprints 10–14 (layout, reloc application, GOT/stubs, TLV, symtab/strtab). MH_EXECUTE and MH_DYLIB both first-class.
179
+- **Phase 5 — Dyld metadata**: Sprints 15 (classic `LC_DYLD_INFO`), 15.5 (chained fixups), 16 (function starts/data-in-code), 17 (unwind info).
180
+- **Phase 6 — End-to-end**: Sprint 18 (hello-world executable), 18.5 (hello-library dylib).
181
+- **Phase 7 — CLI & driver**: Sprints 19 (CLI + `-map`/`-why_live` diagnostics), 20 (driver swap).
182
+- **Phase 8 — Runtime compatibility**: Sprints 21 (runtime archive + integration tests), 22 (ad-hoc code signature).
183
+- **Phase 9 — Advanced**: Sprints 23 (`-dead_strip`), 24 (`-icf=safe`), 25 (LOH relaxation), 26 (thunks).
184
+- **Phase 10 — Hardening**: Sprints 27 (differential gate), 28 (performance), 29 (fortsh link audit), 30 (polish), 31 (final audit).
185
+
186
+## Scope decisions (confirmed)
187
+
188
+- Dylib output is in scope from Phase 4; the writer is dylib-aware from Sprint 10. Dylib milestone at Sprint 18.5.
189
+- Both classic `LC_DYLD_INFO` (Sprint 15) and chained fixups (Sprint 15.5) are in scope. Chained becomes default on macOS 12+ after Sprint 27 parity gate.
190
+- `.refs/` gains ld64 and mold alongside lld. lld is the architectural reference, ld64 is authoritative for Apple-parity edge cases, mold informs performance.
191
+- `-map` and `-why_live` land in Sprint 19 with the core CLI. They are the debugging surface during driver adoption, not a polish item.
.docs/sprints/index.md (added)
@@ -0,0 +1,59 @@
1
+# afs-ld Sprint Index
2
+
3
+34 sprints (0–31 plus 15.5 and 18.5) across 11 phases (0–10). Small bites, clear milestones, testable deliverables at every stage. Each sprint is independently reviewable and mergeable; every sprint that lands new surface area also lands corpus fixtures and differential coverage.
4
+
5
+## Phase 0 — Scaffolding
6
+- [Sprint 0](sprint00.md) — Scaffolding, References, Harness
7
+
8
+## Phase 1 — Mach-O Reading
9
+- [Sprint 1](sprint01.md) — MH_OBJECT Header & Load Commands
10
+- [Sprint 2](sprint02.md) — Sections, Symbols, String Tables
11
+- [Sprint 3](sprint03.md) — Relocations (Read-Side)
12
+
13
+## Phase 2 — Archives & Dylibs
14
+- [Sprint 4](sprint04.md) — Static Archives (ar)
15
+- [Sprint 5](sprint05.md) — Dylibs (MH_DYLIB binary)
16
+- [Sprint 6](sprint06.md) — TAPI TBD Text Stubs
17
+
18
+## Phase 3 — Symbol Resolution
19
+- [Sprint 7](sprint07.md) — Symbol Model & Table
20
+- [Sprint 8](sprint08.md) — Name Resolution Pass
21
+- [Sprint 9](sprint09.md) — Subsections-via-Symbols Atomization
22
+
23
+## Phase 4 — Output Construction (MH_EXECUTE and MH_DYLIB both first-class)
24
+- [Sprint 10](sprint10.md) — Output Segment & Section Layout (dylib-aware)
25
+- [Sprint 11](sprint11.md) — Core Relocation Application (ARM64)
26
+- [Sprint 12](sprint12.md) — GOT, Stubs, Lazy Pointers
27
+- [Sprint 13](sprint13.md) — TLV Relocations
28
+- [Sprint 14](sprint14.md) — LC_SYMTAB / LC_DYSYMTAB / String Table
29
+
30
+## Phase 5 — Dynamic Linker Metadata
31
+- [Sprint 15](sprint15.md) — Classic LC_DYLD_INFO Opcodes
32
+- [Sprint 15.5](sprint15_5.md) — Chained Fixups (LC_DYLD_CHAINED_FIXUPS)
33
+- [Sprint 16](sprint16.md) — LC_FUNCTION_STARTS & LC_DATA_IN_CODE
34
+- [Sprint 17](sprint17.md) — Unwind Info
35
+
36
+## Phase 6 — First End-to-End
37
+- [Sprint 18](sprint18.md) — HELLO WORLD MILESTONE (Executable)
38
+- [Sprint 18.5](sprint18_5.md) — HELLO LIBRARY MILESTONE (Dylib)
39
+
40
+## Phase 7 — CLI & Driver Integration
41
+- [Sprint 19](sprint19.md) — CLI Surface + Diagnostics (-map, -why_live)
42
+- [Sprint 20](sprint20.md) — Driver Swap
43
+
44
+## Phase 8 — Runtime Compatibility
45
+- [Sprint 21](sprint21.md) — Runtime Archive Linking
46
+- [Sprint 22](sprint22.md) — Code Signature (Ad-Hoc)
47
+
48
+## Phase 9 — Advanced Features
49
+- [Sprint 23](sprint23.md) — Dead Strip (`-dead_strip`)
50
+- [Sprint 24](sprint24.md) — ICF (`-icf=safe`)
51
+- [Sprint 25](sprint25.md) — LOH Relaxation
52
+- [Sprint 26](sprint26.md) — Thunks for Out-of-Range Branches
53
+
54
+## Phase 10 — Production Hardening
55
+- [Sprint 27](sprint27.md) — Differential Harness vs Apple ld
56
+- [Sprint 28](sprint28.md) — Performance & Parallelism
57
+- [Sprint 29](sprint29.md) — fortsh Link Audit
58
+- [Sprint 30](sprint30.md) — Diagnostics & Polish
59
+- [Sprint 31](sprint31.md) — Final Audit
.docs/sprints/sprint00.md (added)
@@ -0,0 +1,119 @@
1
+# Sprint 0: Scaffolding, References, Harness
2
+
3
+## Prerequisites
4
+None — this is where afs-ld begins. Assumes armfortas and afs-as already exist and compile.
5
+
6
+## Current state to remediate
7
+
8
+The `afs-ld/` directory currently sits inside armfortas's working tree as a plain subdirectory. `afs-ld/.gitignore` was accidentally committed to armfortas history (commit `85d5ba8 "init"`), tracking a file that will shortly belong to a separate repo. Before anything else in this sprint, afs-ld must be extracted into its own repo and wired back in as a Git submodule. The tracked `.gitignore` must be removed from armfortas's index (history can stay; removing a single file at HEAD is clean) so that afs-ld's contents live in the submodule and nowhere else.
9
+
10
+## Goals
11
+Extract afs-ld into its own repo, wire it back as a submodule, stand up the crate (CLAUDE.md, README, Cargo.toml, skeleton source), clone reference linkers, build the differential harness. End state: `cargo test -p afs-ld` runs from the parent workspace, at least one test passes meaningfully, and `git submodule status` lists afs-ld alongside afs-as.
12
+
13
+## Deliverables
14
+
15
+### 1. Submodule remediation (do first)
16
+
17
+Goal: move from "afs-ld is a tracked subdirectory of armfortas" to "afs-ld is a submodule pointing at `git@github.com:FortranGoingOnForty/afs-ld.git`". Exact sequence:
18
+
19
+1. **Preserve the current afs-ld contents**: copy `armfortas/afs-ld/` to a temp location (the `.docs/overview.md` and sprint files produced in planning are the primary content to preserve; `.fackr/` is scratch and can be dropped).
20
+2. **Untrack from armfortas**: `git rm --cached afs-ld/.gitignore && git commit -m "remove accidentally-tracked afs-ld/.gitignore"`. Confirm `git ls-files afs-ld` returns empty.
21
+3. **Delete the directory from armfortas's working tree** (submodule-add will recreate it): `rm -rf afs-ld/`.
22
+4. **Create the external repo** `FortranGoingOnForty/afs-ld` on GitHub (empty, no README; the push in step 5 seeds it).
23
+5. **Initialize locally and push**: in a scratch directory,
24
+   ```
25
+   git init -b trunk afs-ld
26
+   cd afs-ld
27
+   <copy preserved contents back>
28
+   git add -A
29
+   git commit -m "init"
30
+   git remote add origin git@github.com:FortranGoingOnForty/afs-ld.git
31
+   git push -u origin trunk
32
+   ```
33
+6. **Add as submodule in armfortas**:
34
+   ```
35
+   cd <armfortas root>
36
+   git submodule add git@github.com:FortranGoingOnForty/afs-ld.git afs-ld
37
+   git commit -m "add afs-ld submodule"
38
+   ```
39
+   Confirm `.gitmodules` gained the stanza:
40
+   ```
41
+   [submodule "afs-ld"]
42
+       path = afs-ld
43
+       url = git@github.com:FortranGoingOnForty/afs-ld.git
44
+   ```
45
+7. **Verify** with `git submodule status` — afs-as and afs-ld both listed, each pinned to a commit hash.
46
+
47
+Note: the original commit that tracked `afs-ld/.gitignore` (`85d5ba8`) stays in armfortas history. Rewriting history to erase it is more destructive than it's worth for a single `.gitignore` file. The file is gone at HEAD; that is sufficient.
48
+
49
+### 2. Crate wiring
50
+
51
+- Root `Cargo.toml` adds `"afs-ld"` to `[workspace] members` alongside `afs-as`.
52
+- `afs-ld/Cargo.toml`: binary + library, zero external dependencies, `edition = "2021"`. Mirror `afs-as/Cargo.toml` — same `keywords`, `categories`, `license = "GPL-3.0-only"`, adjusted `description` and `repository`.
53
+- `afs-ld/src/lib.rs`: public re-exports of `Linker`, `LinkOptions`, `OutputKind` (types are stubs this sprint).
54
+- `afs-ld/src/main.rs`: CLI that prints usage and exits 0 when run with no args; forwards real args to `Linker::run` (which errors with "not yet implemented" this sprint).
55
+
56
+### 3. CLAUDE.md and README.md
57
+
58
+- `afs-ld/CLAUDE.md`: mirror `afs-as/CLAUDE.md`, replacing assembler-isms with linker-isms. Non-negotiable rules: Rust std only, exhaustive matching, caret diagnostics, per-chunk commits, no co-authors, no sprint-number references in commits.
59
+- `afs-ld/README.md`: one-page intro, supported CLI subset at current state (nothing yet — say so), build/test commands.
60
+- `afs-ld/.gitignore`: `target/`, `.fackr/`, no `.docs/` since those files live in the repo. This one is intentionally tracked because it lives inside afs-ld's own repo now, not armfortas's.
61
+
62
+### 4. Reference clones
63
+
64
+Add to parent `.refs/` (gitignored):
65
+
66
+- `.refs/ld64/` — `git clone --depth 1 https://github.com/apple-oss-distributions/ld64.git`. Apple's last publicly released ld64. Authoritative for Apple-parity edge cases.
67
+- `.refs/mold/` — `git clone --depth 1 https://github.com/rui314/mold.git`. Performance reference and a second C++ angle on Mach-O linking.
68
+
69
+`.refs/llvm/lld/MachO/` already exists from armfortas Sprint 0 — primary architectural reference.
70
+
71
+### 5. Differential harness
72
+
73
+`afs-ld/tests/common/harness.rs`:
74
+
75
+```rust
76
+pub struct LinkCase {
77
+    pub name: &'static str,
78
+    pub inputs: Vec<PathBuf>,       // .o / .a / .tbd
79
+    pub args: Vec<String>,          // -o, -e, -syslibroot, -l, ...
80
+}
81
+
82
+pub struct LinkOutputs {
83
+    pub ours: Vec<u8>,              // afs-ld output
84
+    pub theirs: Vec<u8>,            // system ld output
85
+}
86
+
87
+pub fn link_both(case: &LinkCase) -> LinkOutputs;
88
+pub fn diff_macho(ours: &[u8], theirs: &[u8]) -> DiffReport;
89
+```
90
+
91
+`DiffReport` categorizes byte differences as `Tolerated` (UUID, timestamp, temp-path hashes) or `Critical` (anything else). Critical diffs fail the test. `link_both` shells out to `ld` via `xcrun -f ld` so it picks up the active toolchain.
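The classification rule described above can be sketched as follows. This is a minimal illustration, not the planned `DiffReport`: the names `DiffKind`, `DiffRun`, and `classify_diffs` are hypothetical, and it assumes the caller already knows which byte ranges are tolerated (for example, the payload of `LC_UUID`).

```rust
use std::ops::Range;

/// Hypothetical classification of one run of differing bytes.
#[derive(Debug, PartialEq, Eq)]
pub enum DiffKind {
    Tolerated,
    Critical,
}

/// One maximal run of consecutive differing bytes.
#[derive(Debug, PartialEq, Eq)]
pub struct DiffRun {
    pub range: Range<usize>,
    pub kind: DiffKind,
}

/// Compare two images byte by byte and group mismatches into runs.
/// A run is `Tolerated` only if it lies entirely inside one of the
/// `tolerated` ranges; a length mismatch is reported as a trailing
/// `Critical` run covering the extra bytes.
pub fn classify_diffs(ours: &[u8], theirs: &[u8], tolerated: &[Range<usize>]) -> Vec<DiffRun> {
    let common = ours.len().min(theirs.len());
    let mut runs = Vec::new();
    let mut i = 0;
    while i < common {
        if ours[i] == theirs[i] {
            i += 1;
            continue;
        }
        let start = i;
        while i < common && ours[i] != theirs[i] {
            i += 1;
        }
        let inside = tolerated.iter().any(|t| t.start <= start && i <= t.end);
        let kind = if inside { DiffKind::Tolerated } else { DiffKind::Critical };
        runs.push(DiffRun { range: start..i, kind });
    }
    if ours.len() != theirs.len() {
        let longest = ours.len().max(theirs.len());
        runs.push(DiffRun { range: common..longest, kind: DiffKind::Critical });
    }
    runs
}
```

A test would then fail whenever any returned run is `Critical`. Treating a partially tolerated run as `Critical` is a deliberately conservative choice in this sketch.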
92
+
93
+### 6. Skeleton CLI and first tests
94
+
95
+- `afs-ld/src/args.rs`: hand-rolled argv parser stub that recognizes `-o`, `-e`, `-arch`, and positional inputs. Unknown flags error loudly with a hint.
96
+- `afs-ld/tests/reader_empty.rs`: attempts to link `0 inputs → empty output`, expects the diagnostic `"afs-ld: error: no input files"`. Passes today by producing that exact string.
97
+- `afs-ld/tests/diff_harness_sanity.rs`: runs the harness against a known-identical pair (two copies of the same pre-linked binary produced by `xcrun ld`) and expects zero diffs. Passes.
98
+- `afs-ld/tests/diff_harness_finds_critical.rs`: feeds the harness two binaries that differ in a non-tolerated byte range (e.g. different text bytes) and asserts the harness reports `Critical`. Passes.
99
+
100
+## Testing Strategy
101
+
102
+- `cargo build -p afs-ld` compiles from a fresh clone of the parent with `git submodule update --init --recursive`.
103
+- `cargo test -p afs-ld` runs harness-sanity, critical-detection, and empty-input tests.
104
+- `cargo clippy -p afs-ld -- -D warnings` clean.
105
+- Manual verification of submodule state:
106
+  - `git ls-files | grep afs-ld` in armfortas prints only the `afs-ld` gitlink entry (and nothing under `afs-ld/`).
107
+  - `git submodule status` shows both afs-as and afs-ld with valid commit hashes.
108
+  - `git submodule update --init --recursive` on a fresh armfortas clone populates afs-ld correctly.
109
+
110
+## Definition of Done
111
+
112
+- The accidentally-tracked `afs-ld/.gitignore` is removed from armfortas's index at HEAD.
113
+- afs-ld exists as a standalone GitHub repo under `FortranGoingOnForty`.
114
+- afs-ld is wired into armfortas as a Git submodule, visible in `.gitmodules` and `git submodule status`.
115
+- `armfortas/Cargo.toml` lists `afs-ld` in `[workspace] members`.
116
+- `afs-ld/CLAUDE.md`, `README.md`, `Cargo.toml`, `src/lib.rs`, `src/main.rs`, `src/args.rs` all committed in the new repo.
117
+- `.refs/ld64/` and `.refs/mold/` cloned.
118
+- Differential harness runs, correctly reports zero diffs on identical binaries, correctly reports critical diffs on intentionally-different binaries.
119
+- `cargo test --workspace` green.
.docs/sprints/sprint01.md (added)
@@ -0,0 +1,92 @@
1
+# Sprint 1: MH_OBJECT Header & Load Commands
2
+
3
+## Prerequisites
4
+Sprint 0 — crate, harness, references in place.
5
+
6
+## Goals
7
+Read a Mach-O relocatable object file: parse the header and every load command afs-as emits. End state: given any `.o` in `afs-as/tests/corpus/`, afs-ld can pretty-print its structure and round-trip-compare it to a golden.
8
+
9
+## Deliverables
10
+
11
+### 1. Mach-O constants
12
+`afs-ld/src/macho/constants.rs`: duplicate the constants afs-as uses. Numeric literals only, no imports from afs-as.
13
+
14
+```rust
15
+pub const MH_MAGIC_64: u32 = 0xFEEDFACF;
16
+pub const CPU_TYPE_ARM64: u32 = 0x0100000C;
17
+pub const MH_OBJECT: u32 = 1;
18
+pub const MH_EXECUTE: u32 = 2;
19
+pub const MH_DYLIB: u32 = 6;
20
+pub const MH_SUBSECTIONS_VIA_SYMBOLS: u32 = 0x2000;
21
+
22
+pub const LC_SEGMENT_64: u32 = 0x19;
23
+pub const LC_SYMTAB: u32 = 0x02;
24
+pub const LC_DYSYMTAB: u32 = 0x0B;
25
+pub const LC_BUILD_VERSION: u32 = 0x32;
26
+pub const LC_LINKER_OPTIMIZATION_HINT: u32 = 0x2E;
27
+// ... plus LC_MAIN, LC_DYLD_INFO_ONLY, LC_DYLD_CHAINED_FIXUPS,
28
+//         LC_FUNCTION_STARTS, LC_DATA_IN_CODE, LC_CODE_SIGNATURE,
29
+//         LC_ID_DYLIB, LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB,
30
+//         LC_REEXPORT_DYLIB, LC_RPATH, LC_UUID, LC_SOURCE_VERSION.
31
+```
32
+
33
+### 2. Header parser
34
+`afs-ld/src/macho/reader.rs`:
35
+
36
+```rust
37
+pub struct MachHeader64 {
38
+    pub magic: u32, pub cputype: u32, pub cpusubtype: u32,
39
+    pub filetype: u32, pub ncmds: u32, pub sizeofcmds: u32,
40
+    pub flags: u32, pub reserved: u32,
41
+}
42
+
43
+pub fn parse_header(bytes: &[u8]) -> Result<MachHeader64, ReadError>;
44
+```
45
+
46
+Validate: magic matches MH_MAGIC_64, cputype matches CPU_TYPE_ARM64, `ncmds * 8 <= sizeofcmds`, `32 + sizeofcmds <= bytes.len()`. Clear, sourced diagnostics via `src/diag.rs`.
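A minimal sketch of these checks, assuming little-endian input (arm64 Mach-O is little-endian) and a hypothetical `ReadError` shape, since the real error type and its diagnostics are not specified here. The `ncmds * 8` bound is computed in `u64` so a hostile `ncmds` cannot overflow.

```rust
pub const MH_MAGIC_64: u32 = 0xFEED_FACF;
pub const CPU_TYPE_ARM64: u32 = 0x0100_000C;
const HEADER_SIZE: usize = 32; // size of mach_header_64

#[derive(Debug, PartialEq, Eq)]
pub struct MachHeader64 {
    pub magic: u32, pub cputype: u32, pub cpusubtype: u32,
    pub filetype: u32, pub ncmds: u32, pub sizeofcmds: u32,
    pub flags: u32, pub reserved: u32,
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum ReadError {
    Truncated { need: u64, have: u64 },
    BadMagic(u32),
    BadCpuType(u32),
    BadCommandSizes { ncmds: u32, sizeofcmds: u32 },
}

/// Read a little-endian u32; the caller guarantees `off + 4 <= bytes.len()`.
fn u32_le(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

pub fn parse_header(bytes: &[u8]) -> Result<MachHeader64, ReadError> {
    let have = bytes.len() as u64;
    if bytes.len() < HEADER_SIZE {
        return Err(ReadError::Truncated { need: HEADER_SIZE as u64, have });
    }
    let h = MachHeader64 {
        magic: u32_le(bytes, 0), cputype: u32_le(bytes, 4), cpusubtype: u32_le(bytes, 8),
        filetype: u32_le(bytes, 12), ncmds: u32_le(bytes, 16), sizeofcmds: u32_le(bytes, 20),
        flags: u32_le(bytes, 24), reserved: u32_le(bytes, 28),
    };
    if h.magic != MH_MAGIC_64 {
        return Err(ReadError::BadMagic(h.magic));
    }
    if h.cputype != CPU_TYPE_ARM64 {
        return Err(ReadError::BadCpuType(h.cputype));
    }
    // Every load command is at least 8 bytes (cmd + cmdsize).
    if u64::from(h.ncmds) * 8 > u64::from(h.sizeofcmds) {
        return Err(ReadError::BadCommandSizes { ncmds: h.ncmds, sizeofcmds: h.sizeofcmds });
    }
    // The command area must fit inside the file.
    let need = HEADER_SIZE as u64 + u64::from(h.sizeofcmds);
    if need > have {
        return Err(ReadError::Truncated { need, have });
    }
    Ok(h)
}
```

Each failure maps to a distinct variant, which is one way to satisfy the "specific diagnostic, never a panic" requirement in the testing strategy.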
47
+
48
+### 3. Load-command dispatcher
49
+`LoadCommand` enum with variants for each command afs-as emits:
50
+
51
+```rust
52
+pub enum LoadCommand {
53
+    Segment64(Segment64),
54
+    Symtab(SymtabCmd),
55
+    Dysymtab(DysymtabCmd),
56
+    BuildVersion(BuildVersionCmd),
57
+    LinkerOptimizationHint(LohCmd),
58
+    // placeholders for later sprints:
59
+    DyldInfoOnly, DyldChainedFixups, Main, FunctionStarts,
60
+    DataInCode, CodeSignature, IdDylib, LoadDylib, LoadWeakDylib,
61
+    ReexportDylib, Rpath, Uuid, SourceVersion,
62
+    Unknown { cmd: u32, cmdsize: u32, data: Vec<u8> },
63
+}
64
+
65
+pub fn parse_commands(header: &MachHeader64, bytes: &[u8]) -> Result<Vec<LoadCommand>, ReadError>;
66
+```
67
+
68
+Exhaustive matching. Unknown commands preserved (not erased) so round-trips survive.
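One way to keep unknown commands intact is to walk the command area generically first and dispatch on `cmd` afterwards. The sketch below does only the walk; `RawCommand`, `WalkError`, and `walk_commands` are hypothetical names, and the 8-byte size multiple is the rule for 64-bit Mach-O load commands.

```rust
/// One undecoded load command: the `cmd` tag, its declared size, and the
/// payload bytes that follow the 8-byte (cmd, cmdsize) prefix.
#[derive(Debug, PartialEq, Eq)]
pub struct RawCommand<'a> {
    pub cmd: u32,
    pub cmdsize: u32,
    pub data: &'a [u8],
}

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
pub enum WalkError {
    Truncated { offset: usize },
    BadSize { offset: usize, cmdsize: u32 },
}

fn u32_le(bytes: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

/// Walk `ncmds` load commands inside `cmds` (the `sizeofcmds` bytes that
/// follow the header). Nothing is interpreted here, so unknown commands
/// survive byte-for-byte.
pub fn walk_commands(cmds: &[u8], ncmds: u32) -> Result<Vec<RawCommand<'_>>, WalkError> {
    let mut out = Vec::new();
    let mut off = 0usize;
    for _ in 0..ncmds {
        // `off <= cmds.len()` holds on every iteration, so this cannot underflow.
        let remaining = cmds.len() - off;
        if remaining < 8 {
            return Err(WalkError::Truncated { offset: off });
        }
        let cmd = u32_le(cmds, off);
        let cmdsize = u32_le(cmds, off + 4);
        let size = cmdsize as usize;
        // Reject sizes that are too small, misaligned, or run past the area.
        if size < 8 || size % 8 != 0 || size > remaining {
            return Err(WalkError::BadSize { offset: off, cmdsize });
        }
        out.push(RawCommand { cmd, cmdsize, data: &cmds[off + 8..off + size] });
        off += size;
    }
    Ok(out)
}
```

A dispatcher could then map each `RawCommand` to a `LoadCommand` variant and fall back to `Unknown { cmd, cmdsize, data }` by copying `data`.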
69
+
70
+### 4. Segment + section header parsing (metadata only — contents in Sprint 2)
71
+Decode `segment_command_64` (72 bytes) + N `section_64` structs (80 bytes each). Store:
72
+- segname (fixed 16 bytes, null-padded)
73
+- sectname (fixed 16 bytes, null-padded)
74
+- addr, size, offset, align (as log2), reloff, nreloc, flags, reserved1, reserved2, reserved3
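Two details of this layout are easy to get wrong: a 16-byte name field carries no terminator when the name uses all 16 bytes (`__compact_unwind` is exactly 16 characters), and `align` is an exponent, not a byte count. A small sketch, with `fixed_name` and `align_bytes` as hypothetical helper names:

```rust
use std::str::Utf8Error;

/// Decode a fixed 16-byte `segname`/`sectname` field: take the bytes up to
/// the first NUL, or all 16 when the name fills the field.
pub fn fixed_name(raw: &[u8; 16]) -> Result<&str, Utf8Error> {
    let len = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    std::str::from_utf8(&raw[..len])
}

/// Convert the log2 `align` field into a byte alignment.
/// Returns `None` for exponents that do not fit in a u64.
pub fn align_bytes(align_log2: u32) -> Option<u64> {
    1u64.checked_shl(align_log2)
}
```

For example, a field holding `__TEXT` followed by ten NUL bytes decodes to `"__TEXT"`, and `align_bytes(4)` yields a 16-byte alignment.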
75
+
76
+### 5. LC_BUILD_VERSION + LC_LINKER_OPTIMIZATION_HINT
77
+Decode platform (PLATFORM_MACOS = 1), minos, sdk, ntools, tool records. Decode the LOH blob as raw bytes (interpretation in Sprint 25).
78
+
79
+### 6. Pretty-printer
80
+`afs-ld/src/bin/dump.rs` (optional subcommand `afs-ld --dump <path>`): otool-like output. Used by the round-trip harness.
81
+
82
+## Testing Strategy
83
+- Round-trip test: for every `.o` in `afs-as/tests/corpus/`, parse, serialize back into the same byte layout (no reshuffling in this sprint — just read+echo), compare.
84
+- Malformed-input tests: truncated header, wrong magic, wrong cputype, `ncmds` lying about `sizeofcmds`, unaligned commands. Each must produce a specific diagnostic, never a panic.
85
+- Differential: `otool -lV` against our dumper for the full corpus. Diff must be zero after whitespace normalization.
86
+
87
+## Definition of Done
88
+- All afs-as corpus `.o` files parse cleanly.
89
+- Every load command afs-as emits is represented in `LoadCommand`.
90
+- Malformed-input fuzz finds no panics.
91
+- Round-trip byte-level equality on the full corpus.
92
+- `otool -lV` and our dumper agree after whitespace normalization.
.docs/sprints/sprint02.md (added)
@@ -0,0 +1,104 @@
1
+# Sprint 2: Sections, Symbols, String Tables
2
+
3
+## Prerequisites
4
+Sprint 1 — header + load commands parsed.
5
+
6
+## Goals
7
+Decode section payloads, the symbol table (nlist_64), and the string table. Expose the full section/symbol/string model that later sprints build on.
8
+
9
+## Deliverables
10
+
11
+### 1. Section attributes and kinds
12
+`afs-ld/src/section.rs` — `SectionKind` mirrors afs-as's but richer on the reader side (we receive inputs with flags already set):
13
+
14
+```rust
15
+pub enum SectionKind {
16
+    Text, CStringLiterals, Literal4, Literal8, Literal16,
17
+    ConstData, Data, ZeroFill, GbZeroFill,
18
+    ThreadLocalRegular, ThreadLocalZerofill,
19
+    ThreadLocalVariables, ThreadLocalInitPointers,
20
+    CompactUnwind, EhFrame, Coalesced,
21
+    Regular, Unknown,
22
+}
23
+
24
+pub fn kind_from_flags(flags: u32) -> SectionKind; // S_* attribute bits
25
+```
26
+
27
+Respect all `S_ATTR_*` flags and the section-type byte (`flags & 0xff`).
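A reduced sketch of the flags-to-kind mapping, using the section-type values from `<mach-o/loader.h>`. It covers only kinds that the flags word can determine. `CompactUnwind`, `EhFrame`, `ConstData`, and `Data` are left out on the assumption that they need the segment and section names, which this function does not receive.

```rust
pub const SECTION_TYPE: u32 = 0x0000_00ff; // mask for the section-type byte
pub const S_ATTR_PURE_INSTRUCTIONS: u32 = 0x8000_0000;

/// Subset of the planned `SectionKind`, limited to flag-derivable kinds.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text, Regular, ZeroFill, CStringLiterals,
    Literal4, Literal8, Literal16, Coalesced, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    Unknown,
}

pub fn kind_from_flags(flags: u32) -> SectionKind {
    match flags & SECTION_TYPE {
        // S_REGULAR holding only machine instructions is treated as text.
        0x00 if flags & S_ATTR_PURE_INSTRUCTIONS != 0 => SectionKind::Text,
        0x00 => SectionKind::Regular,                 // S_REGULAR
        0x01 => SectionKind::ZeroFill,                // S_ZEROFILL
        0x02 => SectionKind::CStringLiterals,         // S_CSTRING_LITERALS
        0x03 => SectionKind::Literal4,                // S_4BYTE_LITERALS
        0x04 => SectionKind::Literal8,                // S_8BYTE_LITERALS
        0x0b => SectionKind::Coalesced,               // S_COALESCED
        0x0c => SectionKind::GbZeroFill,              // S_GB_ZEROFILL
        0x0e => SectionKind::Literal16,               // S_16BYTE_LITERALS
        0x11 => SectionKind::ThreadLocalRegular,      // S_THREAD_LOCAL_REGULAR
        0x12 => SectionKind::ThreadLocalZerofill,     // S_THREAD_LOCAL_ZEROFILL
        0x13 => SectionKind::ThreadLocalVariables,    // S_THREAD_LOCAL_VARIABLES
        0x15 => SectionKind::ThreadLocalInitPointers, // S_THREAD_LOCAL_INIT_FUNCTION_POINTERS
        _ => SectionKind::Unknown,
    }
}
```

The catch-all arm is unavoidable when matching on a raw `u32`; the exhaustive-matching rule applies once the value has become a `SectionKind`.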
28
+
29
+### 2. Section content slicing
30
+`InputSection` struct: segment, name, kind, addr, size, align (log2), flags, raw `data: &[u8]` borrowed from the mmap'd input, plus the raw relocation entries as `&[u8]` (decoded in Sprint 3). For `S_ZEROFILL` / `S_THREAD_LOCAL_ZEROFILL`, `data` is empty; size is virtual.
31
+
32
+### 3. nlist_64 and symbol flags
33
+`afs-ld/src/symbol.rs`:
34
+
35
+```rust
36
+pub const N_STAB: u8 = 0xe0;
37
+pub const N_PEXT: u8 = 0x10;
38
+pub const N_TYPE: u8 = 0x0e; // mask
39
+pub const N_EXT:  u8 = 0x01;
40
+
41
+pub const N_UNDF: u8 = 0x0;
42
+pub const N_ABS:  u8 = 0x2;
43
+pub const N_SECT: u8 = 0xe;
44
+pub const N_INDR: u8 = 0xa;
45
+
46
+pub const N_NO_DEAD_STRIP: u16 = 0x0020;
47
+pub const N_WEAK_REF:      u16 = 0x0040;
48
+pub const N_WEAK_DEF:      u16 = 0x0080;
49
+pub const N_ARM_THUMB_DEF: u16 = 0x0008;
50
+pub const N_SYMBOL_RESOLVER: u16 = 0x0100;
51
+
52
+pub struct RawNlist {
53
+    pub strx: u32,
54
+    pub n_type: u8, pub n_sect: u8, pub n_desc: u16,
55
+    pub n_value: u64,
56
+}
57
+
58
+pub struct InputSymbol<'a> {
59
+    pub name: &'a str,
60
+    pub kind: SymKind,                // Undef, Abs, SectLocal, SectExt, PExt, Indirect
61
+    pub weak_ref: bool, pub weak_def: bool,
62
+    pub no_dead_strip: bool, pub private_extern: bool,
63
+    pub sect_idx: u8, pub value: u64,
64
+    pub common_align_pow2: Option<u8>, // (n_desc >> 8) & 0x0f (GET_COMM_ALIGN) when UNDF + EXT + value != 0
65
+}
66
+```
67
+
68
+Common symbols are detected the way afs-as emits them: `N_UNDF | N_EXT` with a nonzero `n_value` encoding the size and `(n_desc >> 8) & 0x0f` encoding the log2 alignment.
69
+
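The nlist decoding and common-symbol rule above could be sketched as follows. This assumes little-endian 16-byte `nlist_64` records; `parse_nlist` and `common_info` are hypothetical helper names, not part of the plan.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_EXT: u8 = 0x01;
pub const N_UNDF: u8 = 0x0;

#[derive(Debug, PartialEq, Eq)]
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

/// Decode one little-endian nlist_64 record; `None` if fewer than 16 bytes remain.
pub fn parse_nlist(bytes: &[u8]) -> Option<RawNlist> {
    let b = bytes.get(..16)?;
    let mut value = [0u8; 8];
    value.copy_from_slice(&b[8..16]);
    Some(RawNlist {
        strx: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        n_type: b[4],
        n_sect: b[5],
        n_desc: u16::from_le_bytes([b[6], b[7]]),
        n_value: u64::from_le_bytes(value),
    })
}

/// Returns `(size, align_pow2)` when the entry is a tentative (common) definition:
/// not a stab, type N_UNDF, external, and a nonzero n_value holding the size.
pub fn common_info(n: &RawNlist) -> Option<(u64, u8)> {
    let is_stab = (n.n_type & N_STAB) != 0;
    let is_undf_ext = (n.n_type & N_TYPE) == N_UNDF && (n.n_type & N_EXT) != 0;
    if is_stab || !is_undf_ext || n.n_value == 0 {
        return None;
    }
    // GET_COMM_ALIGN: only the low four bits of the high byte carry the alignment.
    Some((n.n_value, ((n.n_desc >> 8) & 0x0f) as u8))
}
```

Masking with `0x0f` matters because the upper bits of `n_desc` are reserved for other uses; taking the whole high byte would misread them as alignment.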
70
+### 4. Indirect (N_INDR) pass-through
71
+Alias symbols: record the aliased name from the string table via `n_value` used as a strx into the string table. Resolution lives in Sprint 7; this sprint just surfaces the data.
72
+
73
+### 5. String table reader
74
+`StringTable` wraps the raw bytes of `__LINKEDIT` string table, exposes `name_at(strx: u32) -> &str`, validates null termination, gracefully handles the suffix-dedup trick afs-as uses (`"_foo\0"` can overlap with a later `"_bar_foo\0"` by pointing mid-string).
75
+
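A small sketch of the string-table reader, under the assumption that `name_at` returns a `Result` rather than a bare `&str` so that bad indices become diagnostics; the `StrError` variants are hypothetical. Suffix-dedup overlaps need no special handling in this shape, because each lookup simply scans forward from `strx` to the next NUL.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum StrError {
    OutOfBounds(u32),
    Unterminated(u32),
    NotUtf8(u32),
}

/// Borrowed view over the raw string-table bytes.
pub struct StringTable<'a> {
    bytes: &'a [u8],
}

impl<'a> StringTable<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes }
    }

    /// Return the NUL-terminated name starting at `strx`.
    /// A `strx` pointing mid-string (suffix sharing) is perfectly valid.
    pub fn name_at(&self, strx: u32) -> Result<&'a str, StrError> {
        let tail = self
            .bytes
            .get(strx as usize..)
            .ok_or(StrError::OutOfBounds(strx))?;
        let len = tail
            .iter()
            .position(|&b| b == 0)
            .ok_or(StrError::Unterminated(strx))?;
        std::str::from_utf8(&tail[..len]).map_err(|_| StrError::NotUtf8(strx))
    }
}
```

With the table `"\0_bar_foo\0"`, index 1 yields `_bar_foo` and index 5 yields `_foo`, both backed by the same bytes.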
76
+### 6. DYSYMTAB partitioning
77
+Decode the partition `(ilocalsym, nlocalsym)`, `(iextdefsym, nextdefsym)`, `(iundefsym, nundefsym)`. Record `toc`, `modtab`, `extrefsym`, `indirectsymoff/nindirectsyms`, `extreloff`, `locreloff` offsets for later phases (most are for dylibs).
78
+
79
+### 7. Input file model
80
+`afs-ld/src/input.rs`:
81
+
82
+```rust
83
+pub struct ObjectFile<'a> {
84
+    pub path: PathBuf,
85
+    pub header: MachHeader64,
86
+    pub commands: Vec<LoadCommand>,
87
+    pub sections: Vec<InputSection<'a>>,  // section bytes borrowed from the mmap'd input
88
+    pub symbols: Vec<InputSymbol<'a>>,    // names borrowed from the string table
89
+    pub strings: StringTable<'a>,
90
+    pub dysymtab: DysymtabView,
91
+}
92
+```
93
+
94
+## Testing Strategy
95
+- Round-trip: parse every section/symbol/string from the afs-as corpus; re-emit; match bytes.
96
+- Diffing against `nm -a` and `otool -r` for symbols and relocation offsets (relocation bodies come in Sprint 3).
97
+- Edge cases: empty `__bss`, tentative common with 16-byte alignment, weak-def with `N_NO_DEAD_STRIP`, indirect symbol chains.
98
+- Fuzz: malformed nlist entries (strx out of bounds, n_sect out of range, invalid n_type bits) produce sourced diagnostics, never panics.
99
+
100
+## Definition of Done
101
+- Every symbol attribute afs-as can emit is recognized and round-trips.
102
+- Common symbols surface with correct size and alignment.
103
+- String table reader handles suffix-dedup overlaps correctly.
104
+- Corpus-wide symbol and section parity against `nm -a` / `otool -v`.
.docs/sprints/sprint03.md added
@@ -0,0 +1,91 @@
1
+# Sprint 3: Relocations (Read-Side)
2
+
3
+## Prerequisites
4
+Sprints 1–2 — header/load commands and section/symbol parsing in place.
5
+
6
+## Goals
7
+Decode every ARM64 relocation type afs-as emits. Normalize paired relocations (ADDEND + primary, SUBTRACTOR + UNSIGNED) into a linker-friendly form. End state: the linker's reloc model captures every arithmetic and semantic constraint needed by Sprint 11.
8
+
9
+## Deliverables
10
+
11
+### 1. Relocation constants and raw form
12
+`afs-ld/src/macho/constants.rs` additions:
13
+
14
+```rust
15
+pub const ARM64_RELOC_UNSIGNED:             u8 = 0;
16
+pub const ARM64_RELOC_SUBTRACTOR:           u8 = 1;
17
+pub const ARM64_RELOC_BRANCH26:             u8 = 2;
18
+pub const ARM64_RELOC_PAGE21:               u8 = 3;
19
+pub const ARM64_RELOC_PAGEOFF12:            u8 = 4;
20
+pub const ARM64_RELOC_GOT_LOAD_PAGE21:      u8 = 5;
21
+pub const ARM64_RELOC_GOT_LOAD_PAGEOFF12:   u8 = 6;
22
+pub const ARM64_RELOC_POINTER_TO_GOT:       u8 = 7;
23
+pub const ARM64_RELOC_TLVP_LOAD_PAGE21:     u8 = 8;
24
+pub const ARM64_RELOC_TLVP_LOAD_PAGEOFF12:  u8 = 9;
25
+pub const ARM64_RELOC_ADDEND:               u8 = 10;
26
+```
27
+
28
+Raw `relocation_info`: 8 bytes. `r_address: i32`, `r_info: u32` packed as `[r_symbolnum:24][r_pcrel:1][r_length:2][r_extern:1][r_type:4]`.
29
+
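The bit layout above can be unpacked as in the following sketch, which assumes little-endian input and reads the bracketed fields from the least significant bit upward. `RawReloc` and `parse_reloc` are illustrative names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub struct RawReloc {
    pub r_address: i32,
    pub r_symbolnum: u32, // 24 bits
    pub r_pcrel: bool,
    pub r_length: u8, // log2 of the byte width
    pub r_extern: bool,
    pub r_type: u8, // 4 bits
}

/// Decode one 8-byte relocation_info record; `None` if the slice is too short.
pub fn parse_reloc(bytes: &[u8]) -> Option<RawReloc> {
    let b = bytes.get(..8)?;
    let r_address = i32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    let info = u32::from_le_bytes([b[4], b[5], b[6], b[7]]);
    Some(RawReloc {
        r_address,
        r_symbolnum: info & 0x00ff_ffff,
        r_pcrel: ((info >> 24) & 0x1) != 0,
        r_length: ((info >> 25) & 0x3) as u8,
        r_extern: ((info >> 27) & 0x1) != 0,
        r_type: ((info >> 28) & 0xf) as u8,
    })
}
```

For example, an external `BRANCH26` against symbol 5 at offset `0x10` is the byte sequence `10 00 00 00 05 00 00 2d`.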
30
+### 2. Parsed relocation form
31
+`afs-ld/src/reloc/mod.rs`:
32
+
33
+```rust
34
+pub struct Reloc {
35
+    pub offset: u32,            // byte offset into input section
36
+    pub kind: RelocKind,
37
+    pub length: RelocLength,    // Byte=0, Half=1, Word=2, Quad=3
38
+    pub pcrel: bool,
39
+    pub referent: Referent,
40
+    pub addend: i64,            // folded from ARM64_RELOC_ADDEND prefix or inline
41
+}
42
+
43
+pub enum RelocKind {
44
+    Unsigned, Branch26,
45
+    Page21, PageOff12,
46
+    GotLoadPage21, GotLoadPageOff12, PointerToGot,
47
+    TlvpLoadPage21, TlvpLoadPageOff12,
48
+    Subtractor,     // fused pair: SUBTRACTOR names the subtrahend, the following UNSIGNED names the minuend
49
+}
50
+
51
+pub enum Referent {
52
+    Symbol(SymRef),        // r_extern = 1
53
+    Section(SectRef),      // r_extern = 0
54
+}
55
+```
56
+
57
+### 3. Paired reloc fusion
58
+Two pairings afs-as emits:
59
+
60
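+1. **ARM64_RELOC_ADDEND**: a prefix reloc whose `r_symbolnum` field is not a symbol index but a 24-bit signed addend (sign-extend it before use). The next reloc in the list is the primary (UNSIGNED, PAGE21, PAGEOFF12, BRANCH26, or GOT/TLVP variant). The parser folds `addend_reloc.r_symbolnum` into `primary.addend`, keeps the primary, and drops the prefix. Apple's toolchain normally emits ADDEND only ahead of BRANCH26, PAGE21, and PAGEOFF12 and stores an UNSIGNED addend inline in the section bytes, so the reader should accept both encodings.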
61
+
62
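+2. **ARM64_RELOC_SUBTRACTOR + ARM64_RELOC_UNSIGNED**: difference expression (`_a - _b`). Emitted as an adjacent pair: SUBTRACTOR comes first and names the subtrahend (`_b`); the UNSIGNED that follows names the minuend (`_a`). The parser fuses the pair into a single `RelocKind::Subtractor` record carrying both the minuend and the subtrahend (the unit variant sketched above needs fields, or `Reloc` a second referent, to hold the pair).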
63
+
64
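+After fusion, no standalone ADDEND entry and no unpaired SUBTRACTOR should leak out of the reader; the only traces of a pair are a nonzero `addend` or a fused `RelocKind::Subtractor` record.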
65
+
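A sketch of the ADDEND half of the fusion pass, assuming the raw entries have already been unpacked. The `RawReloc`, `Fused`, and `FuseError` types are simplified stand-ins invented for this example, and SUBTRACTOR pairing is deliberately not covered.

```rust
pub const ARM64_RELOC_ADDEND: u8 = 10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawReloc {
    pub r_address: i32,
    pub r_symbolnum: u32,
    pub r_type: u8,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Fused {
    pub offset: u32,
    pub r_type: u8,
    pub symbolnum: u32,
    pub addend: i64,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FuseError {
    UnpairedAddend { index: usize },
}

/// Sign-extend the 24-bit value stored in an ADDEND entry's r_symbolnum field.
pub fn sign_extend_24(v: u32) -> i64 {
    (((v << 8) as i32) >> 8) as i64
}

/// Fold each ADDEND prefix into the reloc that follows it.
pub fn fuse_addends(raw: &[RawReloc]) -> Result<Vec<Fused>, FuseError> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < raw.len() {
        let r = raw[i];
        if r.r_type == ARM64_RELOC_ADDEND {
            // The primary must exist and must not itself be an ADDEND.
            let primary = raw
                .get(i + 1)
                .copied()
                .filter(|p| p.r_type != ARM64_RELOC_ADDEND)
                .ok_or(FuseError::UnpairedAddend { index: i })?;
            out.push(Fused {
                offset: primary.r_address as u32,
                r_type: primary.r_type,
                symbolnum: primary.r_symbolnum,
                addend: sign_extend_24(r.r_symbolnum),
            });
            i += 2;
        } else {
            out.push(Fused {
                offset: r.r_address as u32,
                r_type: r.r_type,
                symbolnum: r.r_symbolnum,
                addend: 0,
            });
            i += 1;
        }
    }
    Ok(out)
}
```

Consuming two entries at once keeps the "no ADDEND leaks out" invariant local to this loop, and the error carries the index so a diagnostic can cite the offending entry.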
66
+### 4. Integrity checks
67
+- `r_address + (length_bytes)` within section bounds.
68
+- `r_extern = 1` → `r_symbolnum < nsyms`.
69
+- `r_extern = 0` → `r_symbolnum` is a 1-based section index in range.
70
+- `r_pcrel` matches the reloc kind. PC-relative (`r_pcrel = 1`): BRANCH26, PAGE21, GOT_LOAD_PAGE21, TLVP_LOAD_PAGE21. Not PC-relative (`r_pcrel = 0`): UNSIGNED, SUBTRACTOR, ADDEND, PAGEOFF12, GOT_LOAD_PAGEOFF12, TLVP_LOAD_PAGEOFF12. POINTER_TO_GOT is valid in both forms (PC-relative 32-bit or absolute 64-bit).
71
+- `r_length` matches the kind: every instruction-patching kind is length = 2; UNSIGNED is length = 2 or 3; SUBTRACTOR must match its paired UNSIGNED; POINTER_TO_GOT is length = 2 when PC-relative and 3 otherwise.
72
+
73
+### 5. Round-trip serializer (for golden tests)
74
+`afs-ld/src/reloc/mod.rs::write_relocs(sect: &InputSection, relocs: &[Reloc]) -> Vec<u8>` reassembles into Mach-O wire form, including the ADDEND prefix when necessary. Used to prove the reader lost nothing.
75
+
76
+## Testing Strategy
77
+- Round-trip every reloc in the afs-as corpus. Byte equality after ADDEND/SUBTRACTOR fusion + re-emission.
78
+- Synthetic fixtures for each reloc kind (smallest possible `.s` input through afs-as):
79
+  - `bl _extern` → BRANCH26 external.
80
+  - `adrp x0, _g@PAGE; add x0, x0, _g@PAGEOFF` → PAGE21 + PAGEOFF12.
81
+  - `adrp x0, _g@GOTPAGE; ldr x0, [x0, _g@GOTPAGEOFF]` → GOT_LOAD_PAGE21 + GOT_LOAD_PAGEOFF12.
82
+  - `.quad _g + 0x1000` → ADDEND + UNSIGNED pair.
83
+  - `.quad _a - _b` → SUBTRACTOR + UNSIGNED pair.
84
+  - `adrp x0, _tlv@TLVPPAGE; ldr x0, [x0, _tlv@TLVPPAGEOFF]` → TLVP_LOAD_* pair.
85
+- Malformed-input: reloc pointing past section end, unpaired SUBTRACTOR, unpaired ADDEND. Each produces a specific diagnostic citing input path and offset.
86
+
87
+## Definition of Done
88
+- Every ARM64 reloc afs-as emits is represented in `RelocKind` post-fusion.
89
+- Paired relocs never leak as separate entries into downstream code.
90
+- Corpus-wide round-trip byte equality.
91
+- Integrity checks trigger diagnostics on malformed fixtures.
.docs/sprints/sprint04.md added
@@ -0,0 +1,92 @@
1
+# Sprint 4: Static Archives (`ar`)
2
+
3
+## Prerequisites
4
+Sprints 1–3 — Mach-O reading complete.
5
+
6
+## Goals
7
+Read static archives (`.a`) including the BSD, System V, and GNU-thin variants. Support lazy member fetching: a member is only parsed when an undefined symbol names it. This is the mechanism by which `libarmfortas_rt.a` gets pulled in.
8
+
9
+## Deliverables
10
+
11
+### 1. Archive format recognizer
12
+`afs-ld/src/archive.rs`:
13
+
14
+```rust
15
+pub struct Archive<'a> {
16
+    pub path: PathBuf,
17
+    pub flavor: Flavor,         // Bsd, Sysv, GnuThin
18
+    pub symdef: SymbolIndex,    // names → member offsets
19
+    pub members: Vec<Member<'a>>,
20
+}
21
+
22
+pub enum Flavor { Bsd, Sysv, GnuThin }
23
+```
24
+
25
+Detection by magic: `!<arch>\n` for BSD and SysV; thin archives use `!<thin>\n` instead. BSD vs SysV is distinguished by the first entries: `#1/<N>` BSD extended filenames vs a `//` SysV long-name string table.
26
+
27
+### 2. Header parsing
28
+Each member preceded by a 60-byte `ar_hdr`:
29
+```
30
+char name[16];   // "#1/<N>" on BSD, "/<offset>" into the "//" string table on SysV, or "foo.o/" on SysV short
31
+char date[12];
32
+char uid[6];
33
+char gid[6];
34
+char mode[8];
35
+char size[10];
36
+char fmag[2];    // "`\n"
37
+```
38
+
39
+Parse field-by-field with tight bounds checks. Size is a decimal ASCII integer, not a C literal.
40
+
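A hedged sketch of the field-by-field header parse, keeping only the two fields later steps need (name and size). `ArHeader`, `ArError`, and `parse_ar_hdr` are hypothetical names; the field offsets follow the 60-byte layout listed above.

```rust
pub const AR_HDR_LEN: usize = 60;

#[derive(Debug, PartialEq, Eq)]
pub struct ArHeader<'a> {
    pub name: &'a str, // raw name field with trailing spaces removed
    pub size: u64,     // member size, including any BSD "#1/<N>" name bytes
}

#[derive(Debug, PartialEq, Eq)]
pub enum ArError {
    Truncated,
    BadMagic,
    BadField(&'static str),
}

/// View a fixed-width ASCII field as text with its space padding stripped.
fn field<'a>(raw: &'a [u8], what: &'static str) -> Result<&'a str, ArError> {
    let s = std::str::from_utf8(raw).map_err(|_| ArError::BadField(what))?;
    Ok(s.trim_end_matches(' '))
}

/// Parse one ar_hdr: name[0..16], date[16..28], uid[28..34], gid[34..40],
/// mode[40..48], size[48..58], fmag[58..60].
pub fn parse_ar_hdr(bytes: &[u8]) -> Result<ArHeader<'_>, ArError> {
    let h = bytes.get(..AR_HDR_LEN).ok_or(ArError::Truncated)?;
    if h[58..60] != b"`\n"[..] {
        return Err(ArError::BadMagic);
    }
    let name = field(&h[0..16], "name")?;
    // The size is plain decimal ASCII, so an ordinary integer parse is enough.
    let size = field(&h[48..58], "size")?
        .parse::<u64>()
        .map_err(|_| ArError::BadField("size"))?;
    Ok(ArHeader { name, size })
}
```

Name decoding (BSD `#1/<N>`, SysV long names, thin members) would then interpret `name` according to the detected flavor.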
41
+### 3. Name decoding
42
+- BSD: name field `#1/<N>`, real name is the first N bytes of the member body (body shrinks accordingly).
43
+- SysV: name field is `/<offset>`, where `<offset>` is a decimal byte offset into the `//` string table.
44
+- SysV short: `foo.o/ ` — slash-terminated, space-padded.
45
+- GNU-thin: member body is zero bytes; the name encodes a path relative to the archive. afs-ld `mmap`s the external file.
46
+
47
+Names stored canonical (null-stripped, slash-stripped).
48
+
49
+### 4. Symbol index
50
+SysV `/` member or BSD `__.SYMDEF` / `__.SYMDEF SORTED` member. BSD layout:
51
+```
52
+uint32 ranlib_bytes                  // byte size of the ranlib array, not an entry count
53
+ranlib[ranlib_bytes / 8] { uint32 strx; uint32 offset; }
54
+uint32 stringsize
55
+char strings[stringsize]
56
+```
57
+
58
+SysV: big-endian `nsyms: u32`, then `nsyms` big-endian `u32` offsets, then packed null-terminated strings.
59
+
60
+`SymbolIndex` exposes `fn members_defining(name: &str) -> impl Iterator<Item = MemberRef>`.
61
+
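The SysV index layout described above might be decoded as in this sketch. It returns `(name, member_header_offset)` pairs in index order; `parse_sysv_index` and `IndexError` are names made up for the example, and the BSD `__.SYMDEF` layout would need a separate routine.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum IndexError {
    Truncated,
    Unterminated,
    NotUtf8,
}

/// Parse the body of a SysV "/" symbol-index member.
pub fn parse_sysv_index(body: &[u8]) -> Result<Vec<(&str, u32)>, IndexError> {
    // Read a big-endian u32 at byte offset `at`, with bounds checking.
    let be32 = |at: usize| -> Result<u32, IndexError> {
        let b = body.get(at..at + 4).ok_or(IndexError::Truncated)?;
        Ok(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    };

    let nsyms = be32(0)? as usize;
    // Strings start right after the count and the offset array.
    let strings_start = nsyms
        .checked_mul(4)
        .and_then(|n| n.checked_add(4))
        .ok_or(IndexError::Truncated)?;
    let mut strings = body.get(strings_start..).ok_or(IndexError::Truncated)?;

    let mut out = Vec::with_capacity(nsyms);
    for i in 0..nsyms {
        let offset = be32(4 + 4 * i)?;
        let len = strings
            .iter()
            .position(|&b| b == 0)
            .ok_or(IndexError::Unterminated)?;
        let name = std::str::from_utf8(&strings[..len]).map_err(|_| IndexError::NotUtf8)?;
        out.push((name, offset));
        strings = &strings[len + 1..];
    }
    Ok(out)
}
```

Because the bounds check on `strings_start` runs before any allocation, a corrupt `nsyms` cannot trigger an oversized `Vec` reservation.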
62
+### 5. Lazy fetch API
63
+```rust
64
+impl<'a> Archive<'a> {
65
+    pub fn fetch(&mut self, name: &str) -> Option<ObjectFile>;
66
+}
67
+```
68
+
69
+Returns `None` if the archive does not define `name`. Fetching an archive member memoizes: a second lookup for the same member returns a cached handle. The resolution pass (Sprint 8) is the only caller.
70
+
71
+### 6. `-force_load` / `-all_load` support (semantics, not CLI yet)
72
+Archive has a `force_all(&mut self)` method that pre-fetches every member. Sprint 19 wires the CLI.
73
+
74
+### 7. Archive-of-archives
75
+Rare but legal: member can be another `.a`. Recurse one level. If a sub-archive defines `name`, the outer `fetch` returns the sub-member's object file and records a provenance chain for diagnostics.
76
+
77
+## Testing Strategy
78
+- Fixtures in `tests/corpus/archives/`:
79
+  - `libbsd.a` made by Apple `ar` (BSD flavor, extended filenames).
80
+  - `libsysv.a` made by GNU `ar` on Linux (for cross-check).
81
+  - `libthin.a` made by `ar --thin` (GNU-thin).
82
+  - `libmulti.a` containing several members each defining one or more symbols.
83
+- `cargo test -p afs-ld test_archive_bsd` verifies BSD index → correct member for each name.
84
+- A symbol defined by two members: the archive picks the member that comes first (ld's traditional rule).
85
+- Missing-symbol lookup returns `None`, does not error.
86
+- Thin-archive member file missing on disk produces a path-qualified diagnostic.
87
+
88
+## Definition of Done
89
+- All three archive flavors read.
90
+- `libarmfortas_rt.a` (built by parent workspace) parses and every runtime symbol is findable by name.
91
+- Archive-of-archives works one level deep.
92
+- Differential: `ar -t libarmfortas_rt.a` output matches our `--dump-archive` output.
.docs/sprints/sprint05.md added
@@ -0,0 +1,78 @@
1
+# Sprint 5: Dylibs (MH_DYLIB Binary)
2
+
3
+## Prerequisites
4
+Sprints 1–3 — Mach-O reading complete.
5
+
6
+## Goals
7
+Parse binary dylibs (`MH_DYLIB`). Extract exported symbols via the export trie (located through `LC_DYLD_INFO_ONLY` or `LC_DYLD_EXPORTS_TRIE`), resolve re-exports through umbrella frameworks, and expose a linkable `DylibFile` surface.
8
+
9
+## Deliverables
10
+
11
+### 1. DylibFile model
12
+`afs-ld/src/macho/dylib.rs`:
13
+
14
+```rust
15
+pub struct DylibFile {
16
+    pub path: PathBuf,
17
+    pub install_name: String,
18
+    pub current_version: u32,         // X.Y.Z packed
19
+    pub compat_version: u32,
20
+    pub is_umbrella: bool,
21
+    pub load_kind: DylibLoadKind,     // Normal, Weak, Reexport, Upward
22
+    pub ordinal: u16,                 // two-level namespace ordinal
23
+    pub reexports: Vec<PathBuf>,      // LC_REEXPORT_DYLIB paths
24
+    pub exports: ExportTrie,          // resolved during loading
25
+}
26
+
27
+pub enum DylibLoadKind { Normal, Weak, Reexport, Upward }
28
+```
29
+
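The packed `X.Y.Z` version fields use 16 bits for the major component and 8 bits each for minor and patch. A small sketch of the conversion, with hypothetical helper names:

```rust
/// Split a packed Mach-O dylib version (xxxx.yy.zz) into (major, minor, patch).
pub fn unpack_version(v: u32) -> (u16, u8, u8) {
    ((v >> 16) as u16, (v >> 8) as u8, v as u8)
}

/// Inverse of `unpack_version`.
pub fn pack_version(major: u16, minor: u8, patch: u8) -> u32 {
    (u32::from(major) << 16) | (u32::from(minor) << 8) | u32::from(patch)
}
```

For instance, `0x00010203` unpacks to version 1.2.3.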
30
+### 2. Load command decoding
31
+- `LC_ID_DYLIB` (for the dylib itself): install_name, timestamp, current_version, compat_version.
32
+- `LC_LOAD_DYLIB`: normal dependency.
33
+- `LC_LOAD_WEAK_DYLIB`: weak dep (imports allowed to be null at runtime).
34
+- `LC_REEXPORT_DYLIB`: dependency whose exports we rebroadcast (umbrella-framework case).
35
+- `LC_LOAD_UPWARD_DYLIB`: cyclic dependency escape hatch.
36
+
37
+### 3. Export trie decoder
38
+Export trie lives in `__LINKEDIT`, pointed at by either `LC_DYLD_INFO_ONLY.export_off/export_size` (classic) or `LC_DYLD_EXPORTS_TRIE.dataoff/datasize` (modern, emitted alongside `LC_DYLD_CHAINED_FIXUPS`). Trie format:
39
+
40
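+- Each node: ULEB128 terminal-size, optional terminal payload (flags ULEB + address ULEB, or re-export / resolver data depending on the flags), then a one-byte child count, then `(edge_string, child_offset_ULEB)` pairs where `edge_string` is null-terminated and `child_offset` is relative to the start of the trie.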
41
+- Terminal flags: `EXPORT_SYMBOL_FLAGS_KIND_REGULAR`/`_THREAD_LOCAL`/`_ABSOLUTE`, `EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION`, `EXPORT_SYMBOL_FLAGS_REEXPORT`, `EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER`.
42
+
43
+```rust
44
+pub struct ExportTrie { /* walk-only view */ }
45
+impl ExportTrie {
46
+    pub fn lookup(&self, name: &str) -> Option<ExportEntry>;
47
+    pub fn iter(&self) -> impl Iterator<Item = (String, ExportEntry)>;
48
+}
49
+
50
+pub struct ExportEntry {
51
+    pub flags: u32,
52
+    pub address: u64,
53
+    pub reexport: Option<(u16 /*ordinal*/, String /*imported_name*/)>,
54
+    pub resolver: Option<u64>,
55
+}
56
+```
57
+
58
+Walking is recursive; we guard against malformed trees with a depth cap and visited-offset set.
59
+
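As a rough illustration of the trie walk, the sketch below looks up one name and returns `(flags, address)` for a regular export. It is iterative rather than recursive, handles neither re-export nor resolver payloads, and bounds the walk by requiring every edge to consume at least one byte of the name, which stands in for the depth cap. `read_uleb` and `trie_lookup` are hypothetical names.

```rust
/// Read one ULEB128 value at `*pos`, advancing it; `None` on overrun or overflow.
fn read_uleb(data: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *data.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        result |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Look up `name` in an export trie, returning (flags, address) for regular exports.
pub fn trie_lookup(trie: &[u8], name: &str) -> Option<(u64, u64)> {
    let mut node = 0usize;
    let mut rest = name.as_bytes();
    loop {
        let mut pos = node;
        let terminal_size = read_uleb(trie, &mut pos)? as usize;
        let children_at = pos.checked_add(terminal_size)?;
        if rest.is_empty() {
            if terminal_size == 0 {
                return None; // a prefix of some export, but not an export itself
            }
            let flags = read_uleb(trie, &mut pos)?;
            let address = read_uleb(trie, &mut pos)?;
            return Some((flags, address));
        }
        pos = children_at;
        let child_count = *trie.get(pos)?;
        pos += 1;
        let mut next = None;
        for _ in 0..child_count {
            let tail = trie.get(pos..)?;
            let label_len = tail.iter().position(|&b| b == 0)?;
            let label = &tail[..label_len];
            pos += label_len + 1;
            let child = read_uleb(trie, &mut pos)? as usize;
            // Non-empty labels guarantee progress, so a cyclic trie cannot loop forever.
            if next.is_none() && !label.is_empty() && rest.starts_with(label) {
                next = Some((child, label.len()));
            }
        }
        let (child, used) = next?;
        rest = &rest[used..];
        node = child;
    }
}
```

Every read goes through `get` or `read_uleb`, so out-of-bounds child offsets and truncated ULEB128 values surface as `None` instead of a panic; a real reader would map those to sourced diagnostics.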
60
+### 4. Two-level namespace ordinals
61
+Each dylib loaded by path gets an ordinal (1..=N) assigned in load-command order; `BIND_SPECIAL_DYLIB_SELF=0`, `BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE=-1`, `BIND_SPECIAL_DYLIB_FLAT_LOOKUP=-2`, `BIND_SPECIAL_DYLIB_WEAK_LOOKUP=-3`. When an imported symbol is bound in Sprint 15, we use this ordinal.
62
+
63
+### 5. Re-export resolution
64
+Loading a dylib recursively loads its `LC_REEXPORT_DYLIB` chain. Names looked up in the umbrella are delegated down the chain. This covers CoreFoundation / Foundation-style umbrella frameworks; it is not strictly required for armfortas today but lands now to avoid a retrofit.
65
+
66
+### 6. SDK path resolution
67
+`-syslibroot <SDK>` + `-l<name>` needs to locate `${SDK}/usr/lib/lib<name>.{dylib,tbd}`. This sprint establishes the search order; the rest lands in Sprint 19's CLI work.
68
+
69
+## Testing Strategy
70
+- Fixtures: tiny hand-built `.dylib` via the system toolchain (one exported symbol, one re-export). Parsed and exports match `nm -g`.
71
+- Differential: load `CoreFoundation.tbd` in Sprint 6, not here; this sprint uses real binary `.dylib`s from `/usr/lib/` (where present on older macOS) or synthetic ones.
72
+- Malformed trie: cycle, out-of-bounds child offset, ULEB128 overrun — diagnostics, no panics.
73
+
74
+## Definition of Done
75
+- Export trie walker handles real `.dylib` files correctly.
76
+- `DylibFile` constructed with correct install_name, versions, ordinal.
77
+- Re-exports chained through umbrella fixtures.
78
+- `dyld_info -exports <dylib>` output matches our export dumper.
.docs/sprints/sprint06.md added
@@ -0,0 +1,80 @@
1
+# Sprint 6: TAPI TBD Text Stubs
2
+
3
+## Prerequisites
4
+Sprint 5 — binary dylib reader works.
5
+
6
+## Goals
7
+Read `.tbd` files (TAPI text dylib stubs). On modern SDKs `libSystem`, `libc++`, and `CoreFoundation` ship only as `.tbd` — linking without this sprint means no system libraries, full stop.
8
+
9
+## Deliverables
10
+
11
+### 1. Minimal YAML subset
12
+TBD is YAML with a well-defined schema. We implement only the subset TAPI emits, not a general YAML parser:
13
+
14
+- Flow scalars (plain, single-quoted, double-quoted).
15
+- Flow sequences: `[ a, b, c ]`.
16
+- Block sequences: `- item`.
17
+- Block mappings: `key: value`.
18
+- Multi-document files with `---` / `...`.
19
+- Tags: `!tapi-tbd`.
20
+- Version directives: `%YAML 1.2`.
21
+
22
+No anchors, no aliases, no complex types, no folded scalars. If a real `.tbd` in the wild uses features outside the subset, the parser fails loudly with line/column.
23
+
24
+### 2. TBD schema
25
+`afs-ld/src/macho/tbd.rs`:
26
+
27
+```rust
28
+pub struct Tbd {
29
+    pub tbd_version: u32,      // 3 or 4
30
+    pub targets: Vec<Target>,  // arch + platform
31
+    pub install_name: String,
32
+    pub current_version: Option<String>,
33
+    pub compatibility_version: Option<String>,
34
+    pub parent_umbrella: Vec<Scoped<String>>,
35
+    pub allowable_clients: Vec<Scoped<String>>,
36
+    pub reexported_libraries: Vec<Scoped<String>>,
37
+    pub exports: Vec<Scoped<Exports>>,
38
+    pub reexports: Vec<Scoped<Exports>>,
39
+}
40
+
41
+pub struct Target { pub arch: Arch, pub platform: Platform }
42
+pub struct Scoped<T> { pub targets: Vec<Target>, pub value: T }
43
+pub struct Exports {
44
+    pub symbols: Vec<String>,
45
+    pub weak_symbols: Vec<String>,
46
+    pub thread_local_symbols: Vec<String>,
47
+    pub objc_classes: Vec<String>,
48
+    pub objc_eh_types: Vec<String>,
49
+    pub objc_ivars: Vec<String>,
50
+}
51
+```
52
+
53
+v3 and v4 both supported; v4 is what modern Xcode ships.
54
+
55
+### 3. Materialize into DylibFile
56
+`Tbd::into_dylib_file(tbd: Tbd, for_target: Target) -> DylibFile`. Filters scoped entries to only those matching `arm64 / macos`. Produces the same `DylibFile` surface Sprint 5 produces, so downstream code doesn't care about source format.
57
+
58
+### 4. SDK search implementation
59
+Integrate with `-syslibroot`. Search order for `-l<name>`:
60
+1. `${SDK}/usr/lib/lib<name>.tbd`
61
+2. `${SDK}/usr/lib/lib<name>.dylib`
62
+3. `${SDK}/usr/local/lib/lib<name>.tbd`
63
+4. `${SDK}/usr/local/lib/lib<name>.dylib`
64
+5. `-L<dir>` entries in order, each tried with the same two suffixes (`.tbd` before `.dylib`).
65
+
66
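+For frameworks (`-framework Foo`): `${SDK}/System/Library/Frameworks/Foo.framework/Foo.tbd`, then the binary `${SDK}/System/Library/Frameworks/Foo.framework/Foo` (framework binaries carry no `.dylib` extension).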
67
+
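The `-l<name>` search order listed above can be expressed as a candidate list that the caller probes in sequence. This is only a sketch: `library_candidates` is a hypothetical helper, and it generates paths without touching the filesystem.

```rust
use std::path::{Path, PathBuf};

/// Candidate paths for `-l<name>`, in probe order: SDK usr/lib, SDK usr/local/lib,
/// then each -L directory, trying `.tbd` before `.dylib` in every directory.
pub fn library_candidates(sdk: &Path, lib_dirs: &[PathBuf], name: &str) -> Vec<PathBuf> {
    let mut dirs = vec![sdk.join("usr/lib"), sdk.join("usr/local/lib")];
    dirs.extend(lib_dirs.iter().cloned());

    let mut out = Vec::new();
    for dir in dirs {
        for ext in ["tbd", "dylib"] {
            out.push(dir.join(format!("lib{}.{}", name, ext)));
        }
    }
    out
}
```

Returning the full list, rather than the first hit, also gives a "searched in" diagnostic something to print when a library is not found.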
68
+### 5. Platform/arch filtering
69
+`Target { arch: Arm64, platform: MacOS }` is what armfortas cares about. If the TBD has no matching target, produce a clear diagnostic: "`<path>` does not export for arm64-macos".
70
+
71
+## Testing Strategy
72
+- Fixtures: copies of `${SDK}/usr/lib/libSystem.tbd`, `libc++.tbd`, `libobjc.tbd` checked into `tests/corpus/tbd/` (small, just headers of exported symbols — confirm they're not under a license that forbids redistribution; if so, generate equivalent fixtures).
73
+- Parse `libSystem.tbd`, assert that `_dyld_stub_binder`, `_malloc`, `_free`, `_printf` are all in exports.
74
+- Verify the `DylibFile` produced is field-for-field equal (in the fields we populate) to one produced by loading an actual `libSystem.dylib` from an older SDK.
75
+- Malformed YAML: missing `install_name`, tabs in indentation, unterminated quoted scalar — each with a precise diagnostic.
76
+
77
+## Definition of Done
78
+- Can read a modern Xcode `libSystem.tbd` and enumerate its exports.
79
+- SDK + `-l` + `-framework` resolution picks the right file on a real toolchain.
80
+- Differential test: hello-world link with `libSystem.tbd` produces the same bind entries as with a binary dylib (on older SDKs where both exist).
.docs/sprints/sprint07.md added
@@ -0,0 +1,85 @@
1
+# Sprint 7: Symbol Model & Table
2
+
3
+## Prerequisites
4
+Sprints 2, 4, 5, 6 — object, archive, dylib, TBD readers in place.
5
+
6
+## Goals
7
+A uniform symbol table that fuses definitions from every input kind. Establishes the invariants Sprint 8's resolution pass will preserve.
8
+
9
+## Deliverables
10
+
11
+### 1. `Symbol` sum type
12
+`afs-ld/src/symbol.rs`:
13
+
14
+```rust
15
+pub enum Symbol {
16
+    Undefined   { name: Istr, origin: InputId,        weak_ref: bool },
17
+    Defined     { name: Istr, origin: InputId, atom:  AtomId, value: u64,
18
+                  weak: bool, private_extern: bool, no_dead_strip: bool },
19
+    Common      { name: Istr, origin: InputId, size: u64, align_pow2: u8 },
20
+    DylibImport { name: Istr, dylib: DylibId, ordinal: u16, weak_import: bool },
21
+    LazyArchive { name: Istr, archive: ArchiveId, member: MemberId },
22
+    LazyObject  { name: Istr, origin: InputId },        // --start-lib / --end-lib
23
+    Alias       { name: Istr, aliased: Istr },          // N_INDR
24
+}
25
+```
26
+
27
+`Istr` = interned string handle. Interning happens once when a name first enters the table; all comparisons are handle-equality.
28
+
29
+### 2. `SymbolTable`
30
+```rust
31
+pub struct SymbolTable {
32
+    names: StringInterner,
33
+    by_name: HashMap<Istr, SymbolId>,
34
+    symbols: Vec<Symbol>,
35
+    // replacement log for diagnostics + -why_live
36
+    transitions: Vec<Transition>,
37
+}
38
+
39
+pub struct Transition { pub at: SymbolId, pub from: SymbolKindTag, pub to: SymbolKindTag, pub cause: Cause }
40
+```
41
+
42
+`HashMap` is fine for Sprint 7; Sprint 28 may swap in a custom open-addressing table.
43
+
44
+### 3. Insertion semantics
45
+`SymbolTable::insert(sym: Symbol)` runs the resolution rules inline:
46
+
47
+| Existing \ New       | Undefined | Defined | Common | DylibImport | LazyArchive | LazyObject |
48
+|----------------------|-----------|---------|--------|-------------|-------------|------------|
49
+| *vacant*             | insert    | insert  | insert | insert      | insert      | insert     |
50
+| Undefined            | keep      | replace | replace| replace     | replace     | replace    |
51
+| Defined (strong)     | keep      | **error if both strong and same kind** | keep | keep | keep | keep |
52
+| Defined (weak)       | keep      | replace if new strong | keep | keep | keep | keep |
53
+| Common               | keep      | replace (common → Defined) | pick larger size / stricter align | keep | keep | keep |
54
+| DylibImport          | keep      | replace (definition shadows import) | keep | keep | keep | keep |
55
+| LazyArchive          | **fetch** | replace | replace | replace | keep first | replace |
56
+| LazyObject           | **fetch** | replace | replace | replace | replace | keep |
57
+
58
+"Fetch" means: load the member/object, enqueue its symbols, mark this entry's transition.
59
+
60
+### 4. Weak coalescing rules
61
+- `weak_def` + `weak_def` → first wins.
62
+- `weak_def` + strong → strong wins.
63
+- Strong + strong → hard error, diagnostic cites both input paths.
64
+- `weak_ref` without a definition is not an error; the reference resolves to address 0 (handled in relocation pass).
65
+
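The definition-versus-definition rules above reduce to a four-way decision. The following sketch models only that decision; `Strength`, `Outcome`, and `coalesce` are illustrative names and not part of the planned `SymbolTable` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strength {
    Weak,
    Strong,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    KeepExisting,
    ReplaceWithNew,
    DuplicateStrong, // hard error; the caller cites both input paths
}

/// Decide what happens when a new definition meets an existing one of the same name.
pub fn coalesce(existing: Strength, new: Strength) -> Outcome {
    match (existing, new) {
        (Strength::Weak, Strength::Weak) => Outcome::KeepExisting, // first wins
        (Strength::Weak, Strength::Strong) => Outcome::ReplaceWithNew,
        (Strength::Strong, Strength::Weak) => Outcome::KeepExisting,
        (Strength::Strong, Strength::Strong) => Outcome::DuplicateStrong,
    }
}
```

Keeping the rule as a pure function makes each cell trivially unit-testable, independent of interning and the transition log.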
66
+### 5. Aliases (N_INDR)
67
+Flattened on insertion: an `Alias(name → aliased)` is resolved by looking up `aliased`. If `aliased` is itself an alias, walk until a non-alias is found; cycle detection with a depth cap.
68
+
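Alias flattening with a depth cap could look like the sketch below. It uses plain `String` keys in a `HashMap` instead of interned `Istr` handles, and the cap of 64 is an arbitrary value chosen for the example.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum AliasError {
    /// The chain starting at this name exceeded the cap (a cycle or a pathological chain).
    TooDeep(String),
}

pub const MAX_ALIAS_DEPTH: usize = 64;

/// Follow `name` through the alias map until a non-alias name is reached.
pub fn flatten_alias<'a>(
    aliases: &'a HashMap<String, String>,
    name: &'a str,
) -> Result<&'a str, AliasError> {
    let mut current = name;
    for _ in 0..=MAX_ALIAS_DEPTH {
        match aliases.get(current) {
            Some(next) => current = next.as_str(),
            None => return Ok(current),
        }
    }
    Err(AliasError::TooDeep(name.to_string()))
}
```

A bounded loop detects cycles without a visited set, at the cost of reporting very long but legal chains as errors; a visited set would distinguish the two cases.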
69
+### 6. Transition log
70
+Every `insert` records the old/new kind + input path + (for lazy fetches) the reason the fetch happened. The `-why_live` diagnostic introduced in Sprint 19 reads this log.
71
+
72
+### 7. Tombstoned symbols
73
+Common → Defined promotion preserves the common size and alignment (so the BSS slot is large enough). Dead-stripping (Sprint 23) can tombstone a Defined without removing it from the table.
74
+
75
+## Testing Strategy
76
+- Unit tests for every cell in the resolution matrix. Each combination has a named test.
77
+- Synthetic inputs: two `.o`s both defining `_foo` strong → error; one strong + one weak → strong wins; two weak → first wins; common + strong → common replaced.
78
+- Alias-chain cycles detected with a diagnostic, not a stack overflow.
79
+- Interner stress test: 100K unique names, membership queries are O(1) average.
80
+
81
+## Definition of Done
82
+- Every matrix cell has a passing test.
83
+- Weak coalescing matches `ld` on a corpus of 20+ scenarios (differential test: both linkers produce the same `nm` output).
84
+- Alias flattening correct and cycle-safe.
85
+- Transition log surfaces replacement causes for `-why_live`.
.docs/sprints/sprint08.md added
@@ -0,0 +1,92 @@
1
+# Sprint 8: Name Resolution Pass
2
+
3
+## Prerequisites
4
+Sprint 7 — `SymbolTable` with insertion semantics.
5
+
6
+## Goals
7
+Drive the symbol table to a fixed point: every undefined reference either resolves to a Defined (from an object), Common (promoted in BSS), DylibImport (from a dylib/TBD), or raises a clear, actionable diagnostic. `-force_load` / `-all_load` / `-undefined <treatment>` all handled.
8
+
9
+## Deliverables
10
+
11
+### 1. Resolution algorithm
12
+`afs-ld/src/resolve.rs`:
13
+
14
+```rust
15
+pub fn resolve(inputs: &mut Inputs, table: &mut SymbolTable, opts: &LinkOptions)
16
+    -> Result<(), Vec<ResolveError>>
17
+{
18
+    seed_table_with_objects_and_dylib_imports(inputs, table, opts);
19
+    if opts.all_load    { force_load_everything(inputs, table); }
20
+    for forced in &opts.force_load { force_load_one(inputs, table, forced); }
21
+    fixed_point_pull_from_archives(inputs, table);
22
+    classify_unresolved(table, opts);
23
+}
24
+```
25
+
26
+### 2. Seed phase
27
+Walk every explicit `.o` first, then every `.dylib` / `.tbd`: add Defined / Common from objects, DylibImport from dylibs. Archives are added as LazyArchive entries only — their members are not parsed until pulled.
28
+
29
+### 3. Fixed-point pull
30
+```
31
+while let Some(name) = table.undefined_pending.pop() {
32
+    for archive in &inputs.archives_in_command_line_order {
33
+        if let Some(member) = archive.fetch(name) {
34
+            ingest_member(member, table);
35
+            break;
36
+        }
37
+    }
38
+}
39
+```
40
+
41
+Order matters: armfortas's driver currently passes `<objs> <runtime.a> -lSystem`, and resolution must match `ld`'s left-to-right behavior. Ingesting a member can push new undefined names onto the pending queue; the loop terminates when the queue is empty. A name that no archive defines stays Undefined and is handled by the `-undefined` classification step.
42
+
43
+### 4. `-force_load` and `-all_load`
44
+- `-force_load <archive>`: pull every member of that archive before fixed-point.
45
+- `-all_load`: pull every member of every archive.
46
+- Both happen before the fixed-point loop so their transitively-pulled symbols feed into the same fixed point.
47
+
48
+### 5. `-undefined <treatment>`
49
+After the fixed point, any still-Undefined entry is classified by the `-undefined` setting:
50
+- `error` (default): hard error, cite every input that references the name (collected via the transition log).
51
+- `warning`: warn but emit, writing the symbol as address 0 (bind to nothing).
52
+- `suppress`: silent, address 0.
53
+- `dynamic_lookup`: flat-namespace DylibImport with ordinal `BIND_SPECIAL_DYLIB_FLAT_LOOKUP=-2`.
54
+
55
+### 6. Weak references
56
+`weak_ref` to a missing symbol is always valid regardless of `-undefined`; it resolves to address 0 at bind time and the runtime tests for null.
57
+
58
+### 7. Diagnostics
59
+Undefined errors must cite every referrer input, not just one. Output format:
60
+
61
+```
62
+afs-ld: error: undefined symbol: _afs_print
63
+      referenced by program.o(text section + 0x34)
64
+      referenced by runtime.o(text section + 0x120)
65
+      (also via 2 relocations in libarmfortas_rt.a(io.o))
66
+Hint: did you mean _afs_println? (Levenshtein distance 2)
67
+```
68
+
69
+Did-you-mean runs a basic Levenshtein search over defined symbols and only suggests candidates within edit distance 3.
70
+
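A sketch of the did-you-mean search, using the textbook two-row Levenshtein distance and a caller-supplied maximum distance. `levenshtein` and `did_you_mean` are illustrative names; a production version would likely prune early once a row exceeds the bound.

```rust
/// Classic dynamic-programming edit distance, keeping only two rows.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Closest defined name within `max` edits of `missing`, if any.
pub fn did_you_mean<'a>(missing: &str, defined: &[&'a str], max: usize) -> Option<&'a str> {
    defined
        .iter()
        .map(|d| (levenshtein(missing, d), *d))
        .filter(|(dist, _)| *dist <= max)
        .min_by_key(|(dist, _)| *dist)
        .map(|(_, d)| d)
}
```

With a maximum of 3, a one-character typo such as `_afs_prnt` finds `_afs_print`, while `_afs_print` does not suggest `_afs_print_real`, which is five edits away.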
71
+### 8. Diagnostics for duplicate strong
72
+```
73
+afs-ld: error: duplicate symbol _foo
74
+  defined in: a.o (text + 0x0)
75
+  also in:    b.o (text + 0x0)
76
+```
77
+
78
+No suggestion — two strong defs is a real ambiguity.
79
+
80
+## Testing Strategy
81
+- Resolution matrix revisited from Sprint 7, but with real archives and dylibs.
82
+- Order sensitivity: `a.o b.a` vs `b.a a.o` — first resolves when `a.o` references a symbol in `b.a`; second does not (matches `ld`'s classic behavior).
83
+- `-force_load` pulls in a member whose symbols would otherwise go unreferenced.
84
+- `-all_load` across a multi-member archive.
85
+- Weak-import from a dylib that at runtime will be missing.
86
+- Did-you-mean fires on a close misspell, stays silent when the closest match is > 3 edits away.
87
+
88
+## Definition of Done
89
+- Fixed-point loop terminates on all corpus inputs.
90
+- Diagnostics match the format above, include every referrer, include did-you-mean suggestions.
91
+- Differential test against `ld` for order-dependent resolution on 10+ scenarios.
92
+- `-force_load`, `-all_load`, and every `-undefined <treatment>` value pass dedicated tests.
.docs/sprints/sprint09.md added
@@ -0,0 +1,74 @@
1
+# Sprint 9: Subsections-via-Symbols Atomization
2
+
3
+## Prerequisites
4
+Sprints 2, 7, 8 — sections, symbols, resolved table.
5
+
6
+## Goals
7
+Split each input section into **atoms** at symbol boundaries when `MH_SUBSECTIONS_VIA_SYMBOLS` is set (afs-as always sets this). Atoms are the unit of dead-stripping (Sprint 23), ICF (Sprint 24), and output layout (Sprint 10). Every Defined symbol maps to exactly one atom, either as its owner or as one of its `.alt_entry` symbols.
8
+
9
+## Deliverables
10
+
11
+### 1. Atom model
12
+`afs-ld/src/atom.rs`:
13
+
14
+```rust
15
+pub struct Atom {
16
+    pub id: AtomId,
17
+    pub owner: SymbolId,           // the primary symbol defining this atom
18
+    pub alt_entries: Vec<SymbolId>, // .alt_entry chains
19
+    pub section: OutputSectionKey,  // which output section it will land in
20
+    pub input_origin: InputId,
21
+    pub input_section: SectIdx,
22
+    pub offset: u32,                // offset within input section
23
+    pub size: u32,
24
+    pub align_pow2: u8,
25
+    pub data: DataRef,              // borrowed from input mmap, or ZeroFill
26
+    pub relocs: Vec<RelocIdx>,      // relocs originating inside this atom
27
+    pub flags: AtomFlags,           // NoDeadStrip, WeakDef, ThreadLocal, ...
28
+}
29
+```
30
+
31
+### 2. Atomization algorithm
32
+For each input section:
33
+1. Collect every Defined symbol whose section is this section, sorted by value.
34
+2. If `MH_SUBSECTIONS_VIA_SYMBOLS` is set: split the section at each symbol's offset. Each slice becomes an atom owned by the symbol at its head.
35
+3. If a symbol is `.alt_entry`, fold it into the previous atom's `alt_entries`, don't split.
36
+4. If the flag is not set: one atom per section (Apple-style consolidated section).
37
+
38
+Atoms for text preserve instruction alignment; atoms for zerofill carry size only.
39
+
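The splitting steps above can be sketched over plain offsets. `SymAt` and `AtomSpan` are simplified stand-ins for the real symbol and atom types. Two behaviors here are assumptions of the sketch rather than decisions in the plan: symbols sharing an offset are folded into the earlier atom, and bytes before the first symbol are not assigned to any atom.

```rust
pub struct SymAt {
    pub id: usize,
    pub offset: u32, // offset within the input section
    pub alt_entry: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub struct AtomSpan {
    pub owner: usize,
    pub alt_entries: Vec<usize>,
    pub offset: u32,
    pub size: u32,
}

/// Split a section of `section_size` bytes into atoms at symbol boundaries.
pub fn atomize(section_size: u32, syms: &[SymAt]) -> Vec<AtomSpan> {
    let mut sorted: Vec<&SymAt> = syms.iter().collect();
    sorted.sort_by_key(|s| s.offset); // stable: ties keep input order

    let mut atoms: Vec<AtomSpan> = Vec::new();
    for s in sorted {
        // Fold .alt_entry symbols (and same-offset duplicates) into the previous atom.
        let fold = atoms
            .last()
            .map_or(false, |prev| s.alt_entry || s.offset == prev.offset);
        if fold {
            if let Some(prev) = atoms.last_mut() {
                prev.alt_entries.push(s.id);
            }
        } else {
            atoms.push(AtomSpan { owner: s.id, alt_entries: Vec::new(), offset: s.offset, size: 0 });
        }
    }

    // Each atom runs up to the next atom's start, or to the end of the section.
    for i in 0..atoms.len() {
        let end = atoms.get(i + 1).map_or(section_size, |next| next.offset);
        atoms[i].size = end.saturating_sub(atoms[i].offset);
    }
    atoms
}
```

Computing sizes in a second pass keeps the folding logic independent of section length, and `saturating_sub` avoids a panic if a malformed symbol points past the section end.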
40
+### 3. Literal atoms (C strings, 16-byte literals)
41
+`__TEXT,__cstring` and `__TEXT,__literal16` are special. Every null-terminated string / every 16-byte block is an atom candidate for de-duplication (Sprint 24 ICF). For now, store each literal as its own atom with a content-hash annotation.
42
+
43
+### 4. Unwind + compact-unwind atoms
44
+In object files the section is `__LD,__compact_unwind`; it contains 32-byte records, each referring (via a reloc) to a function atom. One unwind atom per function; tracked as `parent_of: AtomId` so unwind atoms get stripped alongside dead functions.
45
+
46
+### 5. Reloc → atom remapping
47
+Every reloc has an input offset into its source section. After atomization, recompute as `(atom, offset_within_atom)`. When a reloc crosses atom boundaries it can only point at a whole symbol (subsections-via-symbols invariant); confirm this and diagnose if not.
48
+
49
+### 6. Reloc references to atoms
50
+`Reloc::referent` gains:
51
+```rust
52
+pub enum Referent {
53
+    SymbolExternal(SymbolId),       // undefined or dylib import
54
+    SymbolLocal(AtomId, i64),       // same-tu reference, addend in bytes
55
+    AbsoluteSection(AtomId, i64),   // rare, section-relative
56
+}
57
+```
58
+
59
+The "local" case is what the atomization unlocks: a reloc from function `_a` to function `_b` in the same `.o` becomes a reference to `_b`'s atom, not to an offset within a monolithic text section.
60
+
61
+### 7. `.no_dead_strip` propagation
62
+Symbol flag propagates to its atom. Unwind atoms inherit `NoDeadStrip` from their parent function. Entry point symbol is marked `NoDeadStrip`.
63
+
64
+## Testing Strategy
65
+- Fixture: a `.s` with several functions where one branches to another. After atomization, reloc's referent must be the callee atom, not a byte-offset.
66
+- `.alt_entry` folding: `_foo` and `.alt_entry _bar` in the same input produce one atom whose `alt_entries = [_bar]`.
67
+- Boundary-crossing reloc (synthesized maliciously): parser diagnoses.
68
+- Differential: `ld -dead_strip` behavior on a corpus of ~20 atomization fixtures compared to what Sprint 23 will produce.
69
+
70
+## Definition of Done
71
+- Every `.o` in the afs-as corpus atomizes without diagnostics.
72
+- `.alt_entry` correctly folded.
73
+- Relocs re-targeted to atoms; no raw section-relative references leak into Sprint 10.
74
+- Unwind atoms track their parent function atom.
.docs/sprints/sprint10.md added
@@ -0,0 +1,98 @@
1
+# Sprint 10: Output Segment & Section Layout (dylib-aware)
2
+
3
+## Prerequisites
4
+Sprints 7–9 — resolved table and atomized inputs.
5
+
6
+## Goals
7
+One layout engine, two modes: `MH_EXECUTE` and `MH_DYLIB`. Assign VM addresses, file offsets, and segment membership to every atom. End state: the writer can emit a valid-but-empty Mach-O for both modes that `otool -lV` accepts.
8
+
9
+## Deliverables
10
+
11
+### 1. Output segment & section model
12
+`afs-ld/src/section.rs`:
13
+
14
+```rust
15
+pub struct OutputSegment {
16
+    pub name: String,           // "__TEXT", "__DATA_CONST", "__DATA", "__LINKEDIT", "__PAGEZERO"
17
+    pub sections: Vec<OutputSectionId>,
18
+    pub vm_addr: u64, pub vm_size: u64,
19
+    pub file_off: u64, pub file_size: u64,
20
+    pub init_prot: Prot, pub max_prot: Prot,
21
+}
22
+
23
+pub struct OutputSection {
24
+    pub segment: String, pub name: String,    // e.g. ("__TEXT", "__text")
25
+    pub kind: SectionKind,
26
+    pub align_pow2: u8, pub flags: u32,
27
+    pub atoms: Vec<AtomId>,
28
+    pub addr: u64, pub size: u64, pub file_off: u64,
29
+}
30
+```
31
+
32
+### 2. Segment plan
33
+Two plans, keyed by `OutputKind::Executable | Dylib`:
34
+
35
+**Executable**:
36
+- `__PAGEZERO`: VM `[0, 0x1_0000_0000)`, prot `---`. No file backing.
37
+- `__TEXT`: prot `r-x`. Contains `__text`, `__stubs`, `__stub_helper`, `__cstring`, `__const`, `__literal16`, `__unwind_info`, `__eh_frame`.
38
+- `__DATA_CONST`: prot `rw-` in the file (dyld remaps it to `r--` after applying fixups). Contains `__got`, `__const` data.
39
+- `__DATA`: prot `rw-`. Contains `__data`, `__bss`, `__la_symbol_ptr`, `__thread_ptrs`, `__thread_vars`, `__thread_data`, `__thread_bss`.
40
+- `__LINKEDIT`: prot `r--`. Symbol table, string table, dyld-info opcodes (or chained fixups), function starts, data-in-code, code signature.
41
+
42
+**Dylib**:
43
+- No `__PAGEZERO`. `__TEXT` starts at VM `0`.
44
+- Everything else the same.
45
+
46
+### 3. Section placement order within segments
47
+Matches ld's defaults so differential testing converges:
48
+- `__TEXT`: `__text`, `__stubs`, `__stub_helper`, `__cstring`, `__const`, `__literal16`, `__unwind_info`, `__eh_frame`.
49
+- `__DATA_CONST`: `__got`, `__const`.
50
+- `__DATA`: `__la_symbol_ptr`, `__data`, `__thread_vars`, `__thread_ptrs`, `__thread_data`, `__thread_bss`, `__bss`.
51
+- `__LINKEDIT`: fixup stream, function starts, data-in-code, symbol table, string table, code signature (in that order — matches ld's observed layout).
52
+
53
+Missing sections are simply absent; empty sections are dropped entirely.
54
+
55
+### 4. Atom placement
56
+Each atom maps to one output section by its `OutputSectionKey`. Within a section, atoms ordered by:
57
+1. Input-file command-line order.
58
+2. Within an input, atom original offset.
59
+3. Tiebreaker: symbol name (for determinism).
60
+
61
+ICF (Sprint 24) and `-order_file` (later polish) will override this later; for now, deterministic default.
62
+
63
+### 5. Address assignment
64
+Pass 1: accumulate sizes per section, respecting atom alignment. Pass 2: assign section `addr` by accumulating `vm_addr + padding-to-section-alignment`. Pass 3: file offsets. `__TEXT` starts at file offset 0 (the header lives there); other segments start at the next 16 KiB page boundary. Zerofill sections have `size > 0` but contribute 0 to file size.
65
+
66
+Page alignment is 16 KiB on arm64 (Apple Silicon always). Section alignment comes from atoms.
67
+
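A minimal sketch of Pass 1 and the page rounding, assuming atoms are given as `(size, align_pow2)` pairs in their final order. `align_up`, `place_atoms`, and `PAGE_SIZE` are illustrative names, not the afs-ld API.

```rust
/// arm64 macOS page size (16 KiB).
pub const PAGE_SIZE: u64 = 0x4000;

/// Round `value` up to the next multiple of `align` (a power of two).
pub fn align_up(value: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

/// Pass 1 for one section: returns each atom's offset within the section,
/// the total section size, and the section alignment (max of its atoms).
pub fn place_atoms(atoms: &[(u64, u8)]) -> (Vec<u64>, u64, u8) {
    let mut offsets = Vec::with_capacity(atoms.len());
    let mut cursor = 0u64;
    let mut section_align = 0u8;
    for &(size, align_pow2) in atoms {
        cursor = align_up(cursor, 1u64 << align_pow2);
        offsets.push(cursor);
        cursor += size;
        section_align = section_align.max(align_pow2);
    }
    (offsets, cursor, section_align)
}
```

Segment file offsets then follow from `align_up(previous_segment_end, PAGE_SIZE)`.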
68
+### 6. MH_EXECUTE vs MH_DYLIB writer dispatch
69
+`afs-ld/src/macho/writer.rs`:
70
+
71
+```rust
72
+pub enum OutputKind { Executable, Dylib }
73
+
74
+pub fn write(layout: &Layout, kind: OutputKind, opts: &LinkOptions, out: &mut Vec<u8>)
75
+    -> Result<(), WriteError>;
76
+```
77
+
78
+Dispatches to the right load-command set:
79
+- Executable: `LC_MAIN` with entry offset, optional `LC_UUID`, `LC_SOURCE_VERSION`.
80
+- Dylib: `LC_ID_DYLIB` with install-name, current-version, compat-version; no `LC_MAIN`.
81
+- Both: `LC_SEGMENT_64` per segment, `LC_BUILD_VERSION`, `LC_SYMTAB`, `LC_DYSYMTAB`, `LC_DYLD_INFO_ONLY` or `LC_DYLD_CHAINED_FIXUPS`, `LC_FUNCTION_STARTS`, `LC_DATA_IN_CODE`, one `LC_LOAD_DYLIB` per dylib dependency, `LC_RPATH` entries, `LC_CODE_SIGNATURE`.
82
+
83
+### 7. Minimum-viable empty output
84
+End-of-sprint gate: both `OutputKind::Executable` (empty `_main`) and `OutputKind::Dylib` (no exports) emit a file that:
85
+- `otool -lV` accepts without complaint.
86
+- `file` identifies as `Mach-O 64-bit executable arm64` / `Mach-O 64-bit dynamically linked shared library arm64`.
87
+- Does not yet need to run or load — just parse.
88
+
89
+## Testing Strategy
90
+- Snapshot tests: produce the minimal empty executable and dylib; compare load-command layout against a golden captured from `ld`.
91
+- Differential: for empty inputs, our load-command order and segment protections must match `ld`'s.
92
+- Golden section-ordering tests for the standard ld section order.
93
+
94
+## Definition of Done
95
+- Empty executable output passes `otool -lV`.
96
+- Empty dylib output passes `otool -lV`.
97
+- Section placement order matches `ld` on a corpus of staged fixtures.
98
+- Address assignment deterministic across 100 invocations of the same input.
.docs/sprints/sprint11.md added
@@ -0,0 +1,71 @@
1
+# Sprint 11: Core Relocation Application (ARM64)
2
+
3
+## Prerequisites
4
+Sprints 3, 9, 10 — relocs parsed, atoms sized, addresses assigned.
5
+
6
+## Goals
7
+Patch atom bytes according to every basic ARM64 reloc kind. This sprint covers `BRANCH26`, `PAGE21`, `PAGEOFF12`, `UNSIGNED`, `SUBTRACTOR`, and the folded `ADDEND`. GOT/stubs land in Sprint 12, TLV in Sprint 13.
8
+
9
+## Deliverables
10
+
11
+### 1. Reloc application pass
12
+`afs-ld/src/reloc/arm64.rs`:
13
+
14
+```rust
15
+pub fn apply(layout: &Layout, atom: &Atom, bytes: &mut [u8]) -> Result<(), RelocError>;
16
+```
17
+
18
+For each reloc in the atom:
19
+1. Resolve `Referent` to a final address (the referent atom's `addr` + addend; a dylib import resolves to 0 for now and is handled fully in Sprint 12).
20
+2. Compute the reloc value per kind.
21
+3. Patch `bytes` at `reloc.offset`.
22
+
23
+### 2. Reloc math (reference)
24
+
25
+| Kind | Formula | Encoding |
26
+|---|---|---|
27
+| `Unsigned` (length=2) | `S + A` | little-endian u32 write |
28
+| `Unsigned` (length=3) | `S + A` | little-endian u64 write |
29
+| `Subtractor` | `S_min - S_sub + A` | u32 or u64 depending on length |
30
+| `Branch26` | `(S + A - P) >> 2` | 26-bit sign-check, OR into bottom 26 bits of the instruction |
31
+| `Page21` | `(page(S+A) - page(P)) >> 12` | ADRP immhi:immlo encoding, 21-bit sign-check |
32
+| `PageOff12` | `(S + A) & 0xFFF` | ADD imm12 or LDR imm12 (scaled per LDR size!) |
33
+
34
+Where `S` = symbol/section address, `A` = addend, `P` = address of the relocated instruction, `page(x) = x & ~0xFFF`.
35
+
36
+### 3. PAGEOFF12 scaling detail
37
+For `LDR` immediate-offset forms the 12-bit immediate is scaled by the load size (1 for `LDRB`, 2 for `LDRH`, 4 for `LDR W`, 8 for `LDR X`). afs-as sets the instruction bits correctly; our job is to right-shift the 12-bit offset by the load size's log2 before OR'ing into the instruction. The 2-bit `size` field (bits `[31:30]`) of the LDR encoding gives the shift; 128-bit SIMD loads (`LDR Q`) encode `size = 00` with `opc<1> = 1` and need a shift of 4.
38
+
39
+For `ADD` immediate-offset the 12-bit immediate is unscaled — write as-is.
40
+
41
+Disambiguate by disassembling the instruction: opcode bits `[31:24]` distinguish ADD vs LDR (B/H/W/X).
42
+
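A sketch of the PAGEOFF12 scaling, assuming the instruction word is already decoded to a `u32`. `patch_pageoff12` is a hypothetical helper; the bit masks follow the A64 "load/store register (unsigned immediate)" encoding class, and anything outside that class is treated as unscaled (`ADD`).

```rust
/// Patch the imm12 field (bits [21:10]) of an ADD or LDR/STR unsigned-offset
/// instruction with the low 12 bits of `S + A`, scaled for loads/stores.
pub fn patch_pageoff12(insn: u32, s: u64, a: i64) -> Result<u32, String> {
    let off = ((s as i64).wrapping_add(a) as u64 & 0xFFF) as u32;
    // Load/store (unsigned immediate): bits [29:27] = 111 and [25:24] = 01.
    let is_load_store = insn & 0x3B00_0000 == 0x3900_0000;
    let shift = if is_load_store {
        let size = insn >> 30; // bits [31:30]
        let is_simd = insn & (1 << 26) != 0; // V bit
        let opc_hi = (insn >> 23) & 1; // opc<1>
        // 128-bit SIMD access: size = 00, V = 1, opc<1> = 1.
        if is_simd && size == 0 && opc_hi == 1 { 4 } else { size }
    } else {
        0 // ADD immediate: unscaled
    };
    if off & ((1 << shift) - 1) != 0 {
        return Err(format!("page offset {off:#x} not aligned to {} bytes", 1u32 << shift));
    }
    Ok((insn & !(0xFFF << 10)) | ((off >> shift) << 10))
}
```

The misalignment check is an addition of this sketch: a scaled load whose target is not a multiple of the access size cannot be encoded and is better diagnosed than silently truncated.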
43
+### 4. Range checks
44
+- `Branch26`: `(S + A - P)` must fit in signed 28 bits (26 bits × 4-byte scale). If not, emit a hard error citing the caller atom and the out-of-range target. Thunks land in Sprint 26.
45
+- `Page21`: `(page(S+A) - page(P))` must fit in signed 33 bits (21 bits × 4 KiB scale). In practice, always satisfiable on macOS.
46
+- `PageOff12`: always fits by construction.
47
+- `Unsigned`: wraps silently.
48
+
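The `Branch26` and `Page21` rows of the table, together with their range checks, might look like this. It is a sketch under the table's definitions of `S`, `A`, and `P`; `RelocError` here is a stand-in enum, not the real afs-ld error type.

```rust
#[derive(Debug, PartialEq)]
pub enum RelocError {
    BranchOutOfRange(i64),
    BranchMisaligned(i64),
    PageOutOfRange(i64),
}

/// B/BL: imm26 = (S + A - P) >> 2, OR'ed into bits [25:0].
pub fn patch_branch26(insn: u32, s: u64, a: i64, p: u64) -> Result<u32, RelocError> {
    let delta = (s as i64).wrapping_add(a).wrapping_sub(p as i64);
    if delta & 3 != 0 {
        return Err(RelocError::BranchMisaligned(delta));
    }
    // Signed 28-bit byte range: [-128 MiB, +128 MiB).
    if delta < -(1 << 27) || delta >= (1 << 27) {
        return Err(RelocError::BranchOutOfRange(delta));
    }
    let imm26 = ((delta >> 2) as u32) & 0x03FF_FFFF;
    Ok((insn & 0xFC00_0000) | imm26)
}

/// ADRP: imm21 = (page(S + A) - page(P)) >> 12, split into immlo [30:29] and immhi [23:5].
pub fn patch_page21(insn: u32, s: u64, a: i64, p: u64) -> Result<u32, RelocError> {
    let target = (s as i64).wrapping_add(a);
    let delta_pages = (target >> 12) - ((p as i64) >> 12);
    // Signed 21-bit page count, i.e. a signed 33-bit byte range.
    if delta_pages < -(1 << 20) || delta_pages >= (1 << 20) {
        return Err(RelocError::PageOutOfRange(delta_pages));
    }
    let imm21 = (delta_pages as u32) & 0x1F_FFFF;
    let immlo = imm21 & 0x3;
    let immhi = imm21 >> 2;
    Ok((insn & 0x9F00_001F) | (immlo << 29) | (immhi << 5))
}
```

For example, a `BL` (`0x94000000`) at `P = 0x1010` targeting `S = 0x1000` patches to `0x97FFFFFC`.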
49
+### 5. Subtractor + Unsigned pair
50
+A `RelocKind::Subtractor` entry carries both the minuend and subtrahend. Formula: `target = minuend.addr + minuend_addend - (subtrahend.addr + subtrahend_addend)`. Write as u32 or u64 depending on length. afs-as uses this for `.quad _a - _b` and for CIE offset diff in `__eh_frame`.
51
+
52
+### 6. PC vs atom address
53
+`P` is the address of the relocated 4-byte instruction, `atom.addr + reloc.offset`. `P` for an `Unsigned` reloc still evaluates even when `pcrel=false`; the formula just doesn't use it. A wrong `P` is the most common reloc bug — unit test every kind against a hand-computed value.
54
+
55
+### 7. Error reporting
56
+Every failed reloc cites the originating input + atom + offset + kind + referent. No panics.
57
+
58
+### 8. Defer: GOT and TLVP
59
+`GOT_LOAD_PAGE21`, `GOT_LOAD_PAGEOFF12`, `POINTER_TO_GOT`, `TLVP_LOAD_*` — Sprint 12 and Sprint 13 allocate the synthetic sections and wire them. This sprint emits a clear "not yet implemented" error if encountered.
60
+
61
+## Testing Strategy
62
+- Unit test each kind with hand-computed encodings cross-checked against ARM ARM and `otool -tv` disassembly.
63
+- Differential: identical inputs through `ld` and afs-ld produce the same patched bytes (within allowed-diff categories from Sprint 0's harness).
64
+- Corner cases: max-negative branch, wrap-around UNSIGNED addend, SUBTRACTOR across sections, SUBTRACTOR within same section.
65
+- Regression fixture: a tiny `.o` that exercises every kind covered in this sprint.
66
+
67
+## Definition of Done
68
+- All covered reloc kinds apply correctly against a corpus of fixtures.
69
+- Out-of-range BRANCH26 emits an actionable error.
70
+- Differential pass: 10+ fixtures link to byte-identical `__text` under afs-ld and `ld`.
71
+- GOT/TLVP kinds emit "not yet implemented" errors with a pointer to Sprints 12/13.
.docs/sprints/sprint12.md added
@@ -0,0 +1,96 @@
1
+# Sprint 12: GOT, Stubs, Lazy Pointers
2
+
3
+## Prerequisites
4
+Sprints 5, 10, 11 — dylibs loaded, layout pass, core reloc application.
5
+
6
+## Goals
7
+Synthesize `__got`, `__stubs`, `__stub_helper`, `__la_symbol_ptr`. Wire `GOT_LOAD_*` / `POINTER_TO_GOT` relocations to GOT slots. Rewire `BRANCH26` to a dylib import through a stub. Classic lazy-binding model; chained fixups land in Sprint 15.5.
8
+
9
+## Deliverables
10
+
11
+### 1. GOT synthetic section
12
+`afs-ld/src/synth/got.rs`:
13
+
14
+```rust
15
+pub struct GotSection {
16
+    entries: Vec<GotEntry>,
17
+    index: HashMap<SymbolId, usize>,
18
+}
19
+
20
+pub struct GotEntry { pub symbol: SymbolId, pub weak_import: bool }
21
+```
22
+
23
+- Lives in `__DATA_CONST,__got`, section flags `S_NON_LAZY_SYMBOL_POINTERS` (type = 6). `reserved1` in the section header = the starting indirect-symbol-table index.
24
+- 8 bytes per entry, aligned 8.
25
+- GOT entry for a Defined symbol holds that symbol's address directly (no dyld bind).
26
+- GOT entry for a DylibImport is zeroed in the file; dyld binds it at load time via the non-lazy bind stream (Sprint 15).
27
+
28
+### 2. Stubs synthetic section
29
+`afs-ld/src/synth/stubs.rs`:
30
+
31
+ARM64 stub is 12 bytes:
32
+```
33
+ADRP x16, la_symbol_ptr@PAGE
34
+LDR  x16, [x16, la_symbol_ptr@PAGEOFF]
35
+BR   x16
36
+```
37
+
38
+- Lives in `__TEXT,__stubs`, section flags `S_SYMBOL_STUBS | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS` (type = 8). `reserved1` = starting indirect-sym index, `reserved2` = 12 (stub size).
39
+- One stub per dylib-imported function whose address is branched to (`BRANCH26` target).
40
+
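A sketch of emitting one 12-byte stub once the stub and lazy-pointer addresses are known. `encode_stub` is a hypothetical helper; the base opcodes are the A64 encodings of `ADRP x16`, `LDR x16, [x16, #imm]`, and `BR x16`.

```rust
/// Encode `ADRP x16, slot@PAGE; LDR x16, [x16, slot@PAGEOFF]; BR x16`.
/// Returns None if the slot is not 8-byte aligned or is out of ADRP range.
pub fn encode_stub(stub_addr: u64, slot_addr: u64) -> Option<[u8; 12]> {
    let page_delta = (slot_addr >> 12) as i64 - (stub_addr >> 12) as i64;
    if !(-(1i64 << 20)..(1i64 << 20)).contains(&page_delta) || slot_addr & 7 != 0 {
        return None;
    }
    let imm21 = (page_delta as u32) & 0x1F_FFFF;
    // ADRP x16: immlo in bits [30:29], immhi in bits [23:5].
    let adrp: u32 = 0x9000_0010 | ((imm21 & 3) << 29) | ((imm21 >> 2) << 5);
    // LDR x16, [x16, #off]: imm12 is the page offset scaled by 8.
    let ldr: u32 = 0xF940_0210 | ((((slot_addr & 0xFFF) >> 3) as u32) << 10);
    let br: u32 = 0xD61F_0200; // BR x16
    let mut out = [0u8; 12];
    for (i, word) in [adrp, ldr, br].iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&word.to_le_bytes());
    }
    Some(out)
}
```

In the real pipeline these fields would more likely be filled by the Sprint 11 reloc pass against synthetic relocs; direct encoding is shown here only to make the byte layout concrete.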
41
+### 3. Lazy symbol pointers
42
+`__DATA,__la_symbol_ptr`, section flags `S_LAZY_SYMBOL_POINTERS` (type = 7). Each 8-byte entry is initialized to point at the corresponding `__stub_helper` entry; at first call, the stub-helper resolves the symbol and patches the lazy pointer.
43
+
44
+### 4. Stub helper
45
+`__TEXT,__stub_helper`:
46
+
47
+Header (24 bytes on arm64):
48
+```
49
+ADRP x17, __dyld_private@PAGE
50
+ADD  x17, x17, __dyld_private@PAGEOFF
51
+STP  x16, x17, [sp, #-16]!
52
+ADRP x16, dyld_stub_binder@GOTPAGE
53
+LDR  x16, [x16, dyld_stub_binder@GOTPAGEOFF]
54
+BR   x16
55
+```
56
+
57
+Per-symbol entry (12 bytes):
58
+```
59
+LDR  w16, =<lazy_bind_offset>
60
+B    <header_addr>
61
+```
62
+
63
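+Where `<lazy_bind_offset>` is the offset of this symbol's opcode sequence within the `__LINKEDIT` lazy-bind stream (Sprint 15 wires this). The offset is stored as a 32-bit literal word after the branch, which is the third word of the 12-byte entry.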
64
+
65
+Needs `___dyld_private` (local anchor) and `_dyld_stub_binder` (dylib import from `libSystem`).
66
+
67
+### 5. Binding strategy
68
+- `_dyld_stub_binder` is imported from `libSystem`. Gets a GOT entry; no stub (we take its address directly).
69
+- `___dyld_private` is a 0-filled 8-byte slot in `__DATA,__data`. Not exported. Dyld uses it as scratch during binding.
70
+
71
+### 6. Reloc rewiring
72
+Relocation application pass (Sprint 11 + this):
73
+
74
+- `GOT_LOAD_PAGE21` / `GOT_LOAD_PAGEOFF12` → target = GOT slot address.
75
+- `POINTER_TO_GOT` → target = GOT slot address (used for 32-bit pointer-to-GOT references).
76
+- `BRANCH26` to a dylib import → target = stub address.
77
+- `BRANCH26` to a Defined → target unchanged (direct call).
78
+- `BRANCH26` to an Undefined resolved via `-undefined dynamic_lookup` → target = stub address.
79
+
80
+### 7. Indirect symbol table
81
+`__LINKEDIT` indirect-symbol table = list of u32 symbol-table indices, used by dyld to map each stub / lazy pointer / GOT slot entry back to its symbol. Populated here, pointed at by `LC_DYSYMTAB.indirectsymoff`.
82
+
83
+### 8. Weak-import dylib functions
84
+`weak_import` symbols get stubs whose lazy binding opcode sequence includes the `BIND_SYMBOL_FLAGS_WEAK_IMPORT` flag. At runtime, if the symbol is missing, dyld patches the lazy pointer to 0 instead of erroring. The call site must test for null before branching — that's user code's responsibility.
85
+
86
+## Testing Strategy
87
+- Hello-world staging fixture: a `.o` that calls `_printf` + references `_errno`. Produces `__stubs`, `__la_symbol_ptr`, `__stub_helper`, `__got` in the expected order and sizes.
88
+- Differential: stub/lazy-pointer/GOT layout byte-identical to `ld` on the staging fixture.
89
+- Reloc-rewire test: `BRANCH26` to a dylib-imported function lands in the stub, not in the dylib directly.
90
+- Disassembly test: `otool -v -t` on `__stubs` matches the expected three-instruction sequence for every entry.
91
+
92
+## Definition of Done
93
+- GOT, stubs, lazy pointers, stub helper all emitted with correct flags and `reserved*` fields.
94
+- Indirect symbol table populated correctly (Sprint 14 consumes it).
95
+- BRANCH26-to-dylib correctly rewired to stubs.
96
+- Differential pass on the staging hello-world fixture.
.docs/sprints/sprint13.md added
@@ -0,0 +1,64 @@
1
+# Sprint 13: TLV Relocations
2
+
3
+## Prerequisites
4
+Sprint 12 — GOT-like synthesis patterns established.
5
+
6
+## Goals
7
+Support thread-local variables: the full chain from afs-as's `__thread_vars` / `__thread_data` / `__thread_bss` through `__DATA,__thread_ptrs` into the ARM64 TLV runtime call. `TLVP_LOAD_PAGE21` and `TLVP_LOAD_PAGEOFF12` relocations applied correctly.
8
+
9
+## Deliverables
10
+
11
+### 1. TLV descriptor layout
12
+Apple's TLV model: each TLV gets a 3-word descriptor in `__DATA,__thread_vars` (section type `S_THREAD_LOCAL_VARIABLES`, 0x13):
13
+
14
+```
15
+u64 thunk_addr;   // bound to _tlv_bootstrap (libSystem) at load; dyld swaps in tlv_get_addr
16
+u64 key;          // pthread_key_t, set to 0 initially
17
+u64 offset;       // offset of the variable's initial data within __thread_data
18
+```
19
+
20
+afs-as emits the descriptor template (thunk_addr = 0, key = 0, offset = section-relative to `__thread_data` or `__thread_bss`). afs-ld:
21
+
22
+- Patches `thunk_addr` to reference `_tlv_bootstrap` (from libSystem) via `__DATA,__thread_ptrs`.
23
+- Leaves `key = 0` (runtime initializes on first access).
24
+- Adjusts `offset` to be the final VM offset into the laid-out `__thread_data` / `__thread_bss` section.
25
+
26
+### 2. `__DATA,__thread_ptrs` synth
27
+Section type `S_THREAD_LOCAL_VARIABLE_POINTERS` (0x14). Contains non-lazy pointers to the TLV thunk function (`_tlv_bootstrap` from libSystem). One 8-byte entry per imported TLV thunk. Equivalent to the GOT for TLVs.
28
+
29
+### 3. TLVP reloc application
30
+`TLVP_LOAD_PAGE21` and `TLVP_LOAD_PAGEOFF12` resolve to a `__thread_ptrs` entry (not a `__thread_vars` descriptor directly). The thread-local access sequence afs-as emits:
31
+
32
+```
33
+ADRP  x0, _tlv@TLVPPAGE
34
+LDR   x0, [x0, _tlv@TLVPPAGEOFF]   ; x0 = thread_ptrs[_tlv] = &tlv_descriptor
35
+LDR   x1, [x0]                       ; x1 = descriptor.thunk_addr
36
+BLR   x1                              ; returns address of TLV body in x0
37
+```
38
+
39
+Per Apple's TLV ABI, the `__thread_ptrs` entry is a pointer to the TLV descriptor (the 3-word record in `__thread_vars`). The sequence loads the descriptor address into `x0`, loads the thunk from `[desc+0]`, and calls it with the descriptor address still in `x0`. The thunk reads the key at `[desc+8]`, calls `pthread_getspecific` if needed, and returns the body address. Verify against the reference in `.refs/ld64/` before coding.
40
+
41
+### 4. Coordinate with afs-as section layout
42
+afs-as emits:
43
+- `__DATA,__thread_data` (S_THREAD_LOCAL_REGULAR, 0x11): TLV initializers.
44
+- `__DATA,__thread_bss` (S_THREAD_LOCAL_ZEROFILL, 0x12): zero-initialized TLVs.
45
+- `__DATA,__thread_vars` (S_THREAD_LOCAL_VARIABLES, 0x13): descriptors.
46
+
47
+afs-ld preserves these three sections and adds `__DATA,__thread_ptrs` (S_THREAD_LOCAL_VARIABLE_POINTERS, 0x14).
48
+
49
+### 5. `_tlv_bootstrap` import
50
+Auto-injected as an undefined symbol (if any TLV descriptor needs it), resolves from `libSystem`. Its GOT-equivalent entry lives in `__DATA,__thread_ptrs`, not `__DATA_CONST,__got` (TLV has its own indirection).
51
+
52
+### 6. Zero TLVs early-out
53
+If no input section has `S_THREAD_LOCAL_*` contents and no reloc has a TLVP kind, emit no TLV sections at all.
54
+
55
+## Testing Strategy
56
+- Fixture: a `.f90` with `THREADPRIVATE` → `.o` with `__thread_vars`, `__thread_data`, `__thread_bss` and TLVP relocs. Link with afs-ld and with `ld`. Diff the resulting TLV descriptors, `__thread_ptrs`, and reloc-patched bytes.
57
+- Runtime test: link a tiny C program that reads a TLV via the Apple TLV ABI sequence, run it, check output.
58
+- Zero-TLV fixture: no TLV sections leak into the output.
59
+
60
+## Definition of Done
61
+- `TLVP_LOAD_*` relocs apply correctly.
62
+- `__thread_ptrs` emitted with correct type flag and entries.
63
+- `_tlv_bootstrap` imported only when needed.
64
+- Runtime test loads and reads a TLV correctly under afs-ld.
.docs/sprints/sprint14.md added
@@ -0,0 +1,77 @@
1
+# Sprint 14: LC_SYMTAB / LC_DYSYMTAB / String Table
2
+
3
+## Prerequisites
4
+Sprints 10–13 — layout, relocs, GOT/stubs/TLV all emitted.
5
+
6
+## Goals
7
+Build the final symbol table, string table, and `LC_DYSYMTAB` partitioning expected by dyld. Byte-level matches with `ld`'s layout on simple inputs.
8
+
9
+## Deliverables
10
+
11
+### 1. Symbol table partitioning
12
+dyld requires symbols in this order inside `LC_SYMTAB`:
13
+
14
+1. **Locals** (ilocalsym..ilocalsym+nlocalsym): private Defined + `N_PEXT` private-external symbols, plus `N_STAB` debug entries (afs-as emits none today).
15
+2. **External defined** (iextdefsym..iextdefsym+nextdefsym): `N_EXT` Defined symbols sorted by name for dyld lookups.
16
+3. **Undefined** (iundefsym..iundefsym+nundefsym): dylib imports, sorted by name.
17
+
18
+`LC_DYSYMTAB` records each partition's start and count.
19
+
20
+### 2. Symbol entry construction
21
+Per output symbol, emit an `nlist_64`:
22
+
23
+```
24
+strx:    offset into the string table
25
+n_type:  N_SECT | N_EXT for external Defined;
26
+         N_SECT | N_PEXT for private Defined;
27
+         N_UNDF | N_EXT for undefined / dylib import;
28
+         N_ABS for absolute
29
+n_sect:  1-based index into the section-table-in-header order; 0 for UNDF/ABS
30
+n_desc:  for UNDF: high 8 bits = library ordinal (1-based) or special (SELF, DYNAMIC_LOOKUP, EXECUTABLE)
31
+         N_WEAK_REF | N_WEAK_DEF | N_NO_DEAD_STRIP as appropriate
32
+n_value: Defined's VM address; UNDF = 0
33
+```
34
+
35
+Two-level namespace: every dylib-imported symbol gets the ordinal of its DylibFile in `n_desc`'s high 8 bits. Flat lookup uses `DYNAMIC_LOOKUP_ORDINAL` (0xfe); the other special ordinals are defined in `<mach-o/nlist.h>`.
36
+
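A sketch of serializing one `nlist_64` entry. The `Nlist64` struct and `with_library_ordinal` helper are illustrative; the constants and the 16-byte little-endian layout follow `<mach-o/nlist.h>`, where the library ordinal occupies the high 8 bits of the 16-bit `n_desc`.

```rust
pub const N_UNDF: u8 = 0x00;
pub const N_EXT: u8 = 0x01;
pub const N_ABS: u8 = 0x02;
pub const N_SECT: u8 = 0x0e;
pub const N_PEXT: u8 = 0x10;

pub struct Nlist64 {
    pub n_strx: u32,  // offset into the string table
    pub n_type: u8,
    pub n_sect: u8,   // 1-based section index, 0 for UNDF/ABS
    pub n_desc: u16,
    pub n_value: u64,
}

impl Nlist64 {
    /// Little-endian on-disk form: 4 + 1 + 1 + 2 + 8 = 16 bytes.
    pub fn to_bytes(&self) -> [u8; 16] {
        let mut out = [0u8; 16];
        out[0..4].copy_from_slice(&self.n_strx.to_le_bytes());
        out[4] = self.n_type;
        out[5] = self.n_sect;
        out[6..8].copy_from_slice(&self.n_desc.to_le_bytes());
        out[8..16].copy_from_slice(&self.n_value.to_le_bytes());
        out
    }
}

/// Store a two-level-namespace library ordinal in the high byte of `n_desc`,
/// keeping the low-byte flags (N_WEAK_REF etc.) intact.
pub fn with_library_ordinal(n_desc: u16, ordinal: u8) -> u16 {
    (n_desc & 0x00FF) | ((ordinal as u16) << 8)
}
```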
37
+### 3. String table
38
+- Starts with a null byte at offset 0 (dyld-required).
39
+- All symbol names follow, each null-terminated.
40
+- Suffix-dedup like afs-as: sort names by reverse-lexicographic suffix order and reuse trailing bytes where possible. Cheap space win and preserves the style contract with afs-as.
41
+- 8-byte pad at end.
42
+
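One way to get the suffix sharing described above, as a sketch. Names are sorted by their reversed bytes in descending order, so any name that is a suffix of another lands directly after a name that ends with it. `build_strtab` is a hypothetical helper, and the trailing 8-byte pad is left out.

```rust
use std::collections::BTreeMap;

/// Build a string table with a leading NUL and suffix sharing.
/// Returns the table bytes and each name's offset into it.
pub fn build_strtab(names: &[&str]) -> (Vec<u8>, BTreeMap<String, u32>) {
    let mut sorted: Vec<&str> = names.to_vec();
    // Descending order of the reversed byte strings.
    sorted.sort_by(|a, b| b.bytes().rev().cmp(a.bytes().rev()));
    sorted.dedup();

    let mut table = vec![0u8]; // offset 0 is the empty string
    let mut offsets = BTreeMap::new();
    let mut prev: Option<(&str, u32)> = None;
    for name in sorted {
        let off = match prev {
            // Reuse the tail of the previous name, including its NUL terminator.
            Some((p, p_off)) if p.ends_with(name) => p_off + (p.len() - name.len()) as u32,
            _ => {
                let o = table.len() as u32;
                table.extend_from_slice(name.as_bytes());
                table.push(0);
                o
            }
        };
        offsets.insert(name.to_string(), off);
        prev = Some((name, off));
    }
    (table, offsets)
}
```

With `_afs_array_sum` and `_array_sum` as inputs, the second name is stored as an offset 4 bytes into the first and adds no bytes to the table.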
43
+### 4. Indirect symbol table
44
+Already populated by Sprint 12 via GOT/stubs/lazy-pointers. Lives in `__LINKEDIT`, pointed at by `LC_DYSYMTAB.indirectsymoff / nindirectsyms`. Each entry is a u32:
45
+- Symbol-table index for symbols in `__stubs`, `__la_symbol_ptr`, `__got`.
46
+- Special sentinel `INDIRECT_SYMBOL_LOCAL=0x80000000` for entries pointing at local symbols (not exported).
47
+- Special sentinel `INDIRECT_SYMBOL_ABS=0x40000000` for absolute symbols.
48
+
49
+### 5. Local-symbol stripping (`-x`)
50
+`ld` supports `-x` to strip local symbols from the output. We record the flag (Sprint 19 wires CLI) and at emission time drop locals from the symbol table. Relocs (if any were external-only) and debug info are unaffected. If `-x` is not set, emit all locals.
51
+
52
+### 6. Relocations in the output
53
+For `MH_EXECUTE` and `MH_DYLIB`, dyld-era outputs don't emit per-section relocations — `LC_DYLD_INFO` (or chained fixups) does that job. afs-ld writes zero `nreloc`/`reloff` on output sections. (For `MH_OBJECT`, which we're not emitting, it would matter.)
54
+
55
+### 7. File-offset sequencing in __LINKEDIT
56
+`__LINKEDIT` data layout order ld uses (we match for differential ease):
57
+1. Chained fixups blob (if present) or dyld-info opcode streams.
58
+2. Function starts blob.
59
+3. Data-in-code blob.
60
+4. Symbol table (`nsyms * 16` bytes).
61
+5. Indirect symbol table.
62
+6. String table.
63
+7. Code signature.
64
+
65
+Each block aligned to 8 bytes; `__LINKEDIT` itself page-aligned. `LC_SYMTAB`, `LC_DYSYMTAB`, `LC_FUNCTION_STARTS`, `LC_DATA_IN_CODE`, `LC_DYLD_INFO_ONLY`, `LC_CODE_SIGNATURE` all point into this region with file offsets.
66
+
67
+## Testing Strategy
68
+- Build a fixture with one local, one external Defined, one undefined dylib import. Verify `LC_SYMTAB` / `LC_DYSYMTAB` partitions match `ld`'s output exactly.
69
+- String-table dedup: two symbols `_afs_array_sum` and `_array_sum` share suffix bytes.
70
+- Two-level namespace ordinals assigned in load-command order; mismatches produce hard errors when the dylib isn't listed.
71
+- Differential: symbol-table byte-level match for every staging fixture.
72
+
73
+## Definition of Done
74
+- `nm -v` output identical (modulo address offsets allowed by differential harness) between afs-ld and `ld` on all staging fixtures.
75
+- `LC_DYSYMTAB` partition boundaries exact.
76
+- Indirect symbol table entries point to the correct nlist indices for stubs, lazy pointers, and GOT.
77
+- String-table byte length within 5% of `ld`'s (suffix-dedup parity).
.docs/sprints/sprint15.md added
@@ -0,0 +1,101 @@
1
+# Sprint 15: Classic LC_DYLD_INFO Opcodes
2
+
3
+## Prerequisites
4
+Sprints 12, 14 — GOT/stubs/lazy-pointers in place, symbol table shaped.
5
+
6
+## Goals
7
+Generate the four ULEB128 opcode streams and the export trie that dyld reads via `LC_DYLD_INFO_ONLY`. This is the classic format (macOS 11–13 default) and the `-no_fixup_chains` path on newer macOS. Chained fixups land in Sprint 15.5.
8
+
9
+## Deliverables
10
+
11
+### 1. The five streams
12
+
13
+`LC_DYLD_INFO_ONLY` load command points at five blobs in `__LINKEDIT`:
14
+- **rebase_off / rebase_size**: rebase opcodes — fix up absolute pointers for ASLR slide.
15
+- **bind_off / bind_size**: bind opcodes — non-lazy imports from dylibs.
16
+- **weak_bind_off / weak_bind_size**: weak-bind opcodes — C++-style weak symbol coalescing at runtime.
17
+- **lazy_bind_off / lazy_bind_size**: lazy-bind opcodes — one block per stub_helper entry.
18
+- **export_off / export_size**: export trie — what this image exports to other images.
19
+
20
+### 2. Opcode encoder
21
+`afs-ld/src/synth/dyld_info.rs`:
22
+
23
+```rust
24
+pub struct OpcodeStream { buf: Vec<u8> }
25
+
26
+impl OpcodeStream {
27
+    pub fn uleb(&mut self, v: u64);
28
+    pub fn sleb(&mut self, v: i64);
29
+    pub fn string(&mut self, s: &str);   // null-terminated
30
+    pub fn byte(&mut self, op_and_imm: u8);
31
+    pub fn done(&mut self);              // terminating REBASE_OPCODE_DONE / BIND_OPCODE_DONE
32
+}
33
+```
34
+
35
+Opcode byte = (opcode_nibble << 4) | imm_nibble.
36
+
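A sketch of the encoder bodies for the signatures above. The `new` and `bytes` accessors are additions for illustration, and `done` is omitted because it is just `byte(0x00)` for both the rebase and bind streams.

```rust
pub struct OpcodeStream {
    buf: Vec<u8>,
}

impl OpcodeStream {
    pub fn new() -> Self {
        OpcodeStream { buf: Vec::new() }
    }

    /// Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte.
    pub fn uleb(&mut self, mut v: u64) {
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7;
            if v == 0 {
                self.buf.push(byte);
                break;
            }
            self.buf.push(byte | 0x80);
        }
    }

    /// Signed LEB128: stop once the remaining value is pure sign extension.
    pub fn sleb(&mut self, mut v: i64) {
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7; // arithmetic shift keeps the sign
            let done = (v == 0 && byte & 0x40 == 0) || (v == -1 && byte & 0x40 != 0);
            self.buf.push(if done { byte } else { byte | 0x80 });
            if done {
                break;
            }
        }
    }

    /// Null-terminated symbol name.
    pub fn string(&mut self, s: &str) {
        self.buf.extend_from_slice(s.as_bytes());
        self.buf.push(0);
    }

    /// One opcode byte: opcode in the high nibble, immediate in the low nibble.
    pub fn byte(&mut self, op_and_imm: u8) {
        self.buf.push(op_and_imm);
    }

    pub fn bytes(&self) -> &[u8] {
        &self.buf
    }
}
```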
37
+### 3. Rebase stream
38
+For every absolute pointer in output `__DATA` / `__DATA_CONST` (an `Unsigned` reloc or a GOT entry resolved to a local address), emit rebase opcodes:
39
+
40
+```
41
+REBASE_OPCODE_SET_TYPE_IMM(REBASE_TYPE_POINTER)
42
+REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(seg_idx, offset_within_seg)
43
+REBASE_OPCODE_DO_REBASE_ULEB_TIMES(count) or REBASE_OPCODE_DO_REBASE_IMM_TIMES(count)
44
+```
45
+
46
+Batching: consecutive rebases collapse into single `_ULEB_TIMES`; strided rebases use `_ULEB_TIMES_SKIPPING_ULEB`. Matching ld's batching is what keeps the differential harness happy.
47
+
48
+### 4. Non-lazy bind stream
49
+For every GOT entry pointing at a dylib import:
50
+
51
+```
52
+BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(ordinal) or _ULEB(ordinal)
53
+BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(flags) + <name>\0
54
+BIND_OPCODE_SET_TYPE_IMM(BIND_TYPE_POINTER)
55
+BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(seg_idx, offset)
56
+BIND_OPCODE_DO_BIND
57
+```
58
+
59
+Flags: `BIND_SYMBOL_FLAGS_WEAK_IMPORT`, `BIND_SYMBOL_FLAGS_NON_WEAK_DEFINITION`.
60
+
61
+### 5. Weak bind stream
62
+For symbols that participate in weak coalescing across the program (weak defs that can be overridden by other images). For armfortas today this is empty; fortsh may or may not need it. Emit a terminator-only stream by default.
63
+
64
+### 6. Lazy bind stream
65
+One block per stub_helper entry (one dylib-imported callable per stub). Each block:
66
+```
67
+BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(seg_idx_of_la_symbol_ptr, offset_of_this_slot)
68
+BIND_OPCODE_SET_DYLIB_ORDINAL_IMM/ULEB(ordinal)
69
+BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(flags) + <name>\0
70
+BIND_OPCODE_DO_BIND
71
+BIND_OPCODE_DONE
72
+```
73
+
74
+The stub_helper entry pushes the byte offset of its block; `dyld_stub_binder` reads from that offset, interprets the block, patches the lazy pointer.
75
+
76
+### 7. Export trie
77
+Rooted at `__LINKEDIT[export_off]`. Built from the output's external Defined symbols (including re-exports from dylibs we re-export). Tree construction:
78
+
79
+- Collect `(name, ExportEntry)` pairs.
80
+- Build a prefix trie.
81
+- Emit depth-first: each node = ULEB terminal-size, optional terminal payload (flags + address ULEB), child-count, (edge_string, child_offset) pairs.
82
+- Child offsets are fixed up in a second pass once sizes are known.
83
+
84
+Terminal payload formats:
85
+- Regular: `flags ULEB | address_from_file_start ULEB`.
86
+- Re-export: `flags ULEB | dylib_ordinal ULEB | imported_name\0`.
87
+- Stub-and-resolver: `flags ULEB | stub_addr ULEB | resolver_addr ULEB`.
88
+
89
+### 8. Stream-size determinism
90
+Every stream must be deterministic across invocations given identical inputs. Sort keys everywhere, no hashmap iteration order.
91
+
92
+## Testing Strategy
93
+- Differential: for every staging fixture, afs-ld and `ld` produce byte-identical opcode streams after normalizing any tolerated differences.
94
+- Unit tests for ULEB128 encoding at boundary values (0, 127, 128, 16383, 16384, big).
95
+- Export-trie walker (Sprint 5's `DylibFile::exports` reader) round-trips our emitted tries: emit a trie, parse it back, every name resolves.
96
+
97
+## Definition of Done
98
+- All five streams emitted correctly.
99
+- Export trie round-trips through our own reader.
100
+- Differential byte-level parity with `ld` on 10+ staging fixtures.
101
+- Opcode emission is deterministic.
.docs/sprints/sprint15_5.md added
@@ -0,0 +1,100 @@
1
+# Sprint 15.5: Chained Fixups (LC_DYLD_CHAINED_FIXUPS)
2
+
3
+## Prerequisites
4
+Sprint 15 — classic dyld-info format working.
5
+
6
+## Goals
7
+Emit the modern `LC_DYLD_CHAINED_FIXUPS` format, introduced in macOS 12 and mandatory on arm64e. `LC_DYLD_EXPORTS_TRIE` pairs with it for the export side. Coexists with Sprint 15 under `-fixup_chains` / `-no_fixup_chains`; chained becomes default once Sprint 27's parity gate clears.
8
+
9
+## Deliverables
10
+
11
+### 1. Chained-fixups header
12
+`__LINKEDIT` blob pointed at by `LC_DYLD_CHAINED_FIXUPS`:
13
+
14
+```
15
+struct dyld_chained_fixups_header {
16
+    uint32 fixups_version;     // 0
17
+    uint32 starts_offset;      // offset of dyld_chained_starts_in_image
18
+    uint32 imports_offset;     // offset of imports table
19
+    uint32 symbols_offset;     // offset of symbol strings
20
+    uint32 imports_count;
21
+    uint32 imports_format;     // DYLD_CHAINED_IMPORT (1), _ADDEND (2), or _ADDEND64 (3)
22
+    uint32 symbols_format;     // 0 = uncompressed
23
+}
24
+```
25
+
26
+### 2. Per-segment fixup starts
27
+```
28
+struct dyld_chained_starts_in_image {
29
+    uint32 seg_count;
30
+    uint32 seg_info_offset[seg_count];   // 0 = no fixups in this segment
31
+}
32
+
33
+struct dyld_chained_starts_in_segment {
34
+    uint32 size;
35
+    uint16 page_size;               // 0x4000 on arm64 Apple Silicon
36
+    uint16 pointer_format;          // DYLD_CHAINED_PTR_64 (2) or _64_OFFSET (6)
37
+    uint64 segment_offset;
38
+    uint32 max_valid_pointer;       // 0 for 64-bit
39
+    uint16 page_count;
40
+    uint16 page_start[page_count];   // offset of first chain within each page (0xFFFF = no chain)
41
+}
42
+```
43
+
44
+### 3. Pointer formats
45
+arm64 uses `DYLD_CHAINED_PTR_64` (plain 64-bit) or `DYLD_CHAINED_PTR_64_OFFSET` (offsets from image base). arm64e uses `DYLD_CHAINED_PTR_ARM64E` (with auth bits); skip arm64e for now. Each chained pointer is a 64-bit word with fields:
46
+
47
+```
48
+bind:   24-bit ordinal | 8-bit addend | 19-bit reserved | 12-bit next | 1-bit bind=1
49
+rebase: 36-bit target | 8-bit high8 | 7-bit reserved | 12-bit next | 1-bit bind=0
50
+```
51
+
52
+The `next` field is the distance in 4-byte units from this fixup to the next one within the page (0 = end of chain). Rebuilding chains after layout is the bulk of this sprint.
53
+
54
+### 4. Imports table
55
+One entry per imported symbol. `DYLD_CHAINED_IMPORT` format:
56
+```
57
+uint32 lib_ordinal : 8;        // dylib ordinal
58
+uint32 weak_import : 1;
59
+uint32 name_offset : 23;       // into the symbol strings blob
60
+```
61
+
62
+`_ADDEND` (32-bit) and `_ADDEND64` (64-bit) formats add an explicit addend — we pick the smallest that fits our inputs (ADDEND64 only if any addend exceeds i32 range).
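The "smallest format that fits" rule could look roughly like this. The sketch assumes the plain `DYLD_CHAINED_IMPORT` format is used only when every import addend is zero; real `ld` may also fold small addends into the bind pointer itself, so treat that branch as a simplification. `ImportFormat` and `pick_import_format` are hypothetical names.

```rust
/// Values match the `imports_format` field of the chained-fixups header.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ImportFormat {
    Import = 1,
    Addend = 2,
    Addend64 = 3,
}

/// Choose the narrowest imports-table format that can hold every addend.
pub fn pick_import_format(addends: &[i64]) -> ImportFormat {
    if addends.iter().all(|&a| a == 0) {
        // Assumption: no explicit addends needed at all.
        ImportFormat::Import
    } else if addends.iter().all(|&a| i32::try_from(a).is_ok()) {
        ImportFormat::Addend
    } else {
        // At least one addend is outside the i32 range.
        ImportFormat::Addend64
    }
}
```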
63
+
64
+### 5. Chain construction
65
+Walk every output fixup (rebase or bind), grouped by segment and page. Within a page, chain them in ascending file offset; the `next` field of each points at the next. Pages with no fixups set `page_start = 0xFFFF`. Validate that no chain ever crosses a page boundary.
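As a rough sketch of the per-segment walk: given fixup offsets relative to the segment start, already sorted ascending, compute each page's `page_start` and each fixup's `next`. The function name `build_chains` and its signature are hypothetical, and the 4-byte stride assumes `DYLD_CHAINED_PTR_64`.

```rust
/// Marker for a page with no fixups.
pub const NO_CHAIN: u16 = 0xFFFF;
/// Stride of the `next` field in bytes (assumed: DYLD_CHAINED_PTR_64).
const STRIDE: u64 = 4;

/// Returns (`page_start` per page, `next` per fixup).
/// Assumes `offsets` is sorted ascending and every offset lies inside
/// `page_count` pages of `page_size` bytes; out-of-range offsets panic.
pub fn build_chains(offsets: &[u64], page_size: u64, page_count: usize) -> (Vec<u16>, Vec<u16>) {
    let mut page_start = vec![NO_CHAIN; page_count];
    let mut next = vec![0u16; offsets.len()];
    for (i, &off) in offsets.iter().enumerate() {
        let page = (off / page_size) as usize;
        if page_start[page] == NO_CHAIN {
            // First fixup seen in this page starts the chain.
            page_start[page] = (off % page_size) as u16;
        }
        // Link to the following fixup only when it is in the same page,
        // so no chain ever crosses a page boundary.
        if let Some(&succ) = offsets.get(i + 1) {
            if succ / page_size == off / page_size {
                next[i] = ((succ - off) / STRIDE) as u16;
            }
        }
    }
    (page_start, next)
}
```

A fixup in the last slot of one page and another at byte 0 of the following page end up in separate chains, each with `next = 0` on its final entry.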
66
+
67
+### 6. Exports trie → LC_DYLD_EXPORTS_TRIE
68
+The export trie is unchanged from Sprint 15's format. In chained-fixups mode the trie lives under `LC_DYLD_EXPORTS_TRIE` instead of inside `LC_DYLD_INFO_ONLY`.
69
+
70
+### 7. CLI flag wiring
71
+`-fixup_chains` forces chained, `-no_fixup_chains` forces classic. Default policy:
72
+- If `-platform_version macos` minimum ≥ 12.0: chained.
73
+- Otherwise: classic.
74
+
75
+Sprint 19's CLI sprint consumes these flags; this sprint just implements both paths.
76
+
77
+### 8. Removing `__stub_helper` under chained
78
+Chained fixups don't use lazy binding — `__stub_helper` is unnecessary, `__la_symbol_ptr` becomes an ordinary bind slot in `__DATA` or `__AUTH_DATA`. Under chained mode the writer skips emitting `__stub_helper` and wires `BRANCH26` through a different stub that loads directly from the bind slot.
79
+
80
+Modified ARM64 stub for chained:
81
+```
82
+ADRP x16, _symbol@PAGE
83
+LDR  x16, [x16, _symbol@PAGEOFF]
84
+BR   x16
85
+```
86
+
87
+Where `_symbol@PAGE/PAGEOFF` resolves to the bind slot in `__DATA`. Dyld has already bound it by the time the stub runs.
88
+
89
+## Testing Strategy
90
+- Parity vs `ld -fixup_chains` on staging fixtures: byte-identical chain layout and imports table.
91
+- Parity vs `ld -no_fixup_chains`: retains Sprint 15 byte-identical output.
92
+- Page-boundary test: a fixup occupying the last 8 bytes of a page followed by one at byte 0 of the next page. Each sits in its own chain, and both are reachable.
93
+- Default-policy test: `-platform_version macos 11.0` → classic, `-platform_version macos 12.0` → chained.
94
+- Runtime test: binaries linked both ways load and execute correctly on an M-series Mac.
95
+
96
+## Definition of Done
97
+- Both `-fixup_chains` and `-no_fixup_chains` produce runnable binaries.
98
+- Chain layout byte-identical to `ld` on 10+ staging fixtures.
99
+- Default format switches on `-platform_version`.
100
+- `__stub_helper` correctly omitted in chained mode.
.docs/sprints/sprint16.mdadded
@@ -0,0 +1,51 @@
1
+# Sprint 16: LC_FUNCTION_STARTS & LC_DATA_IN_CODE
2
+
3
+## Prerequisites
4
+Sprint 14 — `__LINKEDIT` layout sequencing; Sprint 11 — atoms placed.
5
+
6
+## Goals
7
+Emit the two small `__LINKEDIT` blobs used by debuggers, disassemblers, and the dynamic loader: `LC_FUNCTION_STARTS` (delta-encoded entry points) and `LC_DATA_IN_CODE` (markers for data embedded in `__text`).
8
+
9
+## Deliverables
10
+
11
+### 1. LC_FUNCTION_STARTS
12
+
13
+Format: a single stream of ULEB128 deltas. First ULEB = offset from the Mach-O header to the first function entry. Each subsequent ULEB = delta from the previous entry. A terminating `0` ends the stream. 8-byte aligned.
14
+
15
+Source: every atom in `__TEXT,__text` plus `.alt_entry` chain members. Exclude atoms from `__stubs` and `__stub_helper` — ld doesn't list those.
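The stream format is small enough to sketch directly. This assumes the input offsets are relative to the Mach-O header, strictly ascending, and non-zero; `encode_function_starts` is a hypothetical name.

```rust
/// Append `value` to `out` as unsigned LEB128.
fn write_uleb128(out: &mut Vec<u8>, mut value: u64) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Encode function entry offsets as ULEB128 deltas, then a 0 terminator,
/// then zero padding up to an 8-byte boundary.
/// Assumes `offsets` is strictly ascending; otherwise the subtraction
/// underflows (a panic in debug builds).
pub fn encode_function_starts(offsets: &[u64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u64;
    for &off in offsets {
        write_uleb128(&mut out, off - prev);
        prev = off;
    }
    out.push(0); // terminator
    while out.len() % 8 != 0 {
        out.push(0); // alignment padding
    }
    out
}
```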
16
+
17
+### 2. LC_DATA_IN_CODE
18
+
19
+Format: a packed array of:
20
+```
21
+struct data_in_code_entry {
22
+    uint32 offset;    // from Mach-O header
23
+    uint16 length;    // bytes
24
+    uint16 kind;      // DICE_KIND_DATA=1, _JUMP_TABLE8=2, _JUMP_TABLE16=3,
25
+                      //           _JUMP_TABLE32=4, _ABS_JUMP_TABLE32=5
26
+}
27
+```
28
+
29
+Source: per-input `LC_DATA_IN_CODE` blocks. Remap each entry's offset from its input-section base to the final output offset (measured from the Mach-O header, matching the `offset` field above). Entries sorted by offset.
30
+
31
+afs-as doesn't emit jump tables today, but we preserve whatever the input has so future C/Objective-C objects with jump tables survive linking.
32
+
33
+### 3. Sorting determinism
34
+
35
+Function starts: ascending by VM address. Data-in-code: ascending by output offset. Any ties are resolved by input command-line order.
36
+
37
+### 4. Integration with `__LINKEDIT` layout (Sprint 14)
38
+
39
+Both blobs get file offsets assigned after chained fixups / classic dyld-info but before the symbol table. Pointed at by their respective load commands with `dataoff / datasize`.
40
+
41
+## Testing Strategy
42
+
43
+- Differential: function starts list byte-identical between afs-ld and `ld` on every staging fixture.
44
+- Data-in-code: fixture with a jump table input; entries survive linking with correct remapped offsets.
45
+- Empty output: fixtures with no functions produce a zero-byte LC_FUNCTION_STARTS (open question: `ld` may still emit a terminator byte; verify against real output), and LC_DATA_IN_CODE is absent when no input had data-in-code.
46
+
47
+## Definition of Done
48
+
49
+- LC_FUNCTION_STARTS parity with `ld` on every staging fixture.
50
+- LC_DATA_IN_CODE entries remapped correctly across linking.
51
+- Both blobs placed in the right `__LINKEDIT` slot per Sprint 14.
.docs/sprints/sprint17.mdadded
@@ -0,0 +1,90 @@
1
+# Sprint 17: Unwind Info
2
+
3
+## Prerequisites
4
+Sprints 9, 10, 11 — atoms, output layout, reloc application.
5
+
6
+## Goals
7
+Synthesize `__TEXT,__unwind_info` from per-function `__compact_unwind` records that afs-as already emits. Pass `__TEXT,__eh_frame` through as the DWARF fallback path. Without this sprint, `_Unwind_Backtrace`, C++ exceptions, and some system panics produce garbage or abort.
8
+
9
+## Deliverables
10
+
11
+### 1. Input: afs-as `__compact_unwind`
12
+
13
+afs-as emits one 32-byte record per function:
14
+```
15
+uint64 function_start;    // reloc to function atom
16
+uint32 code_len;
17
+uint32 encoding;          // ARM64 compact-unwind encoding (UNWIND_ARM64_MODE_*)
18
+uint64 personality;       // reloc to personality function or 0
19
+uint64 lsda;              // reloc to LSDA or 0
20
+```
21
+
22
+ARM64 encoding nibbles (`UNWIND_ARM64_MODE_MASK = 0x0F000000`):
23
+- `UNWIND_ARM64_MODE_FRAMELESS = 0x02000000` (+ stack size in 16-byte units)
24
+- `UNWIND_ARM64_MODE_DWARF = 0x03000000` (falls back to __eh_frame)
25
+- `UNWIND_ARM64_MODE_FRAME = 0x04000000` (+ saved-register bitfield for x19-x28, d8-d15)
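A small classifier over the mode nibble, as a sketch. The mode constants are the ones listed above; the frameless stack-size mask (`0x00FFF000`) is an assumption taken from `<mach-o/compact_unwind_encoding.h>`, and the `Arm64UnwindMode` type is hypothetical.

```rust
pub const UNWIND_ARM64_MODE_MASK: u32 = 0x0F00_0000;
/// Assumed from compact_unwind_encoding.h: stack size / 16 lives in bits 12..24.
const FRAMELESS_STACK_SIZE_MASK: u32 = 0x00FF_F000;

#[derive(Debug, PartialEq, Eq)]
pub enum Arm64UnwindMode {
    /// Leaf-style function; `stack_size` is in bytes.
    Frameless { stack_size: u32 },
    /// Unwinder must consult `__eh_frame`.
    Dwarf,
    /// Standard frame-pointer prologue.
    Frame,
    /// Any mode value this sketch does not model.
    Other(u32),
}

pub fn classify(encoding: u32) -> Arm64UnwindMode {
    match encoding & UNWIND_ARM64_MODE_MASK {
        0x0200_0000 => Arm64UnwindMode::Frameless {
            // Field counts 16-byte units; convert to bytes.
            stack_size: ((encoding & FRAMELESS_STACK_SIZE_MASK) >> 12) * 16,
        },
        0x0300_0000 => Arm64UnwindMode::Dwarf,
        0x0400_0000 => Arm64UnwindMode::Frame,
        other => Arm64UnwindMode::Other(other),
    }
}
```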
26
+
27
+### 2. `__TEXT,__unwind_info` layout
28
+
29
+Complex, but structured. Header:
30
+```
31
+uint32 version;                   // UNWIND_SECTION_VERSION = 1
32
+uint32 common_encodings_offset;
33
+uint32 common_encodings_count;
34
+uint32 personalities_offset;
35
+uint32 personalities_count;
36
+uint32 indices_offset;            // first-level index
37
+uint32 indices_count;
38
+```
39
+
40
+Then three variable-length arrays:
41
+
42
+1. **Common encodings**: up to 127 most-frequent 32-bit encodings. Lookups in per-page tables reference them by index instead of repeating the 32-bit value.
43
+2. **Personalities**: array of 32-bit offsets from the Mach-O header to the pointer slot (GOT entry) for each personality function (usually `___gxx_personality_v0` or `___objc_personality_v0`).
44
+3. **First-level indices**: `(function_offset, second_level_page_offset, lsda_index_offset)` triples, one per page worth of functions. Last entry is a sentinel with function_offset = text section end.
45
+
46
+Then **second-level pages** — one per first-level index — each starting with a kind tag:
47
+- `UNWIND_SECOND_LEVEL_REGULAR = 2`: array of `(function_offset, encoding)` pairs. Larger, uncompressed.
48
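+- `UNWIND_SECOND_LEVEL_COMPRESSED = 3`: `(function_delta, encoding_index)` pairs packed into 32 bits each (24-bit function offset relative to the page's first function, 8-bit encoding index). An encoding_index below the common-encodings count indexes the common encodings; anything at or above it indexes the page-local encodings array after subtracting that count.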
49
+
50
+Plus an **LSDA table**: sorted `(function_offset, lsda_offset)` pairs for functions that have LSDAs.
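A sketch of one compressed second-level entry, assuming the split documented in `<mach-o/compact_unwind_encoding.h>`: the low 24 bits hold the function offset within the page, the high 8 bits hold the encoding index, and indices at or above the common-encodings count refer to the page-local array. Both function names are hypothetical.

```rust
/// Pack one compressed entry: low 24 bits = function offset within the page,
/// high 8 bits = encoding index. Returns None if the offset does not fit.
pub fn pack_compressed_entry(func_delta: u32, encoding_index: u8) -> Option<u32> {
    if func_delta >= (1 << 24) {
        return None;
    }
    Some((u32::from(encoding_index) << 24) | func_delta)
}

/// Resolve an encoding index against the common table first, then the
/// page-local table (offset by the number of common encodings).
pub fn resolve_encoding(index: u8, common: &[u32], page_local: &[u32]) -> Option<u32> {
    let i = usize::from(index);
    if i < common.len() {
        common.get(i).copied()
    } else {
        page_local.get(i - common.len()).copied()
    }
}
```

When a function's delta does not fit in 24 bits, or a page would need more local encodings than the 8-bit index can address, that page would have to fall back to the regular (uncompressed) kind.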
51
+
52
+### 3. Construction algorithm
53
+
54
+1. Gather input `__compact_unwind` records; remap function_start to output VM.
55
+2. Sort by function_start.
56
+3. Tally encoding frequencies; pick top 127 as common encodings.
57
+4. Walk the sorted list, packing up to `pageSize/4 - header` records per compressed page (ld uses 4 KB pages here, ~1020 entries max).
58
+5. Records with DWARF encoding: defer to `__eh_frame` — we still emit them but dyld's unwinder will follow the encoding to DWARF.
59
+6. Write the three top arrays, the per-page second-level tables, and the LSDA index.
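Step 3 above can be sketched as a frequency tally. The tie-break (lower encoding value first) is an assumption made only to keep output deterministic; byte parity with `ld` would require matching whatever order `ld` actually uses. `pick_common_encodings` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Maximum number of entries in the common-encodings array.
const MAX_COMMON: usize = 127;

/// Return the most frequent encodings, most frequent first.
/// Ties are broken by ascending encoding value (an assumption, not ld's rule).
pub fn pick_common_encodings(encodings: &[u32]) -> Vec<u32> {
    let mut counts: HashMap<u32, usize> = HashMap::new();
    for &e in encodings {
        *counts.entry(e).or_insert(0) += 1;
    }
    let mut ranked: Vec<(u32, usize)> = counts.into_iter().collect();
    // Sort by descending count, then ascending value, so HashMap
    // iteration order never leaks into the output.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    ranked.into_iter().take(MAX_COMMON).map(|(e, _)| e).collect()
}
```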
60
+
61
+### 4. `__eh_frame` pass-through
62
+
63
+afs-as emits DWARF CIEs and FDEs in `__TEXT,__eh_frame`. We don't re-encode: we concatenate per-input `__eh_frame` contents, adjust personality function references (`ARM64_RELOC_SUBTRACTOR` pairs), and emit. CIE deduplication is a nice-to-have (Sprint 30); for this sprint we pass through without deduping.
64
+
65
+### 5. Coordination with dead-strip
66
+
67
+If Sprint 23 removes a function, its compact-unwind and eh_frame records must go too. Compact-unwind atoms are already `parent_of` linked to function atoms from Sprint 9; Sprint 23 walks that link. Eh_frame FDEs similarly reference their function via a SUBTRACTOR pair — when the function atom dies, strip the FDE.
68
+
69
+### 6. Correctness validation
70
+
71
+After writing, we can re-read our own `__unwind_info` (write a tiny walker) and verify:
72
+- Every function in `__text` is represented (either in compact form or with DWARF encoding).
73
+- Every personality/LSDA reference resolves to a valid VM address.
74
+- First-level index is strictly ascending.
75
+- Second-level compressed encoding_index < common_count + that page's local encoding count (and fits in 8 bits).
76
+
77
+## Testing Strategy
78
+
79
+- Fixture from afs-as emitting a function with prologue (`stp x29, x30, [sp, #-16]!`) → compact-unwind FRAME encoding. Parity byte-level with `ld`.
80
+- Function with no prologue (leaf) → FRAMELESS encoding with size 0.
81
+- Function that falls back to DWARF → DWARF encoding, associated FDE survives in `__eh_frame`.
82
+- C++ fixture compiled by clang (C interop via iso_c_binding is in-scope for armfortas) — personality + LSDA survive; `try/throw/catch` still works when executed.
83
+- Backtrace test: program calls `backtrace()` from execinfo.h; output lists the right function names.
84
+
85
+## Definition of Done
86
+
87
+- `__unwind_info` byte-identical to `ld` on staging fixtures with prologues, leaves, and DWARF fallbacks.
88
+- `__eh_frame` passthrough preserves all FDEs with correct personality/LSDA references.
89
+- Backtraces produce real symbolic names on a binary linked by afs-ld.
90
+- C++ exceptions (via clang input) unwind correctly when linked by afs-ld.
.docs/sprints/sprint18.mdadded
@@ -0,0 +1,71 @@
1
+# Sprint 18: HELLO WORLD MILESTONE (Executable)
2
+
3
+## Prerequisites
4
+Sprints 0–17 — full read, resolve, atomize, layout, reloc-apply, dyld-info, function-starts, unwind pipeline.
5
+
6
+## Goals
7
+Produce a runnable arm64 PIE executable. afs-ld takes `hello.o + libarmfortas_rt.a + libSystem.tbd + -e _main` and emits a binary that, when executed on an M-series Mac, prints "Hello, World!". Not a demo — an exit criterion for everything that came before.
8
+
9
+## Deliverables
10
+
11
+### 1. Staging fixture
12
+`tests/corpus/hello/` contains:
13
+- `hello.f90`: trivial Fortran program with `print *, "Hello, World!"`.
14
+- `hello.o`: assembled by afs-as from armfortas's `hello.s` output.
15
+- Expected output on run: `Hello, World!\n` (Fortran print adds a leading blank on some paths — match what armfortas currently produces).
16
+
17
+### 2. End-to-end link
18
+Invocation:
19
+```
20
+afs-ld hello.o libarmfortas_rt.a \
21
+  -lSystem -syslibroot "$(xcrun --show-sdk-path)" \
22
+  -e _main -no_uuid -platform_version macos 11.0 14.0 \
23
+  -o hello
24
+```
25
+
26
+Expected:
27
+- Output file passes `file hello` → `Mach-O 64-bit executable arm64`.
28
+- `otool -lV hello` accepts without errors and shows `LC_MAIN`, expected segments, dylibs.
29
+- `codesign -dv hello` reports no signature. Sprint 22 adds ad-hoc signing; until then, an unsigned arm64 binary is killed at exec time on Apple Silicon, so `./hello` from a Terminal prompt requires signing first. Document this caveat.
30
+- `./hello` (after the Sprint 22 signature) prints the expected string and exits 0.
31
+
32
+### 3. Differential gate
33
+`tests/hello_world.rs`:
34
+```rust
35
+let ours = link_with_afs_ld(&inputs, &args);
36
+let theirs = link_with_system_ld(&inputs, &args);
37
+let diff = diff_macho(&ours, &theirs);
38
+assert!(diff.critical.is_empty(), "critical diffs: {:#?}", diff.critical);
39
+```
40
+
41
+Allowed tolerated diffs:
42
+- UUID bytes (we emit zero with `-no_uuid`, ld may or may not — both should honor `-no_uuid`).
43
+- String table ordering within a partition as long as every symbol resolves to the same address.
44
+- LC_DYLD_INFO vs LC_DYLD_CHAINED_FIXUPS if defaults disagree — gate with the same `-fixup_chains` / `-no_fixup_chains` flag.
45
+
46
+### 4. Load-command `otool -lV` parity
47
+Run `otool -lV` on both outputs; diff should be empty after normalizing absolute file offsets (ld and afs-ld may interleave `__LINKEDIT` regions slightly differently — document and justify any remaining diffs).
48
+
49
+### 5. Execution gate (requires Sprint 22's ad-hoc signing, but staged here)
50
+Two cases:
51
+- Unsigned path: `./hello` fails with "killed: 9" (correct Apple Silicon behavior); `codesign -s - hello && ./hello` works.
52
+- Once Sprint 22 lands, afs-ld's own output is signed ad-hoc and `./hello` works directly.
53
+
54
+This sprint declares success as soon as `codesign -s - hello && ./hello` prints the expected string.
55
+
56
+### 6. Audit
57
+Brutal audit after Sprint 18. Same rules as armfortas audits:
58
+- No "placeholder" or "stub" explanations that hide wrong output.
59
+- Test every claim. Wrong output from a linker is critical.
60
+- If the binary runs but produces extra or missing newlines, investigate — don't rationalize.
61
+
62
+## Testing Strategy
63
+- `tests/hello_world.rs` runs the differential gate.
64
+- `tests/hello_world_run.rs` executes the binary (gated to run only on a macOS host, locally or in CI) and asserts stdout.
65
+- Regression fixtures: any hello-world variant that once broke gets its own test.
66
+
67
+## Definition of Done
68
+- `tests/hello_world.rs` passes — zero critical diffs vs `ld`.
69
+- Binary runs and produces correct output (after `codesign -s - hello` pre-Sprint-22).
70
+- `otool -lV` diff empty after documented normalizations.
71
+- Audit passes.
.docs/sprints/sprint18_5.mdadded
@@ -0,0 +1,55 @@
1
+# Sprint 18.5: HELLO LIBRARY MILESTONE (Dylib)
2
+
3
+## Prerequisites
4
+Sprint 18 — executable path works end-to-end.
5
+
6
+## Goals
7
+Validate `MH_DYLIB` output end-to-end. afs-ld emits a dylib that `dlopen`/`dlsym` can load and a minimal C or Fortran harness can call into. Proves that every dylib-specific decision in Sprints 10–17 is actually correct.
8
+
9
+## Deliverables
10
+
11
+### 1. Staging fixture
12
+`tests/corpus/hello_library/`:
13
+- `foo.f90`: module exporting a single `foo_add(a, b) -> c` interoperable procedure.
14
+- `foo.o`: assembled object.
15
+- `caller.c`: `int main() { void *h = dlopen("./libfoo.dylib", RTLD_NOW); int (*f)(int,int) = dlsym(h, "foo_add"); printf("%d\n", f(2, 3)); }`.
16
+- Expected runtime output: `5`.
17
+
18
+### 2. Dylib link invocation
19
+```
20
+afs-ld -dylib foo.o libarmfortas_rt.a \
21
+  -lSystem -syslibroot "$(xcrun --show-sdk-path)" \
22
+  -install_name @rpath/libfoo.dylib \
23
+  -compatibility_version 1.0 -current_version 1.0.0 \
24
+  -no_uuid -platform_version macos 11.0 14.0 \
25
+  -o libfoo.dylib
26
+```
27
+
28
+### 3. Validation checklist
29
+- `file libfoo.dylib` → `Mach-O 64-bit dynamically linked shared library arm64`.
30
+- `otool -lV libfoo.dylib` shows `LC_ID_DYLIB` with install-name `@rpath/libfoo.dylib`, `current_version = 1.0.0`, `compat_version = 1.0.0`. No `__PAGEZERO`, no `LC_MAIN`.
31
+- Export trie contains `_foo_add` (Fortran name-mangled per armfortas convention, or `bind(C)` if used).
32
+- `dlopen("./libfoo.dylib", RTLD_NOW)` returns non-null.
33
+- `dlsym(h, "foo_add")` returns the function address.
34
+- Calling `foo_add(2, 3)` returns `5`.
35
+
36
+### 4. Differential
37
+Link the same inputs with `ld -dylib` and afs-ld. Compare load commands, export trie contents, indirect symbol table. Tolerated diffs same as Sprint 18.
38
+
39
+### 5. `-rpath` interaction
40
+`caller` is linked against `libfoo.dylib` with `@rpath` indirection. Sprint 19 will wire the full `-rpath` CLI; this sprint validates that an install-name of `@rpath/...` is correctly emitted and that the binary's `DYLD_PRINT_LIBRARIES=1` output shows dyld resolving `@rpath` via the `LC_RPATH` entries of the caller.
41
+
42
+### 6. `dladdr`/`backtrace` in the dylib
43
+When `foo_add` calls into `libarmfortas_rt`, `backtrace_symbols()` should return readable names — proves the symbol table partitioning for a dylib is correct and the Sprint 17 unwind info is wired into dyld's unwinder.
44
+
45
+## Testing Strategy
46
+- `tests/hello_library.rs`: builds and `dlopen`s the dylib, calls `foo_add`, asserts the return.
47
+- `tests/hello_library_nm.rs`: runs `nm -D` on the dylib, asserts `_foo_add` appears as external.
48
+- Differential harness with `ld -dylib` on the same inputs.
49
+
50
+## Definition of Done
51
+- `libfoo.dylib` loads via `dlopen` and exports `_foo_add`.
52
+- Calling the exported function returns the expected value.
53
+- Differential parity with `ld` on the staging fixture.
54
+- `otool -lV` shows correct dylib-specific load commands with no `__PAGEZERO` or `LC_MAIN`.
55
+- Post-sprint audit passes.
.docs/sprints/sprint19.mdadded
@@ -0,0 +1,146 @@
1
+# Sprint 19: CLI Surface + Diagnostics (`-map`, `-why_live`)
2
+
3
+## Prerequisites
4
+Sprints 18–18.5 — executable and dylib milestones reached.
5
+
6
+## Goals
7
+Full `ld`-compatible CLI surface for the flags armfortas already uses and those fortsh is likely to invoke. Includes the two diagnostics surfaces we declared launch-blocking: `-map` (text link map) and `-why_live` (dead-strip reason chain). No polish-tier deferral.
8
+
9
+## Deliverables
10
+
11
+### 1. Full flag list
12
+Recognized:
13
+
14
+**Inputs/outputs**:
15
+- `-o <path>`
16
+- positional `<input>`
17
+- `-l<name>` / `-l <name>`
18
+- `-L <dir>`
19
+- `-framework <name>`
20
+- `-weak_framework <name>`
21
+- `-force_load <archive>`
22
+- `-all_load`
23
+- `-ObjC` (skippable no-op unless inputs have ObjC — they won't from armfortas today)
24
+
25
+**Target & platform**:
26
+- `-arch arm64`
27
+- `-syslibroot <path>`
28
+- `-platform_version macos <min> <sdk>`
29
+
30
+**Output kind**:
31
+- (default) executable
32
+- `-dylib`
33
+- `-r` (relocatable — deferred; errors for now)
34
+- `-bundle` (deferred; errors for now)
35
+
36
+**Entry & startup**:
37
+- `-e <symbol>` (default `_main` for executables)
38
+
39
+**Runtime search paths**:
40
+- `-rpath <path>`
41
+- `-install_name <path>` (dylib only)
42
+- `-compatibility_version <v>` (dylib only)
43
+- `-current_version <v>` (dylib only)
44
+
45
+**Symbol handling**:
46
+- `-undefined <error|warning|suppress|dynamic_lookup>` (default: error)
47
+- `-exported_symbols_list <file>`
48
+- `-unexported_symbols_list <file>`
49
+- `-exported_symbol <sym>`
50
+- `-unexported_symbol <sym>`
51
+- `-x` (strip locals)
52
+- `-S` (strip debug)
53
+
54
+**Layout & output metadata**:
55
+- `-no_uuid`
56
+- `-dead_strip` (gates Sprint 23 pass)
57
+- `-icf=safe` / `-icf=none` (gates Sprint 24 pass)
58
+- `-fixup_chains` / `-no_fixup_chains`
59
+
60
+**Diagnostics**:
61
+- `-map <path>`: emit text link map
62
+- `-why_live <symbol>`: print dead-strip reason chain
63
+- `-t` / `-trace`: print input file paths as they are loaded
64
+- `-v` / `--version`
65
+- `-h` / `--help`
66
+
67
+**Passthrough / compat**:
68
+- `-Wl,<comma-separated>`: normalize into separate flags.
69
+- Unknown flags: error with a did-you-mean suggestion (nearest flag in the list above within Levenshtein distance 3).
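The did-you-mean lookup for unknown flags fits in stdlib-only Rust. This is a sketch with hypothetical names (`levenshtein`, `suggest`); it treats a distance of 3 as the cut-off and returns the closest known flag, preferring the earlier one on ties.

```rust
/// Classic two-row Levenshtein edit distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the closest known flag within edit distance 3, if any.
pub fn suggest<'a>(unknown: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .copied()
        .map(|k| (levenshtein(unknown, k), k))
        .filter(|&(d, _)| d <= 3)
        .min_by_key(|&(d, _)| d)
        .map(|(_, k)| k)
}
```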
70
+
71
+### 2. `-map <path>` output format
72
+Text file mirroring ld's link map:
73
+```
74
+# Path: <output path>
75
+# Arch: arm64
76
+# Object files:
77
+[  0] linker synthesized
78
+[  1] hello.o
79
+[  2] libarmfortas_rt.a(runtime.o)
80
+...
81
+
82
+# Sections:
83
+# Address          Size         Segment   Section
84
+0x100003f9c        0x00000018   __TEXT    __text
85
+0x100003fb4        0x00000024   __TEXT    __stubs
86
+...
87
+
88
+# Symbols:
89
+# Address          Size         File    Name
90
+0x100003f9c        0x00000014   [  1]   _main
91
+0x100003fb0        0x00000004   [  1]   .alt_entry_of_main
92
+0x100003fb4        0x0000000c   linker  _printf (stub)
93
+...
94
+
95
+# Dead stripped:
96
+<file>               <symbol>
97
+[  2]                _unused_helper
98
+```
99
+
100
+### 3. `-why_live <symbol>` output
101
+Walks the live-edge graph from Sprint 23 backward from the named symbol to a root:
102
+```
103
+_main is live because:
104
+  _main is in -e _main (GC root)
105
+
106
+_afs_write_char is live because:
107
+  _afs_write_char is reachable from _afs_print
108
+  _afs_print is reachable from _main
109
+  _main is in -e _main (GC root)
110
+```
111
+
112
+When `-why_live` is passed without `-dead_strip`, the diagnostic explains that `-dead_strip` was not requested. Multiple `-why_live` names allowed.
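One way to produce such a chain is a breadth-first search from the GC roots that records who first reached each symbol, then a walk back from the queried name. This is a sketch over a hypothetical string-keyed edge map; the real Sprint 23 graph will be keyed by atoms, and the root description text here is simplified.

```rust
use std::collections::{HashMap, VecDeque};

/// `edges[a]` lists the symbols that `a` references.
/// Returns the explanation lines, or None if `target` is not live.
pub fn why_live<'a>(
    target: &'a str,
    roots: &[&'a str],
    edges: &HashMap<&'a str, Vec<&'a str>>,
) -> Option<Vec<String>> {
    // parent[s] = Some(p) if p first reached s; None marks a GC root.
    let mut parent: HashMap<&'a str, Option<&'a str>> = HashMap::new();
    let mut queue: VecDeque<&'a str> = VecDeque::new();
    for &r in roots {
        parent.insert(r, None);
        queue.push_back(r);
    }
    while let Some(sym) = queue.pop_front() {
        if let Some(list) = edges.get(sym) {
            for &next in list {
                if !parent.contains_key(next) {
                    parent.insert(next, Some(sym));
                    queue.push_back(next);
                }
            }
        }
    }
    // Walk back from the target to a root, one line per hop.
    let mut lines = Vec::new();
    let mut cur = target;
    loop {
        match *parent.get(cur)? {
            Some(p) => {
                lines.push(format!("{} is reachable from {}", cur, p));
                cur = p;
            }
            None => {
                lines.push(format!("{} is a GC root", cur));
                return Some(lines);
            }
        }
    }
}
```

Breadth-first order means the reported chain is a shortest one, which keeps the explanation short when a symbol is reachable along several paths.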
113
+
114
+### 4. Exported / unexported symbols files
115
+Each line of the file is a symbol name. Wildcards: `*` matches any chars, `?` matches one. Used to adjust the final export trie and to mark symbols `N_PEXT` when `-unexported_symbol` is set. Consumed by Sprint 14's symbol-table construction (which this sprint amends).
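The `*` / `?` matching can be done with a single backtracking pointer instead of a regex engine. A sketch, assuming only those two wildcards are supported (no character classes); `wildcard_match` is a hypothetical name.

```rust
/// Match `name` against `pattern`, where `*` matches any run of characters
/// (including none) and `?` matches exactly one character.
pub fn wildcard_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let n: Vec<char> = name.chars().collect();
    let (mut pi, mut ni) = (0usize, 0usize);
    // Position to resume from if we need the last `*` to absorb more input:
    // (pattern index just after the star, name index it currently covers up to).
    let mut star: Option<(usize, usize)> = None;
    while ni < n.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi + 1, ni));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == n[ni]) {
            pi += 1;
            ni += 1;
        } else if let Some((sp, sn)) = star {
            // Mismatch: let the most recent `*` swallow one more character.
            pi = sp;
            ni = sn + 1;
            star = Some((sp, sn + 1));
        } else {
            return false;
        }
    }
    // Any trailing stars can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}
```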
116
+
117
+### 5. CLI parser
118
+`afs-ld/src/args.rs`:
119
+- Hand-rolled, no clap.
120
+- Streaming argv scan.
121
+- Error messages cite the flag, the invalid value, and the expected format.
122
+- `-Wl,-map,foo.txt` normalized to `-map foo.txt` before dispatch.
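The `-Wl,` normalization is a pre-pass over argv before the streaming scan. A minimal sketch, with `normalize_wl` as a hypothetical name; it drops empty pieces produced by stray commas, which is a choice made here rather than documented `ld` behavior.

```rust
/// Expand every `-Wl,a,b,c` argument into separate `a`, `b`, `c` arguments;
/// all other arguments pass through unchanged and in order.
pub fn normalize_wl(args: &[String]) -> Vec<String> {
    let mut out = Vec::with_capacity(args.len());
    for arg in args {
        match arg.strip_prefix("-Wl,") {
            Some(rest) => out.extend(
                rest.split(',')
                    .filter(|piece| !piece.is_empty())
                    .map(String::from),
            ),
            None => out.push(arg.clone()),
        }
    }
    out
}
```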
123
+
124
+### 6. `-t` trace output
125
+As each input file is loaded:
126
+```
127
+afs-ld: loading hello.o
128
+afs-ld: loading libarmfortas_rt.a
129
+afs-ld: loading libarmfortas_rt.a(io.o)
130
+afs-ld: loading /usr/lib/libSystem.tbd
131
+```
132
+
133
+## Testing Strategy
134
+- One test per flag: parse the flag, assert `LinkOptions` field set correctly.
135
+- Error-message snapshot tests for every invalid-flag case.
136
+- `-map` differential: produce a map, compare shape (not exact byte) to `ld`'s map on hello-world.
137
+- `-why_live _main` produces a root-only explanation.
138
+- `-why_live <transitively-reachable-sym>` produces a chain.
139
+- `-Wl,-map,foo.txt` parsed identically to `-map foo.txt`.
140
+
141
+## Definition of Done
142
+- Every flag listed above parses and wires correctly.
143
+- `-map` produces human-readable output covering object files, sections, symbols, dead-stripped entries.
144
+- `-why_live` produces a coherent chain on fixtures with dead-strip enabled.
145
+- Unknown-flag errors include a did-you-mean suggestion.
146
+- CLI surface passes a snapshot test against the `--help` output.
.docs/sprints/sprint20.mdadded
@@ -0,0 +1,72 @@
1
+# Sprint 20: Driver Swap
2
+
3
+## Prerequisites
4
+Sprints 18–19 — hello-world works, CLI complete.
5
+
6
+## Goals
7
+Wire afs-ld into the armfortas driver. Initially gated behind `AFS_LD=1`. After the Sprint 27 parity gate, flip the default. Keep a fallback to system `ld` for at least one sprint after default-on.
8
+
9
+## Deliverables
10
+
11
+### 1. Driver change site
12
+`armfortas/src/driver/mod.rs`:
13
+
14
+Two call sites ship today:
15
+- Single-file link path at lines 497–530.
16
+- Multi-file link path at lines 533–565.
17
+
18
+Both build a `Command::new("ld")`. Refactor to:
19
+
20
+```rust
21
+fn linker_command() -> (Command, &'static str) {
22
+    match env::var("AFS_LD").as_deref() {
23
+        Ok("1") | Ok("true") => (Command::new(find_afs_ld()), "afs-ld"),
24
+        _                    => (Command::new("ld"),        "system ld"),
25
+    }
26
+}
27
+```
28
+
29
+`find_afs_ld()`:
30
+1. `AFS_LD_PATH` env var (full path to the binary).
31
+2. `<workspace>/target/debug/afs-ld`.
32
+3. `<workspace>/target/release/afs-ld`.
33
+4. `PATH` lookup.
34
+
35
+Failure produces a clear diagnostic pointing to the env var and build commands.
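The search order above might be sketched as follows. Everything beyond the four-step order is an assumption: the `workspace` parameter, the `candidate_paths` helper, and the wording and build command in the error message are all hypothetical, and the real driver may locate the workspace differently.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Build the ordered candidate list without touching the filesystem,
/// so the ordering itself is easy to test.
pub fn candidate_paths(
    override_path: Option<&str>,
    workspace: &Path,
    path_var: Option<&str>,
) -> Vec<PathBuf> {
    let mut out = Vec::new();
    if let Some(p) = override_path {
        out.push(PathBuf::from(p)); // 1. AFS_LD_PATH
    }
    out.push(workspace.join("target/debug/afs-ld")); // 2. debug build
    out.push(workspace.join("target/release/afs-ld")); // 3. release build
    if let Some(var) = path_var {
        // 4. every directory on PATH
        out.extend(env::split_paths(var).map(|dir| dir.join("afs-ld")));
    }
    out
}

/// Return the first candidate that exists as a file, or a diagnostic string.
pub fn find_afs_ld(workspace: &Path) -> Result<PathBuf, String> {
    let override_path = env::var("AFS_LD_PATH").ok();
    let path_var = env::var("PATH").ok();
    candidate_paths(override_path.as_deref(), workspace, path_var.as_deref())
        .into_iter()
        .find(|p| p.is_file())
        .ok_or_else(|| {
            // Hypothetical message; the package name is an assumption.
            "afs-ld not found: set AFS_LD_PATH or build it (e.g. `cargo build -p afs-ld`)"
                .to_string()
        })
}
```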
36
+
37
+### 2. Testing harness update
38
+`armfortas/src/testing.rs:871-908` (used by integration tests) — same refactor. Integration tests respect `AFS_LD=1`.
39
+
40
+### 3. Flag pass-through parity
41
+The driver today builds a fixed command. After this sprint it still builds the same command — afs-ld accepts the same flags. Differential in practice:
42
+
43
+```
44
+# System ld
45
+ld hello.o libarmfortas_rt.a -lSystem -no_uuid -syslibroot <SDK> -e _main -o hello
46
+
47
+# afs-ld (same args)
48
+afs-ld hello.o libarmfortas_rt.a -lSystem -no_uuid -syslibroot <SDK> -e _main -o hello
49
+```
50
+
51
+### 4. Fallback semantics
52
+If afs-ld errors, produce a driver-level diagnostic that cites afs-ld's exit status and stderr, plus a hint to retry with `AFS_LD=0`. Do **not** automatically retry with system `ld` — silently falling back masks real bugs.
53
+
54
+### 5. Test coverage on both paths
55
+`cargo test --workspace` runs all integration tests twice: once with `AFS_LD=0` (baseline), once with `AFS_LD=1`. Divergence is a test failure. This is the CI gate for afs-ld adoption.
56
+
57
+### 6. Preserving `-no_uuid` determinism
58
+Driver passes `-no_uuid` today. Verify afs-ld honors it byte-identically: same inputs under same seed produce the same output (no process-id, no timestamp, no random padding).
59
+
60
+### 7. Docs
61
+Update `armfortas/CLAUDE.md` with a note about `AFS_LD=1` and how to enable/disable it. Update `armfortas/README.md` as well if/when it mentions linking.
62
+
63
+## Testing Strategy
64
+- `tests/linker_swap.rs`: runs hello-world both ways, asserts the binaries differ only in tolerated regions.
65
+- Integration suite under `AFS_LD=1`: every existing integration test must pass. This is the gate.
66
+- Failure-path test: a deliberately-broken link (missing symbol); both paths produce an error, not a segfault.
67
+
68
+## Definition of Done
69
+- `AFS_LD=1 cargo test --workspace` passes every test green.
70
+- Driver refactor lands on a branch that can be rolled back cleanly by flipping the env-var default.
71
+- Diagnostic quality on afs-ld failures matches or exceeds system ld's.
72
+- No silent fallback — afs-ld failures surface loudly.
.docs/sprints/sprint21.mdadded
@@ -0,0 +1,70 @@
1
+# Sprint 21: Runtime Archive Linking
2
+
3
+## Prerequisites
4
+Sprints 4, 8, 20 — archives, resolution, driver swap.
5
+
6
+## Goals
7
+Link `libarmfortas_rt.a` end-to-end into every armfortas-produced binary. The full parent integration suite runs green under `AFS_LD=1`. This is the sprint that proves afs-ld can do real work on real armfortas output, not just staging fixtures.
8
+
9
+## Deliverables
10
+
11
+### 1. Runtime inventory
12
+Walk `libarmfortas_rt.a` and catalog every exported symbol. Groups:
13
+
14
+- Lifecycle: `_afs_program_init`, `_afs_program_finalize`.
15
+- Array: `_afs_allocate_array`, `_afs_deallocate_array`, `_afs_check_bounds`, `_afs_fill_*`, `_afs_array_add_*`, `_afs_array_mul_*`, `_afs_transpose_*`, `_afs_matmul_*`.
16
+- I/O: `_afs_write_*`, `_afs_read_*`, `_afs_open_file`, `_afs_close_file`, `_afs_flush`, formatted/unformatted/list-directed helpers.
17
+- String: `_afs_string_*` (allocatable, deferred-length variants).
18
+- Math intrinsics: `_afs_i128_*`, `_afs_cmplx_*`, etc.
19
+- System: `_afs_stop`, `_afs_command_argument_*`, `_afs_get_environment`.
20
+
21
+This inventory gets persisted as `tests/runtime_symbols.txt` so the test suite can assert no symbol silently disappears between runtime rebuilds.
22
+
23
+### 2. Archive fetch verification
24
+Verify Sprint 4's archive reader pulls members correctly:
25
+
26
+- Parse `libarmfortas_rt.a`, walk its BSD symbol index.
27
+- For each inventory symbol, look up the defining member.
28
+- Cross-check: `nm` on each member file agrees.
29
+
30
+### 3. End-to-end integration tests
31
+Run the parent `armfortas/tests/` suite under `AFS_LD=1`:
32
+
33
+- `tests/run_programs.rs`: full program tests (array, I/O, derived types, modules).
34
+- `tests/multifile.rs`: multi-object link; module globals resolved correctly.
35
+- `tests/i128_cross_object.rs`: 128-bit integer interop across C/Fortran boundary.
36
+- `tests/fortsh_module_graph.rs`: complex USE chains.
37
+- `tests/incremental.rs`: incremental module dependency tracking.
38
+
39
+Every failure here is a real linker bug — triage and fix.
40
+
41
+### 4. Known gotchas to verify
42
+Based on afs-as + runtime history, pay particular attention to:
43
+
44
+- **`_afs_program_init` lifecycle wrapping**: the driver-synthesized `_main` at `src/driver/mod.rs:371-392` calls `_afs_program_init` → user prog → `_afs_program_finalize`. All three must resolve.
45
+- **I/O state machine in `libarmfortas_rt`**: references `_errno`, `_malloc`, `_free` from libSystem; `_afs_io_state` as a BSS symbol. Verify `__DATA,__bss` placement matches ld.
46
+- **Common symbols**: some module globals come through as common; verify promotion to BSS (Sprint 7 matrix).
47
+- **Weak refs** to optional runtime hooks (if any). Check that unresolved weak refs evaluate to 0 and the call-site null-check dispatches correctly.
48
+
49
+### 5. Archive-ordering edge cases
50
+Some programs pull symbols that create new undefined references in the middle of resolution. Fixed-point loop from Sprint 8 handles this; verify it holds for the runtime archive with its ~40 members.
51
+
52
+### 6. Diagnostic polish
53
+Every runtime symbol that fails to resolve must produce a diagnostic that:
54
+- Names the missing symbol.
55
+- Cites at least one referrer in user code.
56
+- Hints at the rebuild path (`cargo build -p armfortas-rt`).
57
+
58
+### 7. Regression corpus
59
+Any test that once broke becomes a permanent corpus entry. The afs-ld `tests/runtime_*.rs` pattern mirrors armfortas's.
60
+
61
+## Testing Strategy
62
+- Full parent integration suite under `AFS_LD=1` (this is the primary deliverable).
63
+- `tests/runtime_inventory.rs`: assert every symbol in `tests/runtime_symbols.txt` is still defined by the current `libarmfortas_rt.a`.
64
+- Archive fetch coverage: every inventoried symbol pulls its member exactly once.
65
+
66
+## Definition of Done
67
+- `AFS_LD=1 cargo test -p armfortas` green.
68
+- Runtime inventory stable and asserted.
69
+- No silent skips; every test that was passing under `AFS_LD=0` passes under `AFS_LD=1`.
70
+- Diagnostics on missing runtime symbols are actionable.
.docs/sprints/sprint22.mdadded
@@ -0,0 +1,107 @@
1
+# Sprint 22: Code Signature (Ad-Hoc)
2
+
3
+## Prerequisites
4
+Sprints 10, 14 — segment layout and `__LINKEDIT` finalized.
5
+
6
+## Goals
7
+Emit a valid ad-hoc `LC_CODE_SIGNATURE`. On macOS 11+, arm64 binaries without a signature are killed by the kernel at exec time; without this sprint every afs-ld output requires manual `codesign -s -` to run. This is existence-blocking, not optional.
8
+
9
+## Deliverables
10
+
11
+### 1. SuperBlob structure
12
+`LC_CODE_SIGNATURE` points to a code-signing blob in `__LINKEDIT`:
13
+
14
+```
15
+struct CS_SuperBlob {
16
+    u32 magic;           // CSMAGIC_EMBEDDED_SIGNATURE = 0xfade0cc0
17
+    u32 length;          // total blob size including this header
18
+    u32 count;           // number of index entries
19
+    // then count × CS_BlobIndex { u32 type; u32 offset; }
20
+    // then each blob inlined at its offset
21
+}
22
+```
23
+
24
+For ad-hoc: two inner blobs — the CodeDirectory and an empty Requirements set. Entitlements absent.
25
+
26
+### 2. CodeDirectory
27
+```
28
+struct CS_CodeDirectory {
29
+    u32 magic;                  // CSMAGIC_CODEDIRECTORY = 0xfade0c02
30
+    u32 length;
31
+    u32 version;                // 0x20400 (modern)
32
+    u32 flags;                  // CS_ADHOC = 0x2
33
+    u32 hashOffset;             // from this struct's start to the main hash array
34
+    u32 identOffset;            // to the null-terminated identifier
35
+    u32 nSpecialSlots;          // 2 (info plist + requirements); 0 when absent
36
+    u32 nCodeSlots;             // pages × 1
37
+    u32 codeLimit;              // file offset of end-of-signed-data
38
+    u8  hashSize;               // 32 for SHA-256
39
+    u8  hashType;               // CS_HASHTYPE_SHA256 = 2
40
+    u8  platform;               // 0 for no platform binary
41
+    u8  pageSize;               // log2(page) = 12 for 4 KiB
42
+    u32 spare2;
43
+    u32 scatterOffset;          // 0
44
+    u32 teamOffset;             // 0
45
+    u32 spare3;
46
+    u64 codeLimit64;            // 0 unless codeLimit > 4 GiB
47
+    u64 execSegBase;
48
+    u64 execSegLimit;
49
+    u64 execSegFlags;           // CS_EXECSEG_MAIN_BINARY = 0x1 for executables
50
+}
51
+```
52
+
53
+After the struct:
54
+- `identifier\0` — we use the install-name for dylibs, the output binary basename for executables.
55
+- Special slots (filled with zeroes for ad-hoc): `nSpecialSlots` × 32 bytes of zero before the main slots.
56
+- Main slots: one SHA-256 per 4 KiB page of signed data, stored in file order.
57
+
58
+### 3. Signed data range
59
+Signing covers file bytes `[0, codeLimit)`. `codeLimit` is set to `LC_CODE_SIGNATURE.dataoff` (the start of the signature blob itself). The signature never signs itself.
60
+
61
+### 4. Page hashing
62
+- Page size 4 KiB (not 16 KiB — code-signing pageSize is independent of VM page size).
63
+- SHA-256 over each 4 KiB chunk; the final chunk is hashed over whatever bytes remain (not padded).
64
+- Hashes concatenated at `hashOffset`.
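The chunking rule above can be sketched as follows. This is an illustration only: `hash_pages` and `CS_PAGE_SIZE` are hypothetical names, and the digest function is passed in so the sketch stays independent of the SHA-256 implementation.

```rust
pub const CS_PAGE_SIZE: usize = 4096;

/// Hypothetical helper: hashes `signed` (file bytes `[0, codeLimit)`) one
/// code-signing page at a time. `chunks` yields a short final slice when the
/// length is not a multiple of the page size, so the last page is hashed
/// over the remaining bytes without padding.
pub fn hash_pages<H>(signed: &[u8], hash: H) -> Vec<[u8; 32]>
where
    H: Fn(&[u8]) -> [u8; 32],
{
    signed.chunks(CS_PAGE_SIZE).map(|page| hash(page)).collect()
}
```

The returned vector is already in file order, so it can be written out contiguously at `hashOffset`.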
65
+
66
+### 5. Requirements blob
67
+```
68
+struct CS_RequirementsBlob {
69
+    u32 magic;    // CSMAGIC_REQUIREMENTS = 0xfade0c01
70
+    u32 length;   // 12
71
+    u32 count;    // 0
72
+}
73
+```
74
+
75
+Minimum legal empty requirements.
76
+
77
+### 6. SHA-256 implementation
78
+Hand-rolled. Standard 64-round SHA-256 from FIPS 180-4. ~200 LoC in Rust. Unit-tested against known vectors (empty string, "abc", "a"×1M, NIST test vectors).
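For orientation, a compact one-shot SHA-256 following FIPS 180-4 might look like the sketch below. It buffers the whole padded message, which is fine for 4 KiB pages but is not how a streaming implementation would be written; the function name is an assumption.

```rust
// Round constants from FIPS 180-4, section 4.2.2.
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// One-shot SHA-256 over `data`.
pub fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as u64 BE.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&(data.len() as u64).wrapping_mul(8).to_be_bytes());

    for block in msg.chunks_exact(64) {
        // Message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            let t2 = s0.wrapping_add(maj);
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(t2);
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}
```

The empty-string and "abc" vectors named above are the natural first unit tests for a routine like this.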
79
+
80
+### 7. Layout recomputation
81
+The signature blob size depends on `codeLimit`, which depends on its own file offset. Two-pass approach:
82
+
83
+1. Compute layout excluding signature; know exactly the signature's start offset.
84
+2. Compute signature size = SuperBlob header + indices + CodeDirectory header + ident + special slots + (ceil(codeLimit / 4096) × 32 bytes hash) + Requirements blob.
85
+3. Reserve that many bytes at the signature offset.
86
+4. Write all other data.
87
+5. Hash pages and write signature in place.
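The size formula in step 2 can be written out as a small function. This is a sketch under stated assumptions: the 88-byte CodeDirectory header is the sum of the field sizes in the struct listed earlier, the constant and function names are hypothetical, and any alignment padding a real writer might add is not modeled.

```rust
const SUPERBLOB_HEADER: u64 = 12;      // magic + length + count
const BLOB_INDEX_ENTRY: u64 = 8;       // type + offset, one per inner blob
const CODE_DIRECTORY_HEADER: u64 = 88; // sum of the fields in CS_CodeDirectory (version 0x20400)
const REQUIREMENTS_BLOB: u64 = 12;     // empty requirements set
const HASH_SIZE: u64 = 32;             // SHA-256
const CS_PAGE: u64 = 4096;

/// Hypothetical helper: total bytes to reserve for the ad-hoc signature,
/// given the signed range `[0, code_limit)` and the identifier string.
pub fn signature_size(code_limit: u64, ident: &str, n_special_slots: u64) -> u64 {
    let n_code_slots = (code_limit + CS_PAGE - 1) / CS_PAGE; // ceil(codeLimit / 4096)
    SUPERBLOB_HEADER
        + 2 * BLOB_INDEX_ENTRY
        + CODE_DIRECTORY_HEADER
        + ident.len() as u64 + 1 // identifier plus its NUL terminator
        + n_special_slots * HASH_SIZE
        + n_code_slots * HASH_SIZE
        + REQUIREMENTS_BLOB
}
```

For example, a 8193-byte signed range (three pages), the identifier `hello`, and two special slots come to 294 bytes under these assumptions.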
88
+
89
+### 8. Platform binary opt-out
90
+Ad-hoc signatures from third-party tools are not platform binaries; `platform = 0`, `flags = CS_ADHOC`.
91
+
92
+### 9. Validation
93
+After writing, validate with `codesign -v <binary>`. Expected: zero output, exit 0.
94
+
95
+## Testing Strategy
96
+- Sign hello-world, then `./hello` (no manual `codesign` step). Expect "Hello, World!".
97
+- `codesign -dv <binary>` reports `Signature=adhoc`.
98
+- SHA-256 unit tests against NIST vectors.
99
+- Mutate a single byte in the binary post-sign, re-run, expect kernel kill ("Killed: 9") — proves the signature is real and the kernel is checking it.
100
+- Dylib ad-hoc sign: `dlopen` of `libfoo.dylib` from Sprint 18.5 still works.
101
+
102
+## Definition of Done
103
+- `./hello` runs directly (no `codesign -s -` needed).
104
+- `codesign -v` clean on every afs-ld output.
105
+- Dylib loading via `dlopen` works on Sprint 18.5 fixtures.
106
+- SHA-256 passes NIST test vectors.
107
+- Tampering detected by the kernel (confidence check).
.docs/sprints/sprint23.mdadded
@@ -0,0 +1,74 @@
1
+# Sprint 23: Dead Strip (`-dead_strip`)
2
+
3
+## Prerequisites
4
+Sprint 9 — atomization; Sprint 19 — `-dead_strip` CLI flag.
5
+
6
+## Goals
7
+Implement `-dead_strip`: remove atoms that are unreachable from the GC roots. Populates the side table that Sprint 19's `-why_live` diagnostic reads.
8
+
9
+## Deliverables
10
+
11
+### 1. GC roots
12
+Live set seeded from:
13
+- The entry-point symbol's atom (executable only).
14
+- Every exported symbol's atom (governed by `-exported_symbols_list` / defaults).
15
+- Every atom with `NoDeadStrip` flag (from `N_NO_DEAD_STRIP` or `.no_dead_strip`).
16
+- Every atom referenced by a `LC_RPATH` / `LC_LOAD_DYLIB` side-channel (usually none, but keep the hook).
17
+- `_dyld_stub_binder` — always referenced by `__stub_helper`.
18
+- Compact-unwind and eh_frame atoms are **not** roots; they are transitively live via their `parent_of` link to a function atom (Sprint 9).
19
+- Personality functions referenced by any live unwind FDE.
20
+
21
+### 2. Mark-live traversal
22
+Worklist algorithm over the atom reference graph:
23
+
24
+```rust
25
+pub fn mark_live(layout: &mut Layout, roots: &[(AtomId, LiveCause)]) {
26
+    let mut worklist: Vec<(AtomId, LiveCause)> = roots.to_vec();
27
+    while let Some((atom_id, cause)) = worklist.pop() {
28
+        if layout.atoms[atom_id].live { continue; }
29
+        layout.atoms[atom_id].live = true;
30
+        record_why_live(atom_id, cause);  // first cause wins; read by the -why_live diagnostic
31
+        for referent in atom_references(&layout.atoms[atom_id]) {
32
+            worklist.push((referent, LiveCause::ReachableFrom(atom_id)));  // variant name illustrative
33
+        }
34
+    }
35
+}
36
+```
37
+
38
+`atom_references` pulls from the reloc list of the atom, yielding the atom-id each reloc points to.
39
+
40
+### 3. Transitive rules
41
+- A live function makes its `__compact_unwind` record live via the `parent_of` link from Sprint 9.
42
+- A live function makes its `__eh_frame` FDE live (FDE references function via SUBTRACTOR pair — the FDE's life is parasitic on the function's).
43
+- A live personality function is reached via the unwind records that reference it.
44
+- LSDA blobs are live iff their owning function is.
45
+
46
+### 4. Dead atoms purged
47
+After mark-live, walk the atom list; atoms with `live = false` are removed from the output. Output sections shrink accordingly; Sprint 10's layout pass re-runs to compact addresses.
48
+
49
+### 5. `-why_live` side table
50
+Every time an atom is marked live, record its cause: "GC root (entry point)", "reachable from <other atom>", "reachable via unwind parent", etc. Stored as a `HashMap<AtomId, LiveCause>`. Sprint 19's `-why_live` walks back from a named symbol through this map to a root.
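A sketch of the walk-back that Sprint 19's `-why_live` could perform over this table. The `LiveCause` variants and the `why_live` function are assumptions chosen to mirror the cause strings quoted above; `AtomId` is modeled as a plain index.

```rust
use std::collections::HashMap;

pub type AtomId = usize; // stand-in for the real handle type

/// Hypothetical cause record, one per live atom.
#[derive(Debug, Clone, PartialEq)]
pub enum LiveCause {
    Root(&'static str),    // e.g. "entry point", "exported"
    ReachableFrom(AtomId), // ordinary relocation edge
    UnwindParent(AtomId),  // reached via the parent_of link
}

/// Follows causes from `start` back to a GC root. Returns the chain
/// `[start, ..., root]`, or `None` if `start` is not live.
pub fn why_live(table: &HashMap<AtomId, LiveCause>, start: AtomId) -> Option<Vec<AtomId>> {
    let mut chain = vec![start];
    let mut current = start;
    loop {
        match table.get(&current)? {
            LiveCause::Root(_) => return Some(chain),
            LiveCause::ReachableFrom(parent) | LiveCause::UnwindParent(parent) => {
                // A cause is recorded only on first marking, so the parent was
                // marked earlier and the chain cannot loop; guard regardless.
                if chain.contains(parent) {
                    return None;
                }
                chain.push(*parent);
                current = *parent;
            }
        }
    }
}
```

A missing table entry doubles as the "not live (dead-stripped)" answer.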
51
+
52
+### 6. Interactions with ICF (Sprint 24)
53
+Dead-strip runs before ICF. ICF folds only live atoms. Dead atoms never get folded.
54
+
55
+### 7. Stripped-symbol enumeration
56
+The `-map` output (Sprint 19) lists dead-stripped symbols. Populate this list from the purged atoms.
57
+
58
+### 8. Default behavior
59
+`-dead_strip` is opt-in. Without it, no GC runs, no `-why_live` data is produced (the diagnostic explains this).
60
+
61
+## Testing Strategy
62
+- Fixture: two functions, one called by `_main`, one unreferenced. With `-dead_strip`: output's `__text` shrinks, and `nm -n` lists `_main` and the called function but not the unreferenced one.
63
+- `-why_live _main` on the fixture: "root".
64
+- `-why_live _called_fn`: "reachable from _main; _main is root".
65
+- `-why_live _unreferenced_fn`: "not live (dead-stripped)".
66
+- `N_NO_DEAD_STRIP` fixture: that symbol survives even without a reference.
67
+- Weak def reference: weak-coalesced winner survives, losers dead-stripped.
68
+- Differential: output's `__text` length matches `ld -dead_strip` on 10+ fixtures.
69
+
70
+## Definition of Done
71
+- Live set matches `ld -dead_strip` on a corpus of 15+ fixtures.
72
+- `-why_live` produces coherent chains.
73
+- Compact-unwind and eh_frame entries correctly follow their parent functions.
74
+- fortsh builds under `-dead_strip` (Sprint 29 retest).
.docs/sprints/sprint24.mdadded
@@ -0,0 +1,65 @@
1
+# Sprint 24: ICF (`-icf=safe`)
2
+
3
+## Prerequisites
4
+Sprints 9, 23 — atoms, dead-strip done.
5
+
6
+## Goals
7
+Identical Code Folding: merge atoms whose content, relocations, and attributes are identical, so one copy survives. Safe mode only — respects address-taken symbols so function-pointer equality is preserved.
8
+
9
+## Deliverables
10
+
11
+### 1. Safety rules
12
+
13
+Atom is **foldable** iff:
14
+- It lives in `__TEXT,__text`, `__TEXT,__cstring`, `__TEXT,__literal16`, `__TEXT,__const`, or `__DATA_CONST,__const`.
15
+- It has no `NoDeadStrip` flag.
16
+- It has no `AddressTaken` annotation (see below).
17
+- Its primary symbol is not exported (per `-exported_symbols_list` or the default export rules).
18
+
19
+### 2. Address-taken detection
20
+
21
+During reloc application (Sprint 11 + this sprint's prep pass), mark any atom referenced via `Unsigned` or `PointerToGot` or any `GOT_LOAD_*` as `AddressTaken`. Function pointers, vtable slots, RTTI entries, anything comparable via `==` — all take the address, so they must not be folded.
22
+
23
+Only `Branch26` references are considered "fold-safe" on their own; direct calls don't create address equality.
24
+
25
+### 3. Segregation algorithm
26
+
27
+Two-phase refinement (lld/MachO style):
28
+
29
+1. **Initial hash**: for each foldable atom, compute a 64-bit hash over (size, flags, content bytes, reloc list normalized to (kind, addend, target class)).
30
+2. **Bucket**: group atoms by hash. Single-element buckets are unique and not foldable.
31
+3. **Refine**: within each bucket, check pairwise equality of content + relocs. Use equivalence classes — two atoms equivalent iff every referenced atom is equivalent (fixed-point refinement, à la Hopcroft's DFA minimization).
32
+4. **Repeat step 3 until no class splits**. Converges in log(N) iterations in practice.
33
+5. **Fold**: within each final equivalence class, pick a winner by input command-line order (stable, deterministic); rewrite every other atom's `owner_symbol` redirect to the winner; erase the loser atoms.
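The bucket-then-refine idea in steps 1 to 4 can be sketched as a small partition-refinement loop. This is a simplification under stated assumptions: `Atom` here carries only content bytes and target indices (reloc kind and addend are omitted), content is compared directly instead of through a 64-bit hash, and the names are hypothetical.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Simplified atom for illustration: bytes plus the indices of referenced atoms.
pub struct Atom {
    pub content: Vec<u8>,
    pub refs: Vec<usize>,
}

/// Assigns dense class ids to keys in first-seen order (deterministic: the
/// map is never iterated). Returns the ids and the number of classes.
fn number_keys<K: Hash + Eq>(keys: impl Iterator<Item = K>) -> (Vec<usize>, usize) {
    let mut ids: HashMap<K, usize> = HashMap::new();
    let mut out = Vec::new();
    for key in keys {
        let next = ids.len();
        out.push(*ids.entry(key).or_insert(next));
    }
    let count = ids.len();
    (out, count)
}

/// Returns one class id per atom; atoms sharing an id are fold candidates.
pub fn equivalence_classes(atoms: &[Atom]) -> Vec<usize> {
    // Initial buckets: same bytes and same number of references.
    let (mut class, mut count) =
        number_keys(atoms.iter().map(|a| (a.content.as_slice(), a.refs.len())));
    loop {
        // Refine: split a class when its members reference different classes.
        let (next, next_count) = number_keys(atoms.iter().enumerate().map(|(i, a)| {
            let targets: Vec<usize> = a.refs.iter().map(|&t| class[t]).collect();
            (class[i], targets)
        }));
        if next_count == count {
            return next; // no class split: fixed point reached
        }
        class = next;
        count = next_count;
    }
}
```

Because each refinement key includes the previous class id, classes can only split, so the loop ends once the class count stops growing.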
34
+
35
+### 4. Reloc patching
36
+
37
+After folding, some relocations now point at folded-away atoms. Rewrite every such reloc to target the winner. Layout recomputed; sizes shrink.
38
+
39
+### 5. `-icf=none` path
40
+
41
+Default off. `-icf=safe` enables. `-icf=all` (unsafe, doesn't respect AddressTaken) is not implemented — emits a diagnostic.
42
+
43
+### 6. Interaction with `-map` and `-why_live`
44
+
45
+- `-map` reports folded symbols with their winner: `_foo folded to _bar`.
46
+- `-why_live` of a folded symbol reports the winner's live-chain.
47
+
48
+### 7. Determinism
49
+
50
+Winner selection by command-line order is stable across invocations. Hash function is seeded with a fixed seed — same inputs, same winners, same output bytes.
51
+
52
+## Testing Strategy
53
+
54
+- Fixture: two functions with byte-identical code and relocs. `-icf=safe` folds them into one copy; `nm` shows both symbol names at the same address, and `__text` holds a single body.
55
+- Address-taken fixture: two identical functions, one has its address stored in a table. Fold does not happen for the address-taken one; both survive.
56
+- String dedup: `__cstring` atoms with identical content folded.
57
+- Large-scale: 50+ near-identical functions, fold correctness verified by running each alias.
58
+- Differential: folded output's `__text` size matches `ld -icf=safe` within 1%.
59
+
60
+## Definition of Done
61
+
62
+- `-icf=safe` reduces text size on a constructed fixture.
63
+- Address-taken functions never folded.
64
+- All aliases point to the folded winner at runtime.
65
+- fortsh builds with `-icf=safe` and behaves identically to its `-icf=none` counterpart.
.docs/sprints/sprint25.mdadded
@@ -0,0 +1,74 @@
1
+# Sprint 25: LOH Relaxation
2
+
3
+## Prerequisites
4
+Sprints 1, 11 — LOH hints preserved, reloc application in place.
5
+
6
+## Goals
7
+Apply Linker Optimization Hints that afs-as emits. LOHs describe safe peephole opportunities — replace ADRP+ADD with a single ADR when the target is in ±1 MB, or nop-out an unnecessary LDR. Until this sprint, LOHs are preserved as-is and no relaxation happens.
8
+
9
+## Deliverables
10
+
11
+### 1. LOH kinds afs-as emits
12
+
13
+From the existing `.loh` directives:
14
+- `AdrpAdd`: ADRP + ADD (compute address). If target is in ±1 MB, replace with ADR + nop.
15
+- `AdrpLdr`: ADRP + LDR (load through page/pageoff). If target is in ±1 MB and aligned, the pair can become nop + LDR (using the LDR's PC-relative literal form).
16
+- `AdrpLdrGot`: ADRP + LDR from GOT. If the GOT entry's content is a local symbol with known address, can skip the GOT load entirely (ADR + nop).
17
+- `AdrpLdrGotLdr`: ADRP + LDR (GOT) + LDR (final). Similar combo: if GOT can be skipped, fold into a direct LDR.
18
+
19
+### 2. LOH data format
20
+
21
+`LC_LINKER_OPTIMIZATION_HINT` points to a ULEB128 stream:
22
+```
23
+uleb128 kind
24
+uleb128 argcount
25
+uleb128 arg1  // file offset
26
+uleb128 arg2
27
+...
28
+```
29
+
30
+Kind constants: `LOH_ARM64_ADRP_ADRP=1`, `LOH_ARM64_ADRP_LDR=2`, `LOH_ARM64_ADRP_ADD_LDR=3`, `LOH_ARM64_ADRP_LDR_GOT_LDR=4`, `LOH_ARM64_ADRP_ADD_STR=5`, `LOH_ARM64_ADRP_LDR_GOT_STR=6`, `LOH_ARM64_ADRP_ADD=7`, `LOH_ARM64_ADRP_LDR_GOT=8`.
31
+
32
+afs-as emits kinds 3, 7, 8 (and 4 for load-from-pointer-in-GOT).
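A minimal reader for the stream layout shown above, as a sketch. The `Loh` struct and function names are hypothetical, and treating a zero kind as end-of-stream padding is an assumption about how the blob is terminated.

```rust
/// Decodes one ULEB128 value at `*pos`, advancing the cursor.
/// Returns `None` on truncated input or a value wider than 64 bits.
pub fn read_uleb128(bytes: &[u8], pos: &mut usize) -> Option<u64> {
    let mut value = 0u64;
    let mut shift = 0u32;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None;
        }
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(value);
        }
        shift += 7;
    }
}

/// One parsed hint: its kind constant and its instruction offsets.
#[derive(Debug, PartialEq)]
pub struct Loh {
    pub kind: u64,
    pub args: Vec<u64>,
}

/// Parses `kind, argcount, arg...` records until the buffer ends.
pub fn parse_loh(bytes: &[u8]) -> Option<Vec<Loh>> {
    let mut pos = 0;
    let mut hints = Vec::new();
    while pos < bytes.len() {
        let kind = read_uleb128(bytes, &mut pos)?;
        if kind == 0 {
            break; // assumed: zero bytes are trailing padding
        }
        let argc = read_uleb128(bytes, &mut pos)?;
        let mut args = Vec::new();
        for _ in 0..argc {
            args.push(read_uleb128(bytes, &mut pos)?);
        }
        hints.push(Loh { kind, args });
    }
    Some(hints)
}
```

Returning `None` for a truncated stream lets the caller turn it into a diagnostic instead of silently dropping hints.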
33
+
34
+### 3. Relaxation pass
35
+
36
+Runs **after** reloc application (Sprint 11) and **before** LOH re-serialization. For each LOH:
37
+
38
+1. Parse the referenced instructions.
39
+2. Compute if the symbolic target fits the tighter encoding.
40
+3. If yes: rewrite the instruction bytes; mark the LOH as "applied" so it can be either dropped or left in the output (ld's convention varies; we match).
41
+4. If no: leave untouched.
42
+
43
+Safety: every relaxation is reversible (the original instructions still achieve the goal), and no relaxation narrows a correctly-wider encoding to an incorrect one. Extensive testing required.
44
+
45
+### 4. Safe conservatism
46
+
47
+A LOH is only applied when the target fits **strictly** within the narrower range. Off-by-one guard: recompute both original and relaxed forms, assert the relaxed form computes the same address.
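The range check and the round-trip guard could be sketched like this for the AdrpAdd case. The function names are hypothetical; `pc` is assumed to be the address of the slot that will hold the ADR, `rd` the ADD's destination register, and the ADR-then-nop ordering simply follows the wording above.

```rust
const NOP: u32 = 0xd503_201f;

/// Encodes `ADR Xd, target` for an instruction at `pc`, or `None` when the
/// byte delta does not fit the signed 21-bit immediate (-1 MiB ..= 1 MiB - 1).
pub fn encode_adr(rd: u8, pc: u64, target: u64) -> Option<u32> {
    let delta = target.wrapping_sub(pc) as i64;
    if delta < -(1 << 20) || delta >= (1 << 20) {
        return None;
    }
    let imm = (delta as u32) & 0x1f_ffff; // 21-bit two's complement
    let immlo = imm & 0x3;
    let immhi = imm >> 2;
    Some(0x1000_0000 | (immlo << 29) | (immhi << 5) | u32::from(rd & 0x1f))
}

/// Recomputes the address an encoded ADR at `pc` would produce.
pub fn decode_adr_target(insn: u32, pc: u64) -> u64 {
    let imm = (((insn >> 5) & 0x7_ffff) << 2) | ((insn >> 29) & 0x3);
    let delta = ((imm << 11) as i32) >> 11; // sign-extend from 21 bits
    pc.wrapping_add(delta as i64 as u64)
}

/// Returns the relaxed `[ADR, NOP]` pair, or `None` to keep ADRP + ADD.
pub fn relax_adrp_add(rd: u8, pc: u64, target: u64) -> Option<[u32; 2]> {
    let adr = encode_adr(rd, pc, target)?;
    // Off-by-one guard: the relaxed form must compute the same address.
    if decode_adr_target(adr, pc) != target {
        return None;
    }
    Some([adr, NOP])
}
```

A caller would still need to confirm that the two original instructions really are the expected ADRP and ADD before overwriting them.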
48
+
49
+### 5. Cross-LOH interaction
50
+
51
+A single instruction can participate in multiple LOHs (one as a member of an ADRP+ADD, another as a member of an ADRP+ADD+LDR). Apply LOHs in a deterministic order — longest first — and skip any LOH whose instructions have already been rewritten.
52
+
53
+### 6. Output LOH preservation
54
+
55
+ld emits `LC_LINKER_OPTIMIZATION_HINT` in the output even for executables (for the benefit of post-processing tools). We match: emit a new LOH blob with the final state (applied LOHs marked or omitted).
56
+
57
+### 7. `-no_loh` flag
58
+
59
+For debugging: `-no_loh` skips relaxation. Helpful when comparing output against a known-bad state.
60
+
61
+## Testing Strategy
62
+
63
+- Synthetic fixture with a function whose `__data` target is less than 1 MiB away → AdrpAdd LOH applies; disassembly shows ADR + nop.
64
+- Fixture with a target 10 MB away → LOH does not apply; ADRP + ADD preserved.
65
+- Differential: afs-ld output byte-matches `ld` output for both fixtures.
66
+- Runtime test: the relaxed code still dereferences the right address.
67
+- Random-fuzz: 100 fixtures with various target distances; every relaxation verified against recomputed ground truth.
68
+
69
+## Definition of Done
70
+
71
+- LOH relaxation applied correctly on fixtures that fit.
72
+- LOH skipped correctly on fixtures that don't.
73
+- Byte-parity with `ld` on a representative corpus.
74
+- `-no_loh` flag produces a cleanly un-relaxed output.
.docs/sprints/sprint26.mdadded
@@ -0,0 +1,87 @@
1
+# Sprint 26: Thunks for Out-of-Range Branches
2
+
3
+## Prerequisites
4
+Sprint 11 — BRANCH26 reloc application; Sprint 10 — layout pass.
5
+
6
+## Goals
7
+When a `BRANCH26` target is more than ±128 MiB from the caller, insert a branch island that can reach any 4-byte-aligned address within ±4 GiB via an ADRP + ADD + BR sequence. Required for very large executables; fortsh is not that large today, but a statically linked Fortran program with full-intrinsic binding could be.
8
+
9
+## Deliverables
10
+
11
+### 1. Detection pass
12
+
13
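+After layout (Sprint 10) assigns addresses, walk every BRANCH26 reloc. Compute `distance = (target - P) >> 2` as a signed value. The 26-bit immediate holds `-0x0200_0000 ..= 0x01ff_ffff`, so if `distance < -0x0200_0000` or `distance >= 0x0200_0000` (2^25 = 33,554,432 instructions × 4 bytes = 128 MiB), flag the reloc as needing a thunk.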
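As a sketch, the range test reduces to checking the signed word distance against what a 26-bit immediate can hold. The function name is hypothetical; both addresses are assumed to be 4-byte aligned.

```rust
/// True when a B/BL at address `p` can reach `target` directly, i.e. the
/// signed word distance fits the 26-bit immediate: -2^25 ..= 2^25 - 1.
pub fn branch26_in_range(p: u64, target: u64) -> bool {
    let distance = (target.wrapping_sub(p) as i64) >> 2; // in instructions
    (-(1i64 << 25)..(1i64 << 25)).contains(&distance)
}
```

The range is asymmetric: exactly 128 MiB backward is reachable, exactly 128 MiB forward is not.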
14
+
15
+### 2. Thunk synthesis
16
+
17
+One thunk atom per (output segment, distant target). A thunk is 12 bytes:
18
+
19
+```
20
+ADRP x16, <target>@PAGE
21
+ADD  x16, x16, <target>@PAGEOFF
22
+BR   x16
23
+```
24
+
25
+Or, if the target is a Defined with a known value at link time:
26
+
27
+```
28
+ADRP x16, #<computed_page>
29
+ADD  x16, x16, #<pageoff>
30
+BR   x16
31
+```
32
+
33
+ADRP's signed 21-bit page immediate reaches ±4 GiB from the thunk, so the ADRP+ADD form is ample for any image this linker is expected to produce.
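For the link-time-known case, the three instruction words could be computed as in this sketch. `encode_thunk` is a hypothetical name; the encodings are the standard A64 forms of ADRP, ADD (immediate, 64-bit), and BR, with x16 as the scratch register as in the listing above.

```rust
/// Encodes `ADRP x16, page; ADD x16, x16, #pageoff; BR x16` for a thunk
/// placed at `thunk_addr`. Returns `None` if the target page is outside
/// ADRP's signed 21-bit page range (±4 GiB).
pub fn encode_thunk(thunk_addr: u64, target: u64) -> Option<[u32; 3]> {
    const X16: u32 = 16;
    let page_delta = ((target & !0xfff).wrapping_sub(thunk_addr & !0xfff) as i64) >> 12;
    if page_delta < -(1 << 20) || page_delta >= (1 << 20) {
        return None;
    }
    let imm = (page_delta as u32) & 0x1f_ffff; // 21-bit two's complement
    let adrp = 0x9000_0000 | ((imm & 0x3) << 29) | ((imm >> 2) << 5) | X16;
    let add = 0x9100_0000 | (((target & 0xfff) as u32) << 10) | (X16 << 5) | X16;
    let br = 0xd61f_0000 | (X16 << 5);
    Some([adrp, add, br])
}
```

The three words are written little-endian into the 12-byte thunk atom in this order.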
34
+
35
+### 3. Placement
36
+
37
+Thunks land in `__TEXT,__thunks`, a new synthetic section placed between `__text` and `__stubs`. Placement must be within ±128 MiB of every call site that uses it — for very large binaries, multiple thunk islands may be needed.
38
+
39
+Algorithm:
40
+1. Run layout once.
41
+2. Detect overflow sites.
42
+3. Insert thunks near the caller cluster.
43
+4. Re-run layout (sizes changed).
44
+5. Re-check overflow — repeat until stable.
45
+
46
+Termination: adding a thunk can only make addresses shift by up to 12 bytes per thunk; overflow is a global property that converges rapidly.
47
+
48
+### 4. Thunk sharing
49
+
50
+Multiple callers to the same out-of-range target share one thunk. Keyed by `(output_section, target_atom_id)`.
51
+
52
+### 5. Reloc rewrite
53
+
54
+Each thunked BRANCH26 reloc gets rewritten to point at the thunk atom instead of the original target. Thunk atom's BR then reaches the real target via ADRP+ADD.
55
+
56
+### 6. Interaction with `-dead_strip` and ICF
57
+
58
+- Thunks are dead-stripped if no live caller remains.
59
+- Thunks are never ICF candidates (they have unique target addresses).
60
+- A dead-stripped target invalidates its thunk(s); easy since we generate thunks after dead-strip.
61
+
62
+### 7. `-thunks=<none|safe|all>`
63
+
64
+- `-thunks=none`: overflow is a hard error (default for small programs to catch bugs).
65
+- `-thunks=safe` (default on large programs): thunks inserted when needed.
66
+- `-thunks=all`: thunks inserted for every BRANCH26 (for testing).
67
+
68
+Sprint 19 CLI wires these; this sprint implements the behavior.
69
+
70
+### 8. Regression: small programs don't grow
71
+
72
+Default is `-thunks=safe`. Programs that don't need thunks emit no `__thunks` section and are byte-identical to the pre-sprint output.
73
+
74
+## Testing Strategy
75
+
76
+- Synthetic: compile a source that produces >128 MiB of code (requires artificially padding `.o` files, or using a large constant array in `__text`). Verify thunks inserted.
77
+- Every thunk target reachable from its caller cluster.
78
+- Runtime: the large binary's entry point actually runs and calls through thunks without crashing.
79
+- `-thunks=none` + overflow: produces a clear error citing the caller and target.
80
+- Small-program regression: fortsh output size unchanged vs pre-Sprint-26 (no thunks inserted).
81
+
82
+## Definition of Done
83
+
84
+- Thunks correctly inserted for out-of-range BRANCH26 on large fixtures.
85
+- Layout fixed-point converges rapidly.
86
+- Small programs unchanged.
87
+- `-thunks` CLI matrix all wired.
.docs/sprints/sprint27.mdadded
@@ -0,0 +1,113 @@
1
+# Sprint 27: Differential Harness vs Apple ld
2
+
3
+## Prerequisites
4
+All prior sprints, especially 18, 18.5, 21 — end-to-end milestones.
5
+
6
+## Goals
7
+Industrial-strength parity harness. Automated byte-level comparison of afs-ld output against `ld` across a curated corpus. Explicit tolerance lists, regression-gated CI. This is the Sprint 20 default-swap gate: afs-ld becomes the armfortas default only after this sprint's corpus is green.
8
+
9
+## Deliverables
10
+
11
+### 1. Corpus
12
+
13
+`tests/parity_corpus/` contains 50+ link scenarios, each a small test directory with:
14
+- `inputs/` (the `.o`, `.a`, `.tbd` files).
15
+- `args.txt` (the afs-ld / `ld` command-line).
16
+- `notes.md` (what this exercises).
17
+
18
+Scenarios cover:
19
+- Hello-world variants (classic vs chained, with/without `-dead_strip`, with/without `-icf`).
20
+- Every relocation type in isolation.
21
+- GOT and stub exercises.
22
+- TLV exercises.
23
+- Weak-def coalescing.
24
+- Common-symbol promotion.
25
+- Multi-archive resolution with order dependence.
26
+- Dylib-with-reexport chain.
27
+- LSystem links with real system SDK.
28
+- `libarmfortas_rt.a` + a 3-function Fortran program.
29
+
30
+### 2. Diff dimensions
31
+
32
+For each scenario, compare:
33
+- Load commands: count, order, contents (with tolerated-diff for UUID/timestamp).
34
+- Segment sizes and file offsets.
35
+- Section bytes (byte-level equality after reloc application).
36
+- Symbol table: same nlist entries in the same partition order.
37
+- String table: same content (byte-level is ideal, length within 5% is tolerated for suffix-dedup variation).
38
+- `LC_DYLD_INFO_ONLY` opcode streams (classic) or `LC_DYLD_CHAINED_FIXUPS` chains (chained).
39
+- Export trie walk equivalence (may differ in byte layout but must export the same names with the same flags and addresses).
40
+- `__unwind_info` byte-level.
41
+- Code signature: ignored in diff (ld signs with sha256 hashes over its output's bytes; we sign over ours; different bytes, different hashes — expected).
42
+
43
+### 3. Tolerated-diff rules
44
+
45
+```rust
46
+pub enum ToleratedDiff {
47
+    UuidBytes,
48
+    Timestamp,
49
+    PathHashInString(&'static str),      // e.g. temp path in stabs
50
+    StringTableSuffixDedupVariance,
51
+    CodeSignatureHashes,
52
+}
53
+```
54
+
55
+Each tolerance has a precise predicate — no loose "any byte in __LINKEDIT". Unknown diffs fail.
56
+
57
+### 4. Harness structure
58
+
59
+`afs-ld/tests/parity_matrix.rs` walks `tests/parity_corpus/` and runs each scenario:
60
+
61
+```rust
62
+#[test]
63
+fn parity_corpus() {
64
+    for case in load_corpus("tests/parity_corpus/") {
65
+        let ours = link_with_afs_ld(&case).unwrap();
66
+        let theirs = link_with_system_ld(&case).unwrap();
67
+        let diffs = diff_macho(&ours, &theirs);
68
+        let critical: Vec<_> = diffs.into_iter().filter(|d| !is_tolerated(d)).collect();
69
+        assert!(critical.is_empty(),
70
+            "{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical);
71
+    }
72
+}
73
+```
74
+
75
+### 5. CI gating
76
+
77
+GitHub Actions job runs on every PR:
78
+- `cargo test --test parity_matrix` green.
79
+- Artifact uploaded: per-scenario HTML diff viewer for debugging.
80
+- A failing scenario blocks merge.
81
+
82
+### 6. Per-scenario allowed-diff annotation
83
+
84
+Some scenarios might have legitimate small differences we don't want to suppress globally. Each scenario's `notes.md` can declare case-specific tolerances:
85
+
86
+```yaml
87
+tolerated:
88
+  - { region: __LINKEDIT, bytes: 0x1000-0x1010, reason: "ld emits padding here" }
89
+```
90
+
91
+Use sparingly; each tolerance must be justified and date-stamped.
92
+
93
+### 7. Runtime parity
94
+
95
+Beyond byte-level: each scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.
96
+
97
+### 8. Parity budget
98
+
99
+The goal is **zero** critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; if a diff can't be resolved within the sprint, it must be filed as a bug blocking default-swap in Sprint 20.
100
+
101
+## Testing Strategy
102
+
103
+- `cargo test --test parity_matrix` green.
104
+- Intentional-regression: mutate one byte in afs-ld's writer, confirm the harness catches it.
105
+- Scale test: full corpus runs in <2 minutes on a reasonable machine (gates Sprint 28 perf work).
106
+
107
+## Definition of Done
108
+
109
+- 50+ corpus scenarios all pass with zero critical diffs.
110
+- CI-enforced.
111
+- Every tolerated-diff category has a justification and a test that proves it triggers.
112
+- Intentional-regression canary detects any change outside the allowlist.
113
+- Sprint 20's default-swap is unblocked.
.docs/sprints/sprint28.mdadded
@@ -0,0 +1,86 @@
1
+# Sprint 28: Performance & Parallelism
2
+
3
+## Prerequisites
4
+Sprint 27 — correctness gate in place; can freely refactor for speed.
5
+
6
+## Goals
7
+Make afs-ld fast enough to feel like a production tool. Target: within 2× of Apple `ld`'s wall time on the fortsh link. Mold demonstrates linkers can be very fast; we don't need mold's speed, but we need to not be painful.
8
+
9
+## Deliverables
10
+
11
+### 1. Baseline profile
12
+
13
+Profile the fortsh link (Sprint 29 produces the fixture). Categorize wall time:
14
+
15
+- Input parsing (Mach-O headers, sections, symbols, relocations).
16
+- Symbol resolution (hash-map probes, archive lookups).
17
+- Atomization.
18
+- Layout.
19
+- Reloc application.
20
+- Synth sections (`__unwind_info` is often a hotspot).
21
+- Writing output.
22
+- Code signature hashing.
23
+
24
+Identify the biggest bucket; optimize there first.
25
+
26
+### 2. Parallel input parsing
27
+
28
+Parse each `.o` in a separate worker thread; results collected into the symbol table after all parsing completes. Archive member parsing also parallel. Uses std's `thread::scope` — no external crates. Parallelism bounded by `std::thread::available_parallelism()`.
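One way this could look with `std::thread::scope` (stable since Rust 1.63) is sketched below. `parse_all` and its generic parameters are hypothetical; the parser itself is passed in as a closure so the sketch stays self-contained.

```rust
use std::thread;

/// Hypothetical driver: runs `parse` over every input on a bounded number
/// of scoped threads and returns the results in input order.
pub fn parse_all<T, R, F>(inputs: &[T], parse: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let chunk_len = (inputs.len() + workers - 1) / workers; // at least 1
    let parse = &parse;
    thread::scope(|s| {
        // One contiguous chunk of inputs per worker thread.
        let handles: Vec<_> = inputs
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(parse).collect::<Vec<R>>()))
            .collect();
        // Joining in spawn order keeps the output order independent of scheduling.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("parser thread panicked"))
            .collect()
    })
}
```

Joining handles in spawn order is what keeps the merged result identical to a sequential run, which matters for the determinism requirement later in this sprint.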
29
+
30
+### 3. Parallel reloc application
31
+
32
+Each atom's relocs are independent. Process per-atom in parallel; the output buffer is preallocated and each atom writes to a disjoint slice.
33
+
34
+### 4. Parallel SHA-256 for code signing
35
+
36
+Hash pages in parallel across a bounded worker pool (one contiguous run of 4 KiB pages per worker, not one thread per page). SHA-256 is inherently sequential within a page but trivially parallel across pages. Drop-in speedup for large binaries.
37
+
38
+### 5. Bump allocator for ephemeral data
39
+
40
+Parser produces many small allocations (strings, reloc lists, atom descriptors). A per-input arena avoids fragmentation and makes bulk drop free. Implement as `src/arena.rs` — a std-only `Vec<Box<[u8]>>` chunker.
41
+
42
+### 6. mmap for large inputs
43
+
44
+`std::fs::File` + `memmap2`? No: memmap2 is an external crate, and so is `libc`. Declare `mmap`/`munmap` by hand in an `extern "C"` block inside an unsafe `src/mmap.rs` wrapper. Input files are always read-only; mmap saves a read syscall and lets us share parse state across threads cheaply. Fall back to `fs::read` for GNU-thin archive members whose external path doesn't mmap cleanly (rare).
45
+
46
+### 7. Symbol-table hash map
47
+
48
+Expectation: std `HashMap` is fine for our scale; confirm with the §1 profile. If not: replace with an open-addressing table keyed by `Istr` (handle-equality), linear probing, power-of-2 capacity. ~100 LoC.
49
+
50
+### 8. String interner
51
+
52
+Single global `StringInterner` shared across inputs. Interning cost: one hash lookup per name. Optimize by batching per-input: each input parses its strings into a local table, then merges into the global interner in one pass.
53
+
54
+### 9. No-alloc hot paths
55
+
56
+Reloc application and chain construction should not allocate per-reloc. Preallocated scratch buffers, reused across the relocation pass.
57
+
58
+### 10. Benchmarks
59
+
60
+`afs-ld/bench/` (or a `#[bench]` behind `cargo +nightly bench`) with:
61
+- `bench_hello_world`: small, measures startup overhead.
62
+- `bench_runtime_link`: mid, measures symbol-table & reloc-apply.
63
+- `bench_fortsh_link`: large, measures end-to-end throughput.
64
+
65
+Budget targets:
66
+- hello-world: ≤ 20 ms.
67
+- runtime link: ≤ 150 ms.
68
+- fortsh link: ≤ 2× Apple `ld`'s wall time on the same machine.
69
+
70
+### 11. Determinism preserved
71
+
72
+Parallelism must not reorder output. Each worker produces a deterministic result; join order is fixed; sorts are stable. A parallel and sequential run must produce byte-identical outputs.
73
+
74
+## Testing Strategy
75
+
76
+- Benchmarks land as regression gates: nightly CI records throughput; > 10% regression fails.
77
+- Determinism: 100 parallel runs of the same input, assert byte-identical output every time.
78
+- Sprint 27 parity must remain green — no correctness regression.
79
+- Single-threaded fallback (`-j 1`) for debugging.
80
+
81
+## Definition of Done
82
+
83
+- fortsh link wall time within 2× of `ld`'s.
84
+- All Sprint 27 scenarios still byte-identical.
85
+- Determinism bulletproof across parallelism.
86
+- No external dependencies added.
.docs/sprints/sprint29.mdadded
@@ -0,0 +1,91 @@
1
+# Sprint 29: fortsh Link Audit
2
+
3
+## Prerequisites
4
+Sprints 18–28 — every functional sprint, parity gate, performance tuning.
5
+
6
+## Goals
7
+End-to-end link of fortsh under afs-ld. fortsh is ~57 KLoC Fortran 2018, 55 modules, heavy `iso_c_binding`, allocatable strings, derived types. Linking it is the first real-world stress test of everything before this sprint. Fix what breaks. No excuses.
8
+
9
+## Deliverables
10
+
11
+### 1. fortsh build pipeline under afs-ld
12
+
13
+```
14
+cd fortsh
15
+AFS_LD=1 armfortas --std=f2018 -O2 <all sources> -o fortsh
16
+```
17
+
18
+Expected:
19
+- Build succeeds.
20
+- `./fortsh --version` prints the expected version string.
21
+- Interactive mode starts and reads a line.
22
+- `./fortsh -c "echo hello"` prints "hello".
23
+
24
+### 2. Failure taxonomy
25
+
26
+Anticipated categories (adjust during sprint based on what actually breaks):
27
+
28
+- **Symbol resolution**: missing runtime symbols, weak-coalesce wrong winner, common-size mismatches.
29
+- **Relocation math**: PAGE21/PAGEOFF12 miscomputation on specific offsets, SUBTRACTOR pair issues in eh_frame.
30
+- **TLV**: thread-local I/O state failing at runtime.
31
+- **Unwind**: backtrace on crash produces garbage.
32
+- **Dead-strip**: functions stripped that were live (or live that should have been stripped).
33
+- **Chained fixups**: a chain crossing a page boundary or containing a bad `next` offset.
34
+- **Code signature**: kernel-kill on exec.
35
+
36
+Each class has a known file/function starting point for triage from earlier sprints.
37
+
38
+### 3. Audit process
39
+
40
+Same rules as armfortas audits (`armfortas/CLAUDE.md`):
41
+
42
+- **Assume nothing works until proven otherwise.** Every subsystem gets exercised by some fortsh code path.
43
+- **Stubs and placeholders are synonyms for broken.** If fortsh passes a case only because of a hand-patched workaround, the sprint isn't done.
44
+- **Wrong output is worse than crashes.** A fortsh that "runs" but produces wrong answers is a critical failure.
45
+- **Don't soften findings.** "Major" = wrong answers. "Critical" = silent corruption.
46
+- **Fix now unless it genuinely requires a later sprint.**
47
+
48
+Every bug becomes a regression test in `tests/parity_corpus/fortsh_*/`.
49
+
50
+### 4. Runtime behavior matrix
51
+
52
+Curated list of fortsh scenarios run under the afs-ld-linked binary:
53
+
54
+- Interactive `echo`, `cat`, `ls` (builtins).
55
+- Pipe: `echo hello | cat`.
56
+- Redirect: `echo hello > /tmp/f`.
57
+- Variables: `x=1; echo $x`.
58
+- Scripts: `./fortsh script.fsh`.
59
+- Error paths: `nonexistent_command` returns non-zero.
60
+- `iso_c_binding` calls into libc from Fortran.
61
+- Allocatable string assignment: `s = s // "more"`.
62
+- Derived-type shell_state_t access.
63
+
64
+Every item green, or the sprint isn't done.
65
+
66
+### 5. Differential vs system-ld-linked fortsh
67
+
68
+Same fortsh source, linked by both. Runtime behavior **must** match for every scenario in §4. Binary size within 5%. Load-command shape equivalent. Output byte-by-byte for the parts our Sprint 27 rules cover.
69
+
70
+### 6. Perf check
71
+
72
+Link time for fortsh under afs-ld is within Sprint 28's 2× budget.
73
+
74
+### 7. Audit report
75
+
76
+`.docs/audits/sprint29_fortsh.md` (or wherever the project convention puts audit reports, parallel to armfortas's audit structure): a brutally honest writeup of what worked, what broke, what was fixed, what remains. No soft-pedaling.
77
+
78
+## Testing Strategy
79
+
80
+- Full fortsh test suite executed under both linker paths.
81
+- Each scenario in §4 scripted as a parity test.
82
+- Perf budget asserted.
83
+- Memory usage at link time within reason (< 1 GiB on fortsh).
84
+
85
+## Definition of Done
86
+
87
+- fortsh links under afs-ld.
88
+- Every scenario in §4 passes.
89
+- All fortsh integration tests pass.
90
+- Differential with ld-linked fortsh matches on every runtime scenario.
91
+- Audit report filed; no open critical items.
.docs/sprints/sprint30.md added
@@ -0,0 +1,89 @@
1
+# Sprint 30: Diagnostics & Polish
2
+
3
+## Prerequisites
4
+Sprints 19 (`-map`, `-why_live`), 29 (fortsh audit informs diagnostic quality).
5
+
6
+## Goals
7
+Raise every diagnostic surface to afs-as's caret-under-line standard. Polish `--help`, `--version`, `-t`, error recovery in binary parsers. Ship-quality UX for linker errors.
8
+
9
+## Deliverables
10
+
11
+### 1. Binary-input diagnostics
12
+
13
+Every parser error cites the input file, the byte offset, and a caret pointing at the offending region in a hex dump:
14
+
15
+```
16
+afs-ld: error: in input.o at byte 0x1a4: LC_SEGMENT_64 claims nsects=3 but cmdsize fits only 2 section headers
17
+
18
+  0x1a0: 00 01 00 00 e8 00 00 00 03 00 00 00 00 00 00 00
19
+                                 ^^
20
+  cmdsize=0xe8 (232 bytes) holds the 72-byte segment_command_64 plus 2 × 80-byte section_64 entries; nsects=3 needs 72+240=312 bytes.
21
+```
22
+
23
+Implemented in `src/diag.rs` with helpers for "byte offset → nearest load command, section, atom, symbol" so every error can contextualize itself.
24
+
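A minimal sketch of the hex-dump-with-caret rendering and the `cmdsize` capacity check described above. The function names and signatures are hypothetical and are not taken from `src/diag.rs`. The struct sizes (72 bytes for `segment_command_64`, 80 bytes for `section_64`) come from `<mach-o/loader.h>`.

```rust
/// Size of `segment_command_64` and `section_64` per <mach-o/loader.h>.
const SEGMENT_COMMAND_64_SIZE: u32 = 72;
const SECTION_64_SIZE: u32 = 80;

/// How many section headers a given `cmdsize` can hold.
/// Hypothetical helper name; not from the afs-ld sources.
pub fn sections_that_fit(cmdsize: u32) -> u32 {
    cmdsize.saturating_sub(SEGMENT_COMMAND_64_SIZE) / SECTION_64_SIZE
}

/// Render one 16-byte hex-dump row starting at `row_offset`, followed by a
/// caret line marking the bytes in `[hl_start, hl_end)` (absolute offsets).
/// Hypothetical helper name and signature.
pub fn hex_row_with_caret(
    data: &[u8],
    row_offset: usize,
    hl_start: usize,
    hl_end: usize,
) -> String {
    let end = (row_offset + 16).min(data.len());
    let row = &data[row_offset.min(end)..end];
    let prefix = format!("  {:#x}: ", row_offset);

    let mut hex = String::new();
    let mut caret = String::new();
    for (i, b) in row.iter().enumerate() {
        let off = row_offset + i;
        if i > 0 {
            hex.push(' ');
            caret.push(' ');
        }
        hex.push_str(&format!("{:02x}", b));
        // Two carets under a highlighted byte, two spaces otherwise.
        caret.push_str(if off >= hl_start && off < hl_end { "^^" } else { "  " });
    }
    // Drop trailing padding so the output has no trailing whitespace.
    let caret = caret.trim_end();
    format!("{}{}\n{}{}", prefix, hex, " ".repeat(prefix.len()), caret)
}
```

Calling `hex_row_with_caret(&bytes, 0x1a0, 0x1a8, 0x1a9)` would mark the single byte at offset 0x1a8. A real implementation would presumably also print the surrounding load-command context.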
25
+### 2. Source-level backmapping
26
+
27
+When diagnosing a reloc error, point at the originating `.s` line if the object's symbol table includes debug info (afs-as emits no debug info today; this is a forward-compatible hook). Otherwise, map to the offending atom's symbol name and input file.
28
+
29
+### 3. Did-you-mean everywhere
30
+
31
+- Undefined symbol (Sprint 8).
32
+- Unknown flag (Sprint 19).
33
+- Missing library (`-lFoo` → did you mean `-lfoo`?).
34
+- Mistyped architecture (`-arch arm86` → did you mean `arm64`?).
35
+
36
+Candidates within Levenshtein distance 3, capped at the 10 closest matches.
37
+
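The suggestion rule can be sketched with the standard library alone. This assumes the rule means "edit distance of at most 3, at most 10 suggestions, closest first"; the names `levenshtein` and `suggest` are illustrative, not the crate's actual API.

```rust
/// Two-row Levenshtein edit distance over Unicode scalar values.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of a and first j chars of b.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0usize; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        cur[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let subst = prev[j] + usize::from(ca != cb);
            cur[j + 1] = subst.min(prev[j + 1] + 1).min(cur[j] + 1);
        }
        std::mem::swap(&mut prev, &mut cur);
    }
    prev[b.len()]
}

/// Return up to 10 candidates within distance 3 of `input`, closest first.
/// Ties are broken by name so the output is deterministic.
pub fn suggest<'a>(input: &str, candidates: &[&'a str]) -> Vec<&'a str> {
    const MAX_DISTANCE: usize = 3; // assumed reading of "Levenshtein-3"
    const MAX_SUGGESTIONS: usize = 10;
    let mut scored: Vec<(usize, &'a str)> = candidates
        .iter()
        .map(|c| (levenshtein(input, c), *c))
        .filter(|(d, _)| *d <= MAX_DISTANCE)
        .collect();
    scored.sort();
    scored
        .into_iter()
        .take(MAX_SUGGESTIONS)
        .map(|(_, c)| c)
        .collect()
}
```

Sorting by `(distance, name)` keeps the suggestion order stable across runs, which also fits the deterministic-stderr requirement later in this sprint.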
38
+### 4. Colorized output
39
+
40
+ANSI color codes on TTY stderr. Flagged off under `NO_COLOR` env var and under `--color=never`. Matches afs-as's approach in `afs-as/src/diag*.rs`.
41
+
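A small sketch of the color decision, assuming only two modes (automatic and `--color=never`) and treating `NO_COLOR` as set when it is present and non-empty. The enum and function names are hypothetical; `std::io::IsTerminal` requires Rust 1.70 or newer.

```rust
use std::io::IsTerminal;

/// Hypothetical representation of the `--color` flag.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ColorChoice {
    Auto,
    Never,
}

/// Pure decision function, kept separate from the environment so it can be
/// unit-tested without a pty.
pub fn use_color(choice: ColorChoice, no_color_env: Option<&str>, stderr_is_tty: bool) -> bool {
    match choice {
        ColorChoice::Never => false,
        ColorChoice::Auto => {
            // Assumption: an empty NO_COLOR value does not disable color.
            let no_color = no_color_env.map_or(false, |v| !v.is_empty());
            stderr_is_tty && !no_color
        }
    }
}

/// Thin wrapper that reads the real environment and stderr.
pub fn use_color_from_env(choice: ColorChoice) -> bool {
    let no_color = std::env::var("NO_COLOR").ok();
    use_color(choice, no_color.as_deref(), std::io::stderr().is_terminal())
}
```

Keeping the environment reads in a wrapper means the snapshot tests can exercise `use_color` directly, leaving only the TTY probe for the pty harness.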
42
+### 5. Verbose and trace modes
43
+
44
+- `-v`: version + target + active flag summary.
45
+- `-t` / `--trace`: every input file logged as it's loaded.
46
+- `-verbose_deprecation`: warnings for deprecated flags (we accept them for `ld` compatibility but note they're deprecated).
47
+
48
+### 6. `--help` format
49
+
50
+Mirrors `ld`'s. Sections: inputs, outputs, symbols, diagnostics, platform. Each flag has a one-line description and (where applicable) a default value. Fits on 80 columns, readable.
51
+
52
+### 7. `--version` format
53
+
54
+```
55
+afs-ld <version>
56
+Bespoke ARM64 Mach-O linker for armfortas
57
+Target: arm64-apple-macos
58
+Commit: <git hash>
59
+```
60
+
61
+### 8. Deterministic stderr
62
+
63
+Error output is the same for the same input across runs. No wall clock, no pid, no thread-id. Supports scripted diffing in CI.
64
+
65
+### 9. Error-code conventions
66
+
67
+Exit codes:
68
+- 0: success.
69
+- 1: link failure (undefined symbol, ambiguous resolution, etc.).
70
+- 2: CLI misuse (bad flag, missing required arg).
71
+- 64–78: BSD `<sysexits.h>` codes where they fit (EX_USAGE=64, EX_DATAERR=65, EX_NOINPUT=66, EX_UNAVAILABLE=69, EX_SOFTWARE=70).
72
+
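One way to keep the convention in a single place is an enum that maps to `std::process::ExitCode`. This is a sketch; the variant names are assumptions, and the numeric values are the ones listed above.

```rust
use std::process::ExitCode;

/// Hypothetical exit-status enum; variant names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Exit {
    Success,
    LinkFailure,
    CliMisuse,
    Usage,       // EX_USAGE
    DataErr,     // EX_DATAERR
    NoInput,     // EX_NOINPUT
    Unavailable, // EX_UNAVAILABLE
    Software,    // EX_SOFTWARE
}

impl Exit {
    /// Numeric process exit status for this outcome.
    pub fn code(self) -> u8 {
        match self {
            Exit::Success => 0,
            Exit::LinkFailure => 1,
            Exit::CliMisuse => 2,
            Exit::Usage => 64,
            Exit::DataErr => 65,
            Exit::NoInput => 66,
            Exit::Unavailable => 69,
            Exit::Software => 70,
        }
    }
}

impl From<Exit> for ExitCode {
    fn from(e: Exit) -> Self {
        ExitCode::from(e.code())
    }
}
```

With this in place, `fn main() -> ExitCode` can return `Exit::LinkFailure.into()` instead of scattering integer literals through the driver.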
73
+### 10. Regression: no diagnostic regresses below afs-as quality
74
+
75
+Every diagnostic in afs-ld should be at least as useful as the closest afs-as analog. Cross-checked in an audit.
76
+
77
+## Testing Strategy
78
+
79
+- Snapshot tests for every major error category: undefined symbol, duplicate symbol, missing library, bad flag, malformed input. Compare against a stored expected-output; diff the text (modulo terminal width and colors).
80
+- `--help` and `--version` snapshot tests.
81
+- TTY detection test via a pty harness (or a manual verification step).
82
+
83
+## Definition of Done
84
+
85
+- Every error category has a snapshot test that matches a stored golden.
86
+- Did-you-mean fires on the four categories listed.
87
+- `--help` fits 80 cols and is scannable.
88
+- Color on TTY, off under `NO_COLOR` / `--color=never`.
89
+- Exit codes follow the convention.
.docs/sprints/sprint31.md added
@@ -0,0 +1,101 @@
1
+# Sprint 31: Final Audit
2
+
3
+## Prerequisites
4
+Every prior sprint.
5
+
6
+## Goals
7
+The last line of defense before afs-ld is declared the armfortas default linker permanently (i.e., Sprint 20's env-var fallback is removed). Brutally honest audit of every subsystem. Regressions caught, gaps documented, decisions defended.
8
+
9
+## Deliverables
10
+
11
+### 1. Parity corpus green
12
+
13
+Sprint 27's `tests/parity_corpus/` fully green, plus every fortsh-derived scenario added in Sprint 29. No tolerated-diff entries added since Sprint 27 without audit-committee (i.e., user) sign-off.
14
+
15
+### 2. Determinism sweep
16
+
17
+Link every corpus scenario 10 times under parallelism. All 10 outputs byte-identical. Record the hash.
18
+
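The comparison step of the sweep could look like the sketch below, which assumes the ten outputs have already been read into memory. Because the standard library ships no cryptographic hash, FNV-1a (64-bit) stands in for "the hash" here; that choice is an assumption, and both function names are hypothetical.

```rust
/// Compare every output against the first one. Returns `(run_index,
/// byte_offset)` of the first divergence, or `None` if all runs agree.
pub fn first_mismatch(outputs: &[Vec<u8>]) -> Option<(usize, usize)> {
    let (reference, rest) = outputs.split_first()?;
    for (i, out) in rest.iter().enumerate() {
        if out != reference {
            // First differing byte, or the shorter length if one output
            // is a strict prefix of the other.
            let offset = reference
                .iter()
                .zip(out.iter())
                .position(|(a, b)| a != b)
                .unwrap_or_else(|| reference.len().min(out.len()));
            return Some((i + 1, offset));
        }
    }
    None
}

/// 64-bit FNV-1a, used only as a compact fingerprint to record.
/// It is not collision-resistant.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in bytes {
        h ^= u64::from(*b);
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}
```

Reporting the byte offset of the first divergence gives the diagnostics helpers something concrete to map back to a load command or section.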
19
+### 3. Spec conformance survey
20
+
21
+Walk the Mach-O, Apple Mach-O ABI, and arm64 AAPCS64 specs section by section. For each feature used by armfortas or fortsh, confirm afs-ld implements it correctly. Checklist:
22
+
23
+- Header & magic.
24
+- Load command set.
25
+- Segment/section flags.
26
+- Every relocation type in `<mach-o/arm64/reloc.h>`.
27
+- Symbol types in `<mach-o/nlist.h>`.
28
+- `LC_DYLD_INFO_ONLY` opcode set.
29
+- `LC_DYLD_CHAINED_FIXUPS` format.
30
+- Export trie terminal formats.
31
+- `__unwind_info` layout.
32
+- Compact unwind encoding.
33
+- Code signature SuperBlob.
34
+
35
+For each, cite the afs-ld file/function that implements it. Gaps documented in `.docs/audits/sprint31_final.md`.
36
+
37
+### 4. CLI parity survey
38
+
39
+Every `ld` flag that armfortas or fortsh passes must be supported. Cross-check against:
40
+- `armfortas/src/driver/mod.rs` linker-invocation call sites.
41
+- `fortsh` CMake / build-system linker flags (consult the project).
42
+- The set listed in Sprint 19.
43
+
44
+Any flag in the "passes but no-op" category audited for silent misbehavior.
45
+
46
+### 5. Binary size audit
47
+
48
+Compare total output size (afs-ld vs `ld`) on:
49
+- hello-world.
50
+- libarmfortas_rt-linked Fortran program.
51
+- fortsh.
52
+
53
+Within 5% of `ld` on each. A difference larger than 5% triggers an investigation into where the bloat lives.
54
+
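The 5% budget can be asserted with integer arithmetic, avoiding floating-point rounding at the boundary. This sketch assumes the budget applies in both directions (smaller or larger than `ld`); the function name is hypothetical.

```rust
/// True if `ours` is within `percent` percent of `reference`, in either
/// direction. Uses u128 so the multiplication cannot overflow.
pub fn within_percent(ours: u64, reference: u64, percent: u64) -> bool {
    let diff = ours.abs_diff(reference);
    u128::from(diff) * 100 <= u128::from(reference) * u128::from(percent)
}
```

For example, `within_percent(afs_ld_size, ld_size, 5)` would gate each of the three binaries in the list above.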
55
+### 6. Performance audit
56
+
57
+Sprint 28's benchmarks run one more time. fortsh link within 2× of `ld`. No regression since Sprint 28.
58
+
59
+### 7. Diagnostic quality audit
60
+
61
+Manual pass over every error and warning message. Each evaluated on:
62
+- Does it name the input?
63
+- Does it cite a location (file, offset, symbol)?
64
+- Does it tell the user how to fix it?
65
+
66
+Low-quality diagnostics fixed on the spot.
67
+
68
+### 8. Dead code and `unwrap`/`panic` sweep
69
+
70
+Cargo-geiger-style (but hand-rolled, since we forbid external deps):
71
+- Every `.unwrap()` / `.expect()` reviewed. Panics only in truly-impossible cases.
72
+- Every `todo!()` or `unimplemented!()` either implemented or explicitly deferred with a pointer to a future sprint.
73
+- Dead code removed.
74
+
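A hand-rolled sweep can start as a plain text scan. The sketch below is deliberately naive: it strips everything after `//` on a line, so it will miss patterns that follow a `//` inside a string literal and does not understand block comments. The pattern list and function name are assumptions.

```rust
/// Patterns worth a manual review. Illustrative list, not exhaustive.
pub const PATTERNS: [&str; 5] = [
    ".unwrap()",
    ".expect(",
    "todo!(",
    "unimplemented!(",
    "panic!(",
];

/// Return `(line_number, pattern)` for every hit in `src`, ignoring
/// anything after a `//` line comment. Line numbers are 1-based.
pub fn scan_source(src: &str) -> Vec<(usize, &'static str)> {
    let mut hits = Vec::new();
    for (idx, line) in src.lines().enumerate() {
        // Naive comment stripping: keep only the text before the first `//`.
        let code = line.split("//").next().unwrap_or("");
        for pat in PATTERNS.iter() {
            if code.contains(pat) {
                hits.push((idx + 1, *pat));
            }
        }
    }
    hits
}
```

Feeding each file under `src/` through `scan_source` yields a checklist for the manual review; it does not decide whether a given panic is justified.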
75
+### 9. CLAUDE.md, README, overview.md refresh
76
+
77
+Sync documentation with the final state of the crate. Note any scope changes from the original plan. If any sprint was rescoped or split, update the sprint index.
78
+
79
+### 10. Submodule pin
80
+
81
+Parent armfortas pinned to a specific afs-ld commit. Tag the afs-ld repo `v0.1.0`.
82
+
83
+### 11. Default-swap removal
84
+
85
+After the audit passes, Sprint 20's `AFS_LD=1` default flip becomes permanent. The env-var fallback stays for one more sprint as a safety net (set `AFS_LD=0` to fall back to system `ld`), then is removed entirely.
86
+
87
+## Testing Strategy
88
+
89
+- Every prior test suite run; all green.
90
+- Determinism sweep (§2).
91
+- Perf sweep (§6).
92
+- Manual binary-size diff (§5).
93
+- Manual CLI parity checklist (§4).
94
+
95
+## Definition of Done
96
+
97
+- Audit report `.docs/audits/sprint31_final.md` written.
98
+- All tests green.
99
+- No open critical items.
100
+- afs-ld is the armfortas default linker.
101
+- Tagged `v0.1.0`.