fortrangoingonforty/afs-ld / c1785ff

Track planning docs

Authored by espadonne
SHA: c1785ffed8b1e9d5017483aee68344ac38ef06ec
Parents: 233258a
Tree: 664d871

7 changed files

| Status | File | + | - |
| --- | --- | --- | --- |
| A | .docs/sprints/closeout0-9.md | 247 | 0 |
| M | .docs/sprints/sprint00.md | 6 | 4 |
| M | .docs/sprints/sprint01.md | 6 | 0 |
| M | .docs/sprints/sprint02.md | 7 | 0 |
| M | .docs/sprints/sprint04.md | 11 | 1 |
| M | .docs/sprints/sprint08.md | 13 | 11 |
| A | AGENTS.md | 349 | 0 |

.docs/sprints/closeout0-9.md added
@@ -0,0 +1,247 @@
+# Sprint 0-9 Closeout Checklist
+
+Concrete closeout checklist based on the current codebase audit.
+
+Current conclusion: we are not ready to honestly declare Sprint 10 complete-in-practice yet.
+The main blockers are:
+
+- Sprint 0's tolerated-diff categories are still deferred until afs-ld can emit real linked output for Mach-O-to-Mach-O differential checks.
+
+## Sprint 10 Gate
+
+Do not declare "we are on Sprint 10" until all of these are true:
+
+- [x] Sprint 9 reloc referents are remapped to atom-aware forms.
+- [x] Sprint 8 resolution orchestration exists as a real callable stage, not just loose helper APIs.
+- [x] `cargo test -p afs-ld` is green after the closeout work.
+- [x] `cargo clippy -p afs-ld --all-targets -- -D warnings` is green after the closeout work.
+- [x] `README.md` and sprint docs no longer materially misstate the current state of the crate.
+
+## Recommended Order
+
+- [x] Close Sprint 9 reloc-to-atom remap first.
+- [x] Close Sprint 8 resolution orchestration and option coverage second.
+- [x] Close Sprint 6 TBD/SDK search gaps third.
+- [x] Close Sprint 4 nested archive support fourth.
+- [ ] Finish the deferred Sprint 0 differential-harness tolerance work once afs-ld can emit real output.
+
+## Cross-Sprint Exit Criteria
+
+- [ ] Every closeout chunk lands with tests.
+- [ ] Every bug fix or behavioral gap gets a regression test.
+- [ ] No newly-discovered roadmap/code mismatch is left undocumented.
+- [ ] Any user-facing diagnostic we touch stays deterministic and testable.
+
+## Sprint 0
+
+Status: closed
+
+Validated:
+
+- [x] `afs-ld` exists as its own git submodule in the parent workspace.
+- [x] Parent `Cargo.toml` includes `afs-ld` as a workspace member.
+- [x] `CLAUDE.md`, `README.md`, crate wiring, and test harness scaffolding exist.
+- [x] Reference repos are present under parent `.refs/` (`ld64`, `mold`, `lld`).
+- [x] `tests/reader_empty.rs` enforces the empty-invocation CLI contract.
+- [x] `tests/diff_harness_sanity.rs` and `tests/diff_harness_finds_critical.rs` exist and pass.
+- [x] `cargo clippy -p afs-ld --all-targets -- -D warnings` is currently clean.
+
+Remaining closeout work:
+
+- [x] Explicitly downscope Sprint 0 docs so the current diff harness is described as synthetic until end-to-end linking exists.
+- [ ] Add tolerated-diff categories once real Mach-O-to-Mach-O comparisons exist.
+
+## Sprint 1
+
+Status: closed
+
+Validated:
+
+- [x] Mach-O constants are duplicated locally in `src/macho/constants.rs`.
+- [x] `MachHeader64` parsing exists and rejects malformed headers.
+- [x] Load-command dispatch exists and preserves unknown commands as raw bytes.
+- [x] Segment and section-header metadata parsing exists.
+- [x] `LC_BUILD_VERSION` and `LC_LINKER_OPTIMIZATION_HINT` decoding exists.
+- [x] `--dump` exists through `src/dump.rs` and `src/main.rs`.
+- [x] Corpus round-trip tests pass in `tests/reader_corpus_round_trip.rs`.
+
+Remaining closeout work:
+
+- [x] Add an `otool -lV` parity test for dumper output shape across the corpus.
+- [x] Add a panic-focused malformed-input stress pass beyond the current unit tests so the "no panics on malformed input" claim is defensible.
+
+## Sprint 2
+
+Status: closed
+
+Validated:
+
+- [x] Section classification exists in `src/section.rs`.
+- [x] `InputSection` carries section data and raw relocation bytes.
+- [x] `RawNlist` / `InputSymbol` parsing and classification exist in `src/symbol.rs`.
+- [x] Common symbols, weak flags, private externs, and indirect aliases are surfaced.
+- [x] `StringTable` exists and handles suffix-dedup overlaps.
+- [x] `DysymtabCmd` is parsed and exposed through `ObjectFile`.
+- [x] `ObjectFile` integrates header, commands, sections, symbols, strings, and dysymtab.
+
+Remaining closeout work:
+
+- [x] Add `nm -a` parity tests for symbol view and classification.
+- [x] Add `otool -r` parity checks for relocation-offset surfaces promised by Sprint 2, with section/load-command parity covered by the Sprint 1 `otool -lV` gate.
+- [x] Add stronger malformed-symbol / malformed-string-table stress coverage if we want the "never panics" bar to be explicit.
+
+## Sprint 3
+
+Status: closed enough for current closeout
+
+Validated:
+
+- [x] ARM64 relocation constants exist.
+- [x] Raw relocation parsing and writing exist.
+- [x] Fused `Reloc` form exists.
+- [x] `ADDEND` and `SUBTRACTOR + UNSIGNED` pairing is fused in `parse_relocs`.
+- [x] Validation logic exists in `validate_relocs`.
+- [x] Write-side round-trip support exists.
+- [x] Unit coverage is broad and current corpus relocation round-trips pass.
+
+Remaining closeout work:
+
+- [x] No audit-blocking work found for Sprint 3.
+
+## Sprint 4
+
+Status: closed
+
+Validated:
+
+- [x] BSD, SysV, and GNU-thin archive flavors are recognized.
+- [x] Archive headers and name decoding are implemented.
+- [x] Symbol-index parsing exists for BSD and SysV archives.
+- [x] Lazy member fetch exists via `fetch_object_defining`.
+- [x] `libarmfortas_rt.a` is exercised by `tests/archive_runtime.rs`.
+- [x] Archive dump mode exists via `--dump-archive`.
+
+Remaining closeout work:
+
+- [x] Implement one-level nested archive support (`.a` member inside `.a`) and preserve provenance for diagnostics.
+- [x] Formally treat `resolve::force_load_archive` / `force_load_all` as the Sprint 4 completion surface and document that surface instead of adding a parallel archive-only helper.
+- [x] Add `ar -t` shape/parity coverage for `--dump-archive`.
+
+## Sprint 5
+
+Status: partially closed
+
+Validated:
+
+- [x] `DylibFile` exists and parses binary `MH_DYLIB`.
+- [x] `LC_ID_DYLIB`, dependency dylib commands, ordinals, and rpaths are decoded.
+- [x] Export trie decoding exists with cycle/depth protection.
+- [x] Real clang-built dylib coverage exists in `tests/dylib_integration.rs`.
+- [x] Dylib dump mode exists via `--dump-dylib`.
+
+Remaining closeout work:
+
+- [ ] Prove recursive re-export / umbrella lookup behavior with a focused test, not just dependency collection.
+- [ ] Confirm the public dylib surface matches what Sprint 5 intended for re-exported symbols, not only direct exports.
+
+## Sprint 6
+
+Status: closed
+
+Validated:
+
+- [x] The custom YAML subset parser exists in `src/macho/tbd_yaml.rs`.
+- [x] TBD schema decoding exists in `src/macho/tbd.rs`.
+- [x] `DylibFile::from_tbd` exists and materializes TBDs into the same linker-facing surface.
+- [x] Real `libSystem.tbd` smoke/integration coverage exists in `tests/tbd_smoke.rs` and `tests/tbd_integration.rs`.
+- [x] TBD dump mode exists via `--dump-tbd`.
+
+Remaining closeout work:
+
+- [x] Implement SDK `-syslibroot` library search helpers for `.tbd` / `.dylib`.
+- [x] Implement framework search helpers promised by Sprint 6.
+- [x] Make target filtering fail loudly when the requested target is not exported, instead of only materializing matching targets when the caller already knows one exists.
+- [x] No further audit-blocking work found for Sprint 6 in the current helper/test surface.
+
+## Sprint 7
+
+Status: closed
+
+Validated:
+
+- [x] `Symbol` sum type exists with the planned major variants.
+- [x] `StringInterner`, opaque ids, and `SymbolTable` exist.
+- [x] The insertion matrix is heavily unit-tested.
+- [x] Weak/strong/common coalescing behavior is covered in unit tests.
+- [x] Alias-cycle detection and chain resolution exist.
+- [x] Transition logging exists.
+
+Remaining closeout work:
+
+- [ ] Add the differential weak-coalescing / duplicate-behavior coverage against system `ld` that Sprint 7 originally called for.
+
+## Sprint 8
+
+Status: closed
+
+Validated:
+
+- [x] Archive seeding, object seeding, and dylib seeding exist.
+- [x] Fixed-point archive fetch draining exists.
+- [x] `force_load_archive` and `force_load_all` helpers exist in `src/resolve.rs`.
+- [x] Undefined classification exists for `Error`, `Warning`, `Suppress`, and `DynamicLookup`.
+- [x] Did-you-mean support exists.
+- [x] Duplicate-symbol and undefined-symbol formatting helpers exist.
+- [x] Real integration coverage exists for archive pull plus unresolved-symbol reporting.
+
+Remaining closeout work:
+
+- [x] Add a real orchestration entrypoint for resolution (`seed -> optional force load -> drain -> classify`) that can be called as a coherent stage.
+- [x] Add option/state plumbing for `all_load`, `force_load`, and undefined treatment so resolution is not just a bag of helper APIs.
+- [x] Add an archive order-sensitivity test.
+- [x] Add dedicated tests for `force_load_archive` and `force_load_all`.
+- [x] Add dedicated tests for `UndefinedTreatment::Warning`, `Suppress`, and `DynamicLookup`.
+- [x] Add a dedicated test that unresolved weak refs stay accepted regardless of treatment.
+- [x] Tighten diagnostics toward the Sprint 8 format by carrying section/offset provenance and aggregate repeated relocation sites when available.
+
+## Sprint 9
+
+Status: closed
+
+Validated:
+
+- [x] Atom model and atom table exist.
+- [x] Section splitting at symbol boundaries exists.
+- [x] `.alt_entry` folding exists.
+- [x] CString atom splitting exists and is integration-tested.
+- [x] Compact-unwind atom splitting and `parent_of` wiring exist.
+- [x] Backpatching of `Symbol::Defined { atom }` exists.
+- [x] `N_NO_DEAD_STRIP` and weak-def flags are propagated into atom flags.
+- [x] Embedded payload addends on symbol-based data relocs are folded into local atom offsets or preserved on external refs.
+
+Remaining closeout work:
+
+- [x] Remap relocations from raw section/symbol referents into atom-aware referents.
+- [x] Add atom-local relocation storage or an equivalent per-atom relocation view.
+- [x] Ensure same-object references point at target atoms, not raw section offsets.
+- [x] Add a focused integration test proving a local branch or data reference resolves to the callee/target atom.
+- [x] Add a boundary-crossing reloc diagnostic test.
+- [x] Confirm no raw section-relative relocation state leaks into Sprint 10 inputs.
+- [x] No further audit-blocking work found for Sprint 9 in the current corpus and targeted local-addend probes.
+
+## Documentation Closeout
+
+- [x] Update `README.md` so it no longer says the crate is only Sprint 0 scaffolding.
+- [x] Refresh sprint docs whose deliverables have been implemented under a different surface than originally planned.
+- [x] Keep `CLAUDE.md` as the authority for discipline, but make user-facing docs match the actual code.
+
+## Verification Commands
+
+- [x] `cargo test -p afs-ld`
+- [x] `cargo clippy -p afs-ld --all-targets -- -D warnings`
+- [ ] Focused xcrun-backed checks when touching reader/resolve/atom/TBD/dylib paths:
+  - [x] `cargo test -p afs-ld --test reader_corpus_round_trip -- --nocapture`
+  - [x] `cargo test -p afs-ld --test resolve_integration -- --nocapture`
+  - [x] `cargo test -p afs-ld --test atom_integration -- --nocapture`
+  - [x] `cargo test -p afs-ld --test dylib_integration -- --nocapture`
+  - [x] `cargo test -p afs-ld --test tbd_integration -- --nocapture`
.docs/sprints/sprint00.md modified
@@ -88,14 +88,16 @@ pub fn link_both(case: &LinkCase) -> LinkOutputs;
 pub fn diff_macho(ours: &[u8], theirs: &[u8]) -> DiffReport;
 ```
 
-`DiffReport` categorizes byte differences as `Tolerated` (UUID, timestamp, temp-path hashes) or `Critical` (anything else). Critical diffs fail the test. `link_both` shells out to `ld` via `xcrun -f ld` so it picks up the active toolchain.
+`DiffReport` categorizes byte differences as `Tolerated` (UUID, timestamp, temp-path hashes) or `Critical` (anything else). Critical diffs fail the test.
+
+Closeout note: the current Sprint 0 surface is intentionally synthetic. `diff_macho` exists and is tested, but `link_both` remains a placeholder until afs-ld can emit real linked output. That means the current harness validates diff categorization logic, not end-to-end linker parity yet.
 
 ### 6. Skeleton CLI and first failing test
 
 - `afs-ld/src/args.rs`: hand-rolled argv parser stub that recognizes `-o`, `-e`, `-arch`, and positional inputs. Unknown flags error loudly with a hint.
 - `afs-ld/tests/reader_empty.rs`: attempts to link `0 inputs → empty output`, expects the diagnostic `"afs-ld: error: no input files"`. Passes today by producing that exact string.
-- `afs-ld/tests/diff_harness_sanity.rs`: runs the harness against a known-identical pair (two copies of the same pre-linked binary produced by `xcrun ld`) and expects zero diffs. Passes.
-- `afs-ld/tests/diff_harness_finds_critical.rs`: feeds the harness two binaries that differ in a non-tolerated byte range (e.g. different text bytes) and asserts the harness reports `Critical`. Passes.
+- `afs-ld/tests/diff_harness_sanity.rs`: exercises the diff surface against two identical synthetic byte slices and expects zero diffs. Passes.
+- `afs-ld/tests/diff_harness_finds_critical.rs`: feeds the harness two synthetic binaries that differ in a non-tolerated byte range and asserts the harness reports `Critical`. Passes.
 
 ## Testing Strategy
 
@@ -115,5 +117,5 @@ pub fn diff_macho(ours: &[u8], theirs: &[u8]) -> DiffReport;
 - `armfortas/Cargo.toml` lists `afs-ld` in `[workspace] members`.
 - `afs-ld/CLAUDE.md`, `README.md`, `Cargo.toml`, `src/lib.rs`, `src/main.rs`, `src/args.rs` all committed in the new repo.
 - `.refs/ld64/` and `.refs/mold/` cloned.
-- Differential harness runs, correctly reports zero diffs on identical binaries, correctly reports critical diffs on intentionally-different binaries.
+- Differential harness substrate runs, correctly reports zero diffs on identical byte slices, correctly reports critical diffs on intentionally-different byte slices.
 - `cargo test --workspace` green.
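
The `Tolerated` / `Critical` split that the sprint00 closeout note keeps can be illustrated on synthetic byte slices. The sketch below is a minimal stand-in, not the crate's `diff_macho`: `diff_bytes`, `ByteDiff`, `DiffKind`, and the caller-supplied `tolerated` ranges are all assumptions made for illustration, since the real tolerated-diff categories are still deferred.

```rust
use std::ops::Range;

/// Hypothetical categories mirroring the `Tolerated` / `Critical` wording.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DiffKind {
    Tolerated,
    Critical,
}

/// One differing byte position and how it was categorized.
#[derive(Debug, PartialEq, Eq)]
struct ByteDiff {
    offset: usize,
    kind: DiffKind,
}

/// Compare two byte slices position by position.
/// Offsets inside any `tolerated` range are downgraded to `Tolerated`;
/// every other difference is `Critical`.
fn diff_bytes(ours: &[u8], theirs: &[u8], tolerated: &[Range<usize>]) -> Vec<ByteDiff> {
    let longest = ours.len().max(theirs.len());
    let mut out = Vec::new();
    for offset in 0..longest {
        // `get` returns `None` past the end, so a length mismatch is a diff.
        if ours.get(offset) != theirs.get(offset) {
            let kind = if tolerated.iter().any(|r| r.contains(&offset)) {
                DiffKind::Tolerated
            } else {
                DiffKind::Critical
            };
            out.push(ByteDiff { offset, kind });
        }
    }
    out
}

/// A comparison fails as soon as one critical difference exists.
fn has_critical(diffs: &[ByteDiff]) -> bool {
    diffs.iter().any(|d| d.kind == DiffKind::Critical)
}
```

Two identical slices produce an empty report, which is the property the sanity test describes; a differing byte outside every tolerated range makes `has_critical` return `true`.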
.docs/sprints/sprint01.md modified
@@ -6,6 +6,12 @@ Sprint 0 — crate, harness, references in place.
 ## Goals
 Read a Mach-O relocatable object file: parse the header and every load command afs-as emits. End state: given any `.o` in `afs-as/tests/corpus/`, afs-ld can pretty-print its structure and round-trip-compare it to a golden.
 
+Closeout note: alongside the original unit coverage, `tests/reader_malformed_stress.rs`
+now runs deterministic truncated/header-corruption cases over real corpus-built
+objects to defend the "no panics on malformed input" bar, and
+`tests/reader_tool_parity.rs` checks the `--dump` load-command surface against
+`otool -lV` across the afs-as corpus.
+
 ## Deliverables
 
 ### 1. Mach-O constants
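
The deterministic truncation idea in the sprint01 closeout note can be sketched against a tiny header parser. This is an illustration under assumptions, not the contents of `tests/reader_malformed_stress.rs`: `HeaderSketch`, `HeaderError`, `parse_header`, and `truncation_sweep` are hypothetical names, and the fields are kept as raw `u32` values for brevity. The 32-byte layout and the `0xfeedfacf` magic are those of the 64-bit Mach-O header.

```rust
const MH_MAGIC_64: u32 = 0xfeed_facf;
/// Eight little-endian u32 fields: magic, cputype, cpusubtype, filetype,
/// ncmds, sizeofcmds, flags, reserved.
const HEADER_LEN: usize = 32;

#[derive(Debug, PartialEq)]
enum HeaderError {
    Truncated { need: usize, have: usize },
    BadMagic(u32),
}

#[derive(Debug, PartialEq)]
struct HeaderSketch {
    cputype: u32,
    filetype: u32,
    ncmds: u32,
    sizeofcmds: u32,
}

/// Caller guarantees `at + 4 <= bytes.len()`.
fn read_u32(bytes: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Returns an error instead of panicking on short or mis-tagged input.
fn parse_header(bytes: &[u8]) -> Result<HeaderSketch, HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::Truncated { need: HEADER_LEN, have: bytes.len() });
    }
    let magic = read_u32(bytes, 0);
    if magic != MH_MAGIC_64 {
        return Err(HeaderError::BadMagic(magic));
    }
    Ok(HeaderSketch {
        cputype: read_u32(bytes, 4),
        filetype: read_u32(bytes, 12),
        ncmds: read_u32(bytes, 16),
        sizeofcmds: read_u32(bytes, 20),
    })
}

/// Feed every strict prefix of a valid buffer to the parser and count how
/// many were rejected. A panic anywhere would abort the sweep.
fn truncation_sweep(valid: &[u8]) -> usize {
    (0..valid.len())
        .filter(|&len| parse_header(&valid[..len]).is_err())
        .count()
}
```

Because the sweep walks prefix lengths in order, a failure always reproduces at the same length, which is what makes this style of stress case deterministic.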
.docs/sprints/sprint02.md modified
@@ -6,6 +6,13 @@ Sprint 1 — header + load commands parsed.
 ## Goals
 Decode section payloads, the symbol table (nlist_64), and the string table. Expose the full section/symbol/string model that later sprints build on.
 
+Closeout note: `tests/reader_malformed_stress.rs` now also covers malformed
+symbol/string-table variants derived from real corpus objects so the reader's
+symbol and string surfaces are exercised under targeted bad-input cases, not
+just hand-written unit fixtures. `tests/reader_tool_parity.rs` now also checks
+symbol classification against `nm -a` and raw relocation tables against
+`otool -r` across the afs-as corpus.
+
 ## Deliverables
 
 ### 1. Section attributes and kinds
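
A bounds-checked string-table lookup is one way to picture what the malformed string-table cases in the sprint02 closeout note guard against. The sketch below is hypothetical and is not the crate's `StringTable`: `StrTabSketch` and `StrError` are invented names. It also shows why suffix-sharing overlaps are harmless on the read side, since a lookup only needs a start offset and the next NUL byte.

```rust
/// Hypothetical read-only view over raw string-table bytes.
struct StrTabSketch<'a> {
    bytes: &'a [u8],
}

#[derive(Debug, PartialEq)]
enum StrError {
    /// The offset points past the end of the table.
    OutOfBounds(u32),
    /// No NUL terminator between the offset and the end of the table.
    Unterminated(u32),
    /// The bytes before the terminator are not valid UTF-8.
    NotUtf8(u32),
}

impl<'a> StrTabSketch<'a> {
    /// Look up the NUL-terminated name starting at `offset`.
    fn get(&self, offset: u32) -> Result<&'a str, StrError> {
        let start = offset as usize;
        // `get(start..)` yields `None` instead of panicking when start > len.
        let tail = self.bytes.get(start..).ok_or(StrError::OutOfBounds(offset))?;
        let len = tail
            .iter()
            .position(|&b| b == 0)
            .ok_or(StrError::Unterminated(offset))?;
        std::str::from_utf8(&tail[..len]).map_err(|_| StrError::NotUtf8(offset))
    }
}
```

With the bytes `\0_foo_bar\0`, offset 1 reads `_foo_bar` and offset 5 reads `_bar` from the same storage, so two symbols can share a suffix without any special handling in the lookup.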
.docs/sprints/sprint04.md modified
@@ -6,6 +6,13 @@ Sprints 1–3 — Mach-O reading complete.
 ## Goals
 Read static archives (`.a`) including the BSD, System V, and GNU-thin variants. Support lazy member fetching: a member is only parsed when an undefined symbol names it. This is the mechanism by which `libarmfortas_rt.a` gets pulled in.
 
+Closeout note: the force-load surface landed in the resolver as
+`resolve::force_load_archive` / `resolve::force_load_all`, and one-level
+nested archives are expanded through the fetched-member path with provenance
+chains such as `outer.a(inner.a)(foo.o)`. `--dump-archive` now intentionally
+prints the same member listing shape as `ar -t`, and parity is checked
+against both generated archives and `libarmfortas_rt.a` when available.
+
 ## Deliverables
 
 ### 1. Archive format recognizer
@@ -69,7 +76,10 @@ impl<'a> Archive<'a> {
 Returns `None` if the archive does not define `name`. Fetching an archive member memoizes: a second lookup for the same member returns a cached handle. The resolution pass (Sprint 8) is the only caller.
 
 ### 6. `-force_load` / `-all_load` support (semantics, not CLI yet)
-Archive has a `force_all(&mut self)` method that pre-fetches every member. Sprint 19 wires the CLI.
+Implemented via the resolver-level helpers
+`resolve::force_load_archive` / `resolve::force_load_all`, which pre-fetch
+archive members against the live linker input registry. Sprint 19 wires the
+CLI surface.
 
 ### 7. Archive-of-archives
 Rare but legal: member can be another `.a`. Recurse one level. If a sub-archive defines `name`, the outer `fetch` returns the sub-member's object file and records a provenance chain for diagnostics.
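
The provenance chain shape quoted in the sprint04 closeout note, `outer.a(inner.a)(foo.o)`, can be produced by a small display type. This is a sketch under assumptions: `Provenance` and its constructors are hypothetical and say nothing about how the crate actually stores the chain. The two constructors reflect the one-level nesting limit stated in the doc.

```rust
use std::fmt;

/// Hypothetical provenance record: the archive named on the command line,
/// followed by the members that led to the object.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Provenance {
    archive: String,
    members: Vec<String>,
}

impl Provenance {
    /// Object fetched directly from an archive, e.g. `outer.a(foo.o)`.
    fn direct(archive: &str, object: &str) -> Self {
        Self {
            archive: archive.to_string(),
            members: vec![object.to_string()],
        }
    }

    /// Object fetched through exactly one nested archive,
    /// e.g. `outer.a(inner.a)(foo.o)`.
    fn nested(archive: &str, inner: &str, object: &str) -> Self {
        Self {
            archive: archive.to_string(),
            members: vec![inner.to_string(), object.to_string()],
        }
    }
}

impl fmt::Display for Provenance {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.archive)?;
        // Each hop in the chain is rendered as a parenthesized member name.
        for member in &self.members {
            write!(f, "({})", member)?;
        }
        Ok(())
    }
}
```

Keeping the chain as data and formatting it only at the diagnostic boundary means the same record can be compared in tests without string parsing.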
.docs/sprints/sprint08.md modified
@@ -6,6 +6,11 @@ Sprint 7 — `SymbolTable` with insertion semantics.
 ## Goals
 Drive the symbol table to a fixed point: every undefined reference either resolves to a Defined (from an object), Common (promoted in BSS), DylibImport (from a dylib/TBD), or raises a clear, actionable diagnostic. `-force_load` / `-all_load` / `-undefined <treatment>` all handled.
 
+Closeout note: the implemented entrypoint is
+`resolve(inputs, table, opts) -> ResolutionReport`. The current library
+surface applies archive force-loading as archives are encountered in
+command-line order so left-to-right archive behavior stays explicit.
+
 ## Deliverables
 
 ### 1. Resolution algorithm
@@ -13,13 +18,10 @@ Drive the symbol table to a fixed point: every undefined reference either resolv
 
 ```rust
 pub fn resolve(inputs: &mut Inputs, table: &mut SymbolTable, opts: &LinkOptions)
-    -> Result<(), Vec<ResolveError>>
+    -> Result<ResolutionReport, ResolutionError>
 {
-    seed_table_with_objects_and_dylib_imports(inputs, table, opts);
-    if opts.all_load    { force_load_everything(inputs, table); }
-    for forced in &opts.force_load { force_load_one(inputs, table, forced); }
-    fixed_point_pull_from_archives(inputs, table);
-    classify_unresolved(table, opts);
+    seed_and_resolve_in_link_order(inputs, table, opts);
+    classify_unresolved(table, opts.undefined_treatment);
 }
 ```
 
@@ -43,7 +45,7 @@ Order matters: armfortas's driver currently passes `<objs> <runtime.a> -lSystem`
 ### 4. `-force_load` and `-all_load`
 - `-force_load <archive>`: pull every member of that archive before fixed-point.
 - `-all_load`: pull every member of every archive.
-- Both happen before the fixed-point loop so their transitively-pulled symbols feed into the same fixed point.
+- In the implemented surface these happen when the named archive is encountered in link order, which preserves left-to-right linker semantics while still feeding the same resolution/classification pipeline.
 
 ### 5. `-undefined <treatment>`
 After the fixed point, any still-Undefined entry is classified by the `-undefined` setting:
@@ -60,8 +62,8 @@ Undefined errors must cite every referrer input, not just one. Output format:
 
 ```
 afs-ld: error: undefined symbol: _afs_print
-      referenced by program.o(text section + 0x34)
-      referenced by runtime.o(text section + 0x120)
+      referenced by program.o(__TEXT,__text + 0x34)
+      referenced by runtime.o(__TEXT,__text + 0x120)
       (also via 2 relocations in libarmfortas_rt.a(io.o))
 Hint: did you mean _afs_print_real? (Levenshtein distance 5)
 ```
@@ -71,8 +73,8 @@ Did-you-mean uses a basic Levenshtein-3 search over defined symbols.
 ### 8. Diagnostics for duplicate strong
 ```
 afs-ld: error: duplicate symbol _foo
-  defined in: a.o (text + 0x0)
-  also in:    b.o (text + 0x0)
+  defined in: a.o (__TEXT,__text + 0x0)
+  also in:    b.o (__TEXT,__text + 0x0)
 ```
 
 No suggestion — two strong defs is a real ambiguity.
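
The classification step that runs after the fixed point can be sketched as a single exhaustive match. The four `UndefinedTreatment` variants are named in the closeout checklist, and the rule that unresolved weak references stay accepted regardless of treatment comes from the same list. Everything else here is an assumption for illustration: the `Outcome` enum, its variant names, and the `classify` signature are hypothetical and are not the crate's `classify_unresolved`.

```rust
/// Variant names taken from the closeout checklist.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UndefinedTreatment {
    Error,
    Warning,
    Suppress,
    DynamicLookup,
}

/// Hypothetical result of classifying one still-undefined symbol.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    /// Reported as a hard link error.
    HardError,
    /// Reported, but the link continues.
    Warned,
    /// Silently accepted.
    Ignored,
    /// Left for the dynamic loader to bind at runtime.
    BoundAtRuntime,
    /// Weak reference: accepted no matter which treatment is active.
    WeakAccepted,
}

/// Classify one unresolved reference.
/// No catch-all arm is used, so adding a treatment variant later forces
/// the compiler to flag this match.
fn classify(weak_ref: bool, treatment: UndefinedTreatment) -> Outcome {
    if weak_ref {
        return Outcome::WeakAccepted;
    }
    match treatment {
        UndefinedTreatment::Error => Outcome::HardError,
        UndefinedTreatment::Warning => Outcome::Warned,
        UndefinedTreatment::Suppress => Outcome::Ignored,
        UndefinedTreatment::DynamicLookup => Outcome::BoundAtRuntime,
    }
}
```

Checking the weak flag before the match keeps the "regardless of treatment" rule in one place instead of repeating it in every arm.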
AGENTS.md added
@@ -0,0 +1,349 @@
+# AFS-LD
+
+Local working guide for agents in `afs-ld`. Keep this file untracked.
+`CLAUDE.md` is the tracked, authoritative policy file; this document adds a
+reality-checked snapshot of the current implementation so we do not confuse the
+roadmap with shipped code.
+
+## Repository Context
+
+`afs-ld` is the standalone ARM64 Mach-O linker for the ARMFORTAS toolchain. It
+sits beside `afs-as` as a submodule in the `armfortas` workspace and is meant
+to replace Apple's `ld` for binaries produced by armfortas.
+
+The project boundary is intentionally clean:
+
+- `afs-as` emits `MH_OBJECT`.
+- `afs-ld` reads `.o`, `.a`, `.dylib`, and `.tbd`.
+- armfortas should eventually hand final linking to `afs-ld` rather than to
+  the system linker.
+
+The project is Mach-O only, macOS only, arm64 only, stdlib only.
+
+## Definition Of Done
+
+The real finish line is not "parses some objects" or "links hello world once."
+It is parity with Apple's `ld` for the binaries armfortas and fortsh need:
+
+- arm64 Mach-O executables and dylibs
+- static archive linking
+- dylib and TBD ingestion
+- dyld metadata that works on real macOS systems
+- ad-hoc signing so output executes on Apple Silicon
+- deterministic output
+- enough correctness to link fortsh without ARM-specific workarounds
+
+## Current Reality
+
+This repo is ahead of Sprint 0 scaffolding, but it is not yet a full linker.
+The roadmap in `.docs/overview.md` and `.docs/sprints/` is broader than the
+code that exists today.
+
+What is implemented now:
+
+- hand-rolled CLI parsing for a small flag subset plus dump modes
+- Mach-O header/load-command/section/symbol/string-table reading
+- relocation parsing, fusion, validation, and round-trip support
+- archive parsing and lazy member fetch support
+- binary dylib parsing and export-trie walking
+- TAPI TBD v4 parsing, including the custom YAML subset parser
+- linker-side symbol interning, symbol table modeling, and resolution passes
+- subsections-via-symbols atomization
+- `--dump`, `--dump-archive`, `--dump-dylib`, and `--dump-tbd`
+
+What is not implemented yet:
+
+- real `Linker::run` output production
+- output layout and Mach-O writing
+- dyld metadata synthesis
+- code signing
+- dead-strip / ICF / thunks
+- real differential linking against Apple `ld`
+- driver integration with armfortas
+- the full `ld`-compatible CLI surface described in Sprint 19
+
+Important practical note:
+
+- `src/lib.rs` still returns `LinkError::NotYetImplemented` for real link runs.
+- `tests/common/harness.rs::link_both` still panics because full end-to-end
+  linker execution has not landed.
+- `README.md` still describes the crate as "Sprint 0 scaffolding only," which is
+  now too pessimistic for the read-side code but still accurate for the actual
+  link-producing path.
+
+As of 2026-04-15 in this checkout, `cargo test -p afs-ld` is green.
+
+## Strengths
+
+- The read-side core is already substantial and well-tested.
+- The project has strong bespoke discipline: no `clap`, `serde`, `object`,
+  `goblin`, `byteorder`, or other format-parsing shortcuts.
+- Raw wire structures are modeled explicitly and usually paired with
+  round-trip-oriented tests.
+- The type modeling is strong: opaque ids, interned strings, explicit symbol
+  states, explicit atom ownership, explicit relocation referents.
+- Real-world fixtures are already in play: afs-as corpus objects,
+  `libarmfortas_rt.a`, `libSystem.tbd`, and small clang-built dylibs.
+- The codebase already separates concerns cleanly enough that writer/layout work
+  can land without tearing up the read-side foundation.
+- Dump modes make inspection easy and are useful while the full writer does not
+  exist yet.
+
+## Weaknesses And Risk Areas
+
+- The actual link-producing pipeline does not exist yet, so the hardest parity
+  bugs are still ahead of us.
+- Some tracked docs are aspirational. `.docs/overview.md` is the intended end
+  state, not a guarantee that every listed module already exists.
+- `README.md` is stale in the opposite direction: it understates how much
+  read-side work has landed.
+- The current diagnostics surface is still minimal. `src/diag.rs` only prints
+  `afs-ld: error: ...`; the richer caret diagnostics are planned, not present.
+- The CLI surface is intentionally tiny right now. Any work that assumes
+  `ld`-compatibility must start by checking `src/args.rs`, not by trusting the
+  sprint plan.
+- Performance characteristics are mostly unknown because the writer, layout, and
+  full-link path are not in place yet.
+- The differential harness is only half-built: the diff engine exists, but the
+  "run both linkers" machinery is not wired.
+- Several future modules named in the roadmap do not exist yet:
+  `layout.rs`, `driver.rs`, `map.rs`, `gc.rs`, `icf.rs`, `synth/`,
+  `macho/writer.rs`, and the code-signing path are all still planned work.
+
+## Build And Test
+
+Primary commands:
+
+```bash
+cargo build -p afs-ld
+cargo test -p afs-ld
+cargo clippy -p afs-ld --all-targets -- -D warnings
+```
+
+Useful targeted commands:
+
+```bash
+cargo test --lib -p afs-ld
+cargo test --test reader_corpus_round_trip -p afs-ld
+cargo test --test archive_runtime -p afs-ld
+cargo test --test dylib_integration -p afs-ld
+cargo test --test tbd_integration -p afs-ld
+cargo test --test resolve_integration -p afs-ld
+cargo test --test atom_integration -p afs-ld
+cargo test -p afs-ld -- <substring>
+```
+
+Environment assumptions:
+
+- macOS on Apple Silicon
+- Xcode command-line tools available through `xcrun`
+- access to the parent workspace, especially `runtime/` and `.refs/`
+
+Integration tests already shell out to system tools in a few places. Do not
+replace those with fake fixtures if a real toolchain interaction is the thing
+being tested.
+
+## Project Structure
+
+Actual source tree today:
+
+```text
+afs-ld/
+├── CLAUDE.md
+├── README.md
+├── .docs/
+│   ├── overview.md
+│   └── sprints/
+├── src/
+│   ├── archive.rs
+│   ├── args.rs
+│   ├── atom.rs
+│   ├── diag.rs
+│   ├── dump.rs
+│   ├── input.rs
+│   ├── leb.rs
+│   ├── lib.rs
+│   ├── main.rs
+│   ├── resolve.rs
+│   ├── section.rs
+│   ├── string_table.rs
+│   ├── symbol.rs
+│   ├── macho/
+│   │   ├── constants.rs
+│   │   ├── dylib.rs
+│   │   ├── exports.rs
+│   │   ├── reader.rs
+│   │   ├── tbd.rs
+│   │   └── tbd_yaml.rs
+│   └── reloc/
+│       └── mod.rs
+└── tests/
+    ├── common/harness.rs
+    ├── archive_runtime.rs
+    ├── atom_integration.rs
+    ├── diff_harness_*.rs
+    ├── dylib_integration.rs
+    ├── reader_*.rs
+    ├── resolve_integration.rs
+    ├── tbd_*.rs
+    └── reader_corpus_round_trip.rs
+```
+
+Planned future modules listed in the docs should be treated as design intent,
+not as present-tense implementation.
+
+## Implemented Pipeline Vs Planned Pipeline
+
+Implemented today:
+
+```text
+argv
+  -> args.rs
+  -> dump/read paths
+  -> archive/object/dylib/TBD ingestion
+  -> symbol/section/reloc decoding
+  -> resolve.rs
+  -> atom.rs
+```
+
+Current real-link path:
+
+```text
+argv -> args.rs -> Linker::run -> NotYetImplemented
+```
+
+Planned end-to-end pipeline from the roadmap:
+
+```text
+args -> inputs -> resolve -> atomize -> layout -> apply relocs
+     -> synth sections -> write -> sign
+```
+
+When you are planning work, always identify which of those stages is real in
+this checkout and which stage is still only described in docs.
+
+## Development Guidance
+
+### 1. Trust code and tests over roadmap prose
+
+Read these in order before substantial work:
+
+1. `CLAUDE.md`
+2. `.docs/overview.md`
+3. the relevant sprint file in `.docs/sprints/`
+4. the actual Rust module you will touch
+5. the tests covering that module
+
+If the docs and the code disagree, treat the code plus tests as the truth about
+what exists today, then decide whether the docs need to be refreshed.
+
+### 2. Keep the bespoke contract intact
+
+- Stdlib only unless a dependency discussion happens first.
+- Do not couple afs-ld to afs-as at a Rust type level.
+- Duplicate Mach-O constants locally when needed.
+- Do not hide format details behind clever abstractions that erase wire truth.
+
+### 3. Preserve the wire
+
+- Keep raw bytes or raw fields accessible when lossless re-emission matters.
+- Prefer explicit parse and write pairs for on-disk structures.
+- Avoid converting fixed-size or padded wire data into lossy higher-level forms
+  unless the raw representation is still available somewhere.
+- If a new decoder lands, pair it with tests that prove it round-trips or at
+  least preserves the exact bytes relevant to the current stage.
+
+### 4. Be explicit about incomplete work
+
+- Hard errors are better than silent wrong answers.
+- If something is not implemented, say so directly.
+- Do not introduce "temporary" behavior that quietly emits malformed Mach-O.
+- Do not soften a missing feature into a no-op unless the flag or structure is
+  explicitly intended to be ignored.
+
+### 5. Exhaustive matches matter
+
+- Prefer enums for wire forms and linker-side states.
+- Avoid catch-all `_` arms in production matches when a new variant should force
+  the compiler to help us.
+- When adding a new variant, update every relevant match deliberately.
+
+### 6. Keep dump surfaces useful
+
+- `--dump*` modes are an active debugging tool, not a side feature.
+- When new reader functionality lands, extend the corresponding dump output.
+- If you add a new parsed field but the dump cannot show it, the repo loses one
+  of its best inspection surfaces.
+
+### 7. Respect deterministic behavior
+
+- Avoid nondeterministic iteration when output order matters.
+- Avoid timestamps, random ids, or unstable hashing in any future write path.
+- When adding diagnostics, keep them stable and testable.
+
+## Testing Practices
+
+- Every bug fix gets a regression test.
+- New parser behavior should land with unit tests close to the module.
+- When touching integration behavior, prefer real fixtures over mocked ones.
+- For archive work, look first at `tests/archive_runtime.rs`.
+- For dylib and TBD work, look first at `tests/dylib_integration.rs`,
+  `tests/tbd_integration.rs`, and `tests/tbd_smoke.rs`.
+- For reader invariants, `tests/reader_corpus_round_trip.rs` is a key guardrail.
+- For resolution and atomization, `tests/resolve_integration.rs` and
+  `tests/atom_integration.rs` should move with the code.
+- If you add future write-side functionality, extend the differential harness
+  rather than building a parallel ad hoc test path.
+
+Run focused tests first, then widen:
+
+- module-local or single integration test while developing
+- `cargo test -p afs-ld` before handing work off
+- `cargo clippy -p afs-ld --all-targets -- -D warnings` when changing code paths
+  broadly enough to justify it
+
+## Documentation Practices
+
+- `CLAUDE.md` is policy and development discipline.
+- `.docs/overview.md` is the intended architecture and scope.
+- `.docs/sprints/` is the staged roadmap.
+- `README.md` is user-facing and currently stale relative to the read-side code.
+
+When a change materially shifts reality, update the tracked docs that are now
+misleading. This is especially important in this repo because the roadmap is
+ambitious and can otherwise create false assumptions for future work.
+
+## References
+
+Use the parent repository's references when you need to confirm Mach-O or linker
+behavior instead of inventing from memory:
+
+- `.refs/llvm/lld/MachO/` for architecture and pass structure
+- `.refs/ld64/` for Apple-parity edge cases
+- `.refs/mold/` for performance ideas and comparative implementation choices
+
+Also use Apple's Mach-O and arm64 relocation headers as the numeric source of
+truth for constants mirrored in `src/macho/constants.rs`.
+
+## Working Style For This Repo
+
+- Prefer small, reviewable changes.
+- Keep commit messages terse and imperative.
+- Do not mention sprint numbers in commit subjects.
+- Avoid monolithic "land the whole linker" changes; the sprint plan is granular
+  for a reason.
+- Before implementing a planned module from the roadmap, make sure the crate
+  actually has the prerequisites the sprint assumed.
+- If you are about to say "the docs say this exists," stop and confirm with
+  `ls`, `rg`, and the tests.
+
+## Practical Shortcuts
+
+- Use `rg --files` and `rg` first; the repo is small enough that this is fast
+  and keeps context grounded in the actual tree.
+- For current status, start with `src/lib.rs`, `src/main.rs`, `src/args.rs`,
+  `tests/common/harness.rs`, and `README.md`.
+- For architectural intent, then read `.docs/overview.md` and the relevant
+  sprint file.
+
+That order will save a lot of confusion.
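
The "Preserve the wire" guidance in `AGENTS.md` (explicit parse and write pairs, backed by round-trip tests) can be illustrated with the 8-byte Mach-O `relocation_info` record. The sketch is hypothetical and is not the crate's raw relocation type: `RawRelocSketch` and its field names are assumptions. The bit layout follows the little-endian packing of that record: a 24-bit symbol number, then 1 bit pcrel, 2 bits length, 1 bit extern, and 4 bits type.

```rust
/// Hypothetical mirror of the 8-byte `relocation_info` record.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RawRelocSketch {
    address: i32,
    symbolnum: u32, // low 24 bits of the packed word
    pcrel: bool,    // bit 24
    length: u8,     // bits 25..=26, log2 of the fixup width
    is_extern: bool, // bit 27
    kind: u8,       // bits 28..=31, e.g. 2 for ARM64_RELOC_BRANCH26
}

impl RawRelocSketch {
    /// Decode one record. Every one of the 32 packed bits lands in a field,
    /// so nothing is lost on the way in.
    fn parse(bytes: [u8; 8]) -> Self {
        let address = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let packed = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        Self {
            address,
            symbolnum: packed & 0x00ff_ffff,
            pcrel: ((packed >> 24) & 1) == 1,
            length: ((packed >> 25) & 0b11) as u8,
            is_extern: ((packed >> 27) & 1) == 1,
            kind: (packed >> 28) as u8,
        }
    }

    /// Re-encode the record; masks keep out-of-range field values from
    /// spilling into neighbouring bit ranges.
    fn write(&self) -> [u8; 8] {
        let packed = (self.symbolnum & 0x00ff_ffff)
            | ((self.pcrel as u32) << 24)
            | (((self.length as u32) & 0b11) << 25)
            | ((self.is_extern as u32) << 27)
            | (((self.kind as u32) & 0xf) << 28);
        let mut out = [0u8; 8];
        out[..4].copy_from_slice(&self.address.to_le_bytes());
        out[4..].copy_from_slice(&packed.to_le_bytes());
        out
    }
}
```

Because `parse` keeps every bit, `write(parse(bytes))` returns the original bytes for any input, which is the kind of property a round-trip test can assert directly.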