Track planning docs
- SHA: c1785ffed8b1e9d5017483aee68344ac38ef06ec
- Parents: 233258a
- Tree: 664d871

| Status | File | + | - |
|---|---|---|---|
| A | .docs/sprints/closeout0-9.md | 247 | 0 |
| M | .docs/sprints/sprint00.md | 6 | 4 |
| M | .docs/sprints/sprint01.md | 6 | 0 |
| M | .docs/sprints/sprint02.md | 7 | 0 |
| M | .docs/sprints/sprint04.md | 11 | 1 |
| M | .docs/sprints/sprint08.md | 13 | 11 |
| A | AGENTS.md | 349 | 0 |

.docs/sprints/closeout0-9.md added @@ -0,0 +1,247 @@ | ||
| 1 | +# Sprint 0-9 Closeout Checklist | |
| 2 | + | |
| 3 | +Concrete closeout checklist based on the current codebase audit. | |
| 4 | + | |
| 5 | +Current conclusion: we are not ready to honestly declare Sprint 10 complete-in-practice yet. | |
| 6 | +The main blocker is: | |
| 7 | + | |
| 8 | +- Sprint 0's tolerated-diff categories are still deferred until afs-ld can emit real linked output for Mach-O-to-Mach-O differential checks. | |
| 9 | + | |
| 10 | +## Sprint 10 Gate | |
| 11 | + | |
| 12 | +Do not declare "we are on Sprint 10" until all of these are true: | |
| 13 | + | |
| 14 | +- [x] Sprint 9 reloc referents are remapped to atom-aware forms. | |
| 15 | +- [x] Sprint 8 resolution orchestration exists as a real callable stage, not just loose helper APIs. | |
| 16 | +- [x] `cargo test -p afs-ld` is green after the closeout work. | |
| 17 | +- [x] `cargo clippy -p afs-ld --all-targets -- -D warnings` is green after the closeout work. | |
| 18 | +- [x] `README.md` and sprint docs no longer materially misstate the current state of the crate. | |
| 19 | + | |
| 20 | +## Recommended Order | |
| 21 | + | |
| 22 | +- [x] Close Sprint 9 reloc-to-atom remap first. | |
| 23 | +- [x] Close Sprint 8 resolution orchestration and option coverage second. | |
| 24 | +- [x] Close Sprint 6 TBD/SDK search gaps third. | |
| 25 | +- [x] Close Sprint 4 nested archive support fourth. | |
| 26 | +- [ ] Finish the deferred Sprint 0 differential-harness tolerance work once afs-ld can emit real output. | |
| 27 | + | |
| 28 | +## Cross-Sprint Exit Criteria | |
| 29 | + | |
| 30 | +- [ ] Every closeout chunk lands with tests. | |
| 31 | +- [ ] Every bug fix or behavioral gap gets a regression test. | |
| 32 | +- [ ] No newly-discovered roadmap/code mismatch is left undocumented. | |
| 33 | +- [ ] Any user-facing diagnostic we touch stays deterministic and testable. | |
| 34 | + | |
| 35 | +## Sprint 0 | |
| 36 | + | |
| 37 | +Status: closed | |
| 38 | + | |
| 39 | +Validated: | |
| 40 | + | |
| 41 | +- [x] `afs-ld` exists as its own git submodule in the parent workspace. | |
| 42 | +- [x] Parent `Cargo.toml` includes `afs-ld` as a workspace member. | |
| 43 | +- [x] `CLAUDE.md`, `README.md`, crate wiring, and test harness scaffolding exist. | |
| 44 | +- [x] Reference repos are present under parent `.refs/` (`ld64`, `mold`, `lld`). | |
| 45 | +- [x] `tests/reader_empty.rs` enforces the empty-invocation CLI contract. | |
| 46 | +- [x] `tests/diff_harness_sanity.rs` and `tests/diff_harness_finds_critical.rs` exist and pass. | |
| 47 | +- [x] `cargo clippy -p afs-ld --all-targets -- -D warnings` is currently clean. | |
| 48 | + | |
| 49 | +Remaining closeout work: | |
| 50 | + | |
| 51 | +- [x] Explicitly downscope Sprint 0 docs so the current diff harness is described as synthetic until end-to-end linking exists. | |
| 52 | +- [ ] Add tolerated-diff categories once real Mach-O-to-Mach-O comparisons exist. | |
| 53 | + | |
| 54 | +## Sprint 1 | |
| 55 | + | |
| 56 | +Status: closed | |
| 57 | + | |
| 58 | +Validated: | |
| 59 | + | |
| 60 | +- [x] Mach-O constants are duplicated locally in `src/macho/constants.rs`. | |
| 61 | +- [x] `MachHeader64` parsing exists and rejects malformed headers. | |
| 62 | +- [x] Load-command dispatch exists and preserves unknown commands as raw bytes. | |
| 63 | +- [x] Segment and section-header metadata parsing exists. | |
| 64 | +- [x] `LC_BUILD_VERSION` and `LC_LINKER_OPTIMIZATION_HINT` decoding exists. | |
| 65 | +- [x] `--dump` exists through `src/dump.rs` and `src/main.rs`. | |
| 66 | +- [x] Corpus round-trip tests pass in `tests/reader_corpus_round_trip.rs`. | |
| 67 | + | |
| 68 | +Remaining closeout work: | |
| 69 | + | |
| 70 | +- [x] Add an `otool -lV` parity test for dumper output shape across the corpus. | |
| 71 | +- [x] Add a panic-focused malformed-input stress pass beyond the current unit tests so the "no panics on malformed input" claim is defensible. | |
| 72 | + | |
| 73 | +## Sprint 2 | |
| 74 | + | |
| 75 | +Status: closed | |
| 76 | + | |
| 77 | +Validated: | |
| 78 | + | |
| 79 | +- [x] Section classification exists in `src/section.rs`. | |
| 80 | +- [x] `InputSection` carries section data and raw relocation bytes. | |
| 81 | +- [x] `RawNlist` / `InputSymbol` parsing and classification exist in `src/symbol.rs`. | |
| 82 | +- [x] Common symbols, weak flags, private externs, and indirect aliases are surfaced. | |
| 83 | +- [x] `StringTable` exists and handles suffix-dedup overlaps. | |
| 84 | +- [x] `DysymtabCmd` is parsed and exposed through `ObjectFile`. | |
| 85 | +- [x] `ObjectFile` integrates header, commands, sections, symbols, strings, and dysymtab. | |
| 86 | + | |
| 87 | +Remaining closeout work: | |
| 88 | + | |
| 89 | +- [x] Add `nm -a` parity tests for symbol view and classification. | |
| 90 | +- [x] Add `otool -r` parity checks for relocation-offset surfaces promised by Sprint 2, with section/load-command parity covered by the Sprint 1 `otool -lV` gate. | |
| 91 | +- [x] Add stronger malformed-symbol / malformed-string-table stress coverage if we want the "never panics" bar to be explicit. | |
| 92 | + | |
| 93 | +## Sprint 3 | |
| 94 | + | |
| 95 | +Status: closed enough for current closeout | |
| 96 | + | |
| 97 | +Validated: | |
| 98 | + | |
| 99 | +- [x] ARM64 relocation constants exist. | |
| 100 | +- [x] Raw relocation parsing and writing exist. | |
| 101 | +- [x] Fused `Reloc` form exists. | |
| 102 | +- [x] `ADDEND` and `SUBTRACTOR + UNSIGNED` pairing is fused in `parse_relocs`. | |
| 103 | +- [x] Validation logic exists in `validate_relocs`. | |
| 104 | +- [x] Write-side round-trip support exists. | |
| 105 | +- [x] Unit coverage is broad and current corpus relocation round-trips pass. | |
| 106 | + | |
| 107 | +Remaining closeout work: | |
| 108 | + | |
| 109 | +- [x] No audit-blocking work found for Sprint 3. | |
| 110 | + | |
| 111 | +## Sprint 4 | |
| 112 | + | |
| 113 | +Status: closed | |
| 114 | + | |
| 115 | +Validated: | |
| 116 | + | |
| 117 | +- [x] BSD, SysV, and GNU-thin archive flavors are recognized. | |
| 118 | +- [x] Archive headers and name decoding are implemented. | |
| 119 | +- [x] Symbol-index parsing exists for BSD and SysV archives. | |
| 120 | +- [x] Lazy member fetch exists via `fetch_object_defining`. | |
| 121 | +- [x] `libarmfortas_rt.a` is exercised by `tests/archive_runtime.rs`. | |
| 122 | +- [x] Archive dump mode exists via `--dump-archive`. | |
| 123 | + | |
| 124 | +Remaining closeout work: | |
| 125 | + | |
| 126 | +- [x] Implement one-level nested archive support (`.a` member inside `.a`) and preserve provenance for diagnostics. | |
| 127 | +- [x] Formally treat `resolve::force_load_archive` / `force_load_all` as the Sprint 4 completion surface and document that surface instead of adding a parallel archive-only helper. | |
| 128 | +- [x] Add `ar -t` shape/parity coverage for `--dump-archive`. | |
| 129 | + | |
| 130 | +## Sprint 5 | |
| 131 | + | |
| 132 | +Status: partially closed | |
| 133 | + | |
| 134 | +Validated: | |
| 135 | + | |
| 136 | +- [x] `DylibFile` exists and parses binary `MH_DYLIB`. | |
| 137 | +- [x] `LC_ID_DYLIB`, dependency dylib commands, ordinals, and rpaths are decoded. | |
| 138 | +- [x] Export trie decoding exists with cycle/depth protection. | |
| 139 | +- [x] Real clang-built dylib coverage exists in `tests/dylib_integration.rs`. | |
| 140 | +- [x] Dylib dump mode exists via `--dump-dylib`. | |
| 141 | + | |
| 142 | +Remaining closeout work: | |
| 143 | + | |
| 144 | +- [ ] Prove recursive re-export / umbrella lookup behavior with a focused test, not just dependency collection. | |
| 145 | +- [ ] Confirm the public dylib surface matches what Sprint 5 intended for re-exported symbols, not only direct exports. | |
| 146 | + | |
| 147 | +## Sprint 6 | |
| 148 | + | |
| 149 | +Status: closed | |
| 150 | + | |
| 151 | +Validated: | |
| 152 | + | |
| 153 | +- [x] The custom YAML subset parser exists in `src/macho/tbd_yaml.rs`. | |
| 154 | +- [x] TBD schema decoding exists in `src/macho/tbd.rs`. | |
| 155 | +- [x] `DylibFile::from_tbd` exists and materializes TBDs into the same linker-facing surface. | |
| 156 | +- [x] Real `libSystem.tbd` smoke/integration coverage exists in `tests/tbd_smoke.rs` and `tests/tbd_integration.rs`. | |
| 157 | +- [x] TBD dump mode exists via `--dump-tbd`. | |
| 158 | + | |
| 159 | +Remaining closeout work: | |
| 160 | + | |
| 161 | +- [x] Implement SDK `-syslibroot` library search helpers for `.tbd` / `.dylib`. | |
| 162 | +- [x] Implement framework search helpers promised by Sprint 6. | |
| 163 | +- [x] Make target filtering fail loudly when the requested target is not exported, instead of only materializing matching targets when the caller already knows one exists. | |
| 164 | +- [x] No further audit-blocking work found for Sprint 6 in the current helper/test surface. | |
| 165 | + | |
| 166 | +## Sprint 7 | |
| 167 | + | |
| 168 | +Status: closed | |
| 169 | + | |
| 170 | +Validated: | |
| 171 | + | |
| 172 | +- [x] `Symbol` sum type exists with the planned major variants. | |
| 173 | +- [x] `StringInterner`, opaque ids, and `SymbolTable` exist. | |
| 174 | +- [x] The insertion matrix is heavily unit-tested. | |
| 175 | +- [x] Weak/strong/common coalescing behavior is covered in unit tests. | |
| 176 | +- [x] Alias-cycle detection and chain resolution exist. | |
| 177 | +- [x] Transition logging exists. | |
| 178 | + | |
| 179 | +Remaining closeout work: | |
| 180 | + | |
| 181 | +- [ ] Add the differential weak-coalescing / duplicate-behavior coverage against system `ld` that Sprint 7 originally called for. | |
| 182 | + | |
| 183 | +## Sprint 8 | |
| 184 | + | |
| 185 | +Status: closed | |
| 186 | + | |
| 187 | +Validated: | |
| 188 | + | |
| 189 | +- [x] Archive seeding, object seeding, and dylib seeding exist. | |
| 190 | +- [x] Fixed-point archive fetch draining exists. | |
| 191 | +- [x] `force_load_archive` and `force_load_all` helpers exist in `src/resolve.rs`. | |
| 192 | +- [x] Undefined classification exists for `Error`, `Warning`, `Suppress`, and `DynamicLookup`. | |
| 193 | +- [x] Did-you-mean support exists. | |
| 194 | +- [x] Duplicate-symbol and undefined-symbol formatting helpers exist. | |
| 195 | +- [x] Real integration coverage exists for archive pull plus unresolved-symbol reporting. | |
| 196 | + | |
| 197 | +Remaining closeout work: | |
| 198 | + | |
| 199 | +- [x] Add a real orchestration entrypoint for resolution (`seed -> optional force load -> drain -> classify`) that can be called as a coherent stage. | |
| 200 | +- [x] Add option/state plumbing for `all_load`, `force_load`, and undefined treatment so resolution is not just a bag of helper APIs. | |
| 201 | +- [x] Add an archive order-sensitivity test. | |
| 202 | +- [x] Add dedicated tests for `force_load_archive` and `force_load_all`. | |
| 203 | +- [x] Add dedicated tests for `UndefinedTreatment::Warning`, `Suppress`, and `DynamicLookup`. | |
| 204 | +- [x] Add a dedicated test that unresolved weak refs stay accepted regardless of treatment. | |
| 205 | +- [x] Tighten diagnostics toward the Sprint 8 format by carrying section/offset provenance and aggregating repeated relocation sites when available. | |
| 206 | + | |
| 207 | +## Sprint 9 | |
| 208 | + | |
| 209 | +Status: closed | |
| 210 | + | |
| 211 | +Validated: | |
| 212 | + | |
| 213 | +- [x] Atom model and atom table exist. | |
| 214 | +- [x] Section splitting at symbol boundaries exists. | |
| 215 | +- [x] `.alt_entry` folding exists. | |
| 216 | +- [x] CString atom splitting exists and is integration-tested. | |
| 217 | +- [x] Compact-unwind atom splitting and `parent_of` wiring exist. | |
| 218 | +- [x] Backpatching of `Symbol::Defined { atom }` exists. | |
| 219 | +- [x] `N_NO_DEAD_STRIP` and weak-def flags are propagated into atom flags. | |
| 220 | +- [x] Embedded payload addends on symbol-based data relocs are folded into local atom offsets or preserved on external refs. | |
| 221 | + | |
| 222 | +Remaining closeout work: | |
| 223 | + | |
| 224 | +- [x] Remap relocations from raw section/symbol referents into atom-aware referents. | |
| 225 | +- [x] Add atom-local relocation storage or an equivalent per-atom relocation view. | |
| 226 | +- [x] Ensure same-object references point at target atoms, not raw section offsets. | |
| 227 | +- [x] Add a focused integration test proving a local branch or data reference resolves to the callee/target atom. | |
| 228 | +- [x] Add a boundary-crossing reloc diagnostic test. | |
| 229 | +- [x] Confirm no raw section-relative relocation state leaks into Sprint 10 inputs. | |
| 230 | +- [x] No further audit-blocking work found for Sprint 9 in the current corpus and targeted local-addend probes. | |
| 231 | + | |
| 232 | +## Documentation Closeout | |
| 233 | + | |
| 234 | +- [x] Update `README.md` so it no longer says the crate is only Sprint 0 scaffolding. | |
| 235 | +- [x] Refresh sprint docs whose deliverables have been implemented under a different surface than originally planned. | |
| 236 | +- [x] Keep `CLAUDE.md` as the authority for discipline, but make user-facing docs match the actual code. | |
| 237 | + | |
| 238 | +## Verification Commands | |
| 239 | + | |
| 240 | +- [x] `cargo test -p afs-ld` | |
| 241 | +- [x] `cargo clippy -p afs-ld --all-targets -- -D warnings` | |
| 242 | +- [ ] Focused xcrun-backed checks when touching reader/resolve/atom/TBD/dylib paths: | |
| 243 | + - [x] `cargo test -p afs-ld --test reader_corpus_round_trip -- --nocapture` | |
| 244 | + - [x] `cargo test -p afs-ld --test resolve_integration -- --nocapture` | |
| 245 | + - [x] `cargo test -p afs-ld --test atom_integration -- --nocapture` | |
| 246 | + - [x] `cargo test -p afs-ld --test dylib_integration -- --nocapture` | |
| 247 | + - [x] `cargo test -p afs-ld --test tbd_integration -- --nocapture` | |
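The Sprint 9 checklist above lists "section splitting at symbol boundaries" as validated. As a rough illustration of that idea only, the sketch below splits a section into atoms at sorted symbol offsets. The `Atom` struct and `split_at_symbols` function are hypothetical names invented for this example; the real model in `src/atom.rs` is not shown in this diff and likely carries more state (flags, `.alt_entry` folding, relocations).

```rust
/// One contiguous slice of a section. Hypothetical shape for illustration;
/// the atom model in `src/atom.rs` is not shown in this diff and may differ.
#[derive(Debug, PartialEq, Eq)]
pub struct Atom {
    /// Name of the symbol that starts the atom; empty for an anonymous lead-in.
    pub name: String,
    pub start: u64,
    pub len: u64,
}

/// Split a section of `section_len` bytes at each symbol offset.
///
/// `symbols` holds `(name, offset)` pairs in any order. Offsets at or past the
/// end of the section are ignored. When two symbols share an offset, only the
/// first one in input order is kept (an assumption made for this sketch).
pub fn split_at_symbols(section_len: u64, symbols: &[(&str, u64)]) -> Vec<Atom> {
    let mut sorted: Vec<(&str, u64)> = symbols
        .iter()
        .copied()
        .filter(|&(_, off)| off < section_len)
        .collect();
    // Stable sort, so input order is preserved among equal offsets.
    sorted.sort_by_key(|&(_, off)| off);
    sorted.dedup_by_key(|entry| entry.1);

    let mut atoms = Vec::with_capacity(sorted.len() + 1);

    // Bytes before the first symbol (or the whole section when there are no
    // symbols) become one anonymous atom.
    let first = sorted.first().map_or(section_len, |&(_, off)| off);
    if first > 0 {
        atoms.push(Atom { name: String::new(), start: 0, len: first });
    }

    // Each symbol owns the bytes up to the next symbol or the section end.
    for (i, &(name, off)) in sorted.iter().enumerate() {
        let end = sorted.get(i + 1).map_or(section_len, |&(_, next)| next);
        atoms.push(Atom { name: name.to_string(), start: off, len: end - off });
    }
    atoms
}
```

Every byte of the section ends up in exactly one atom, which is the property later relocation remapping relies on when it turns section offsets into atom-relative ones.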
.docs/sprints/sprint00.md modified @@ -88,14 +88,16 @@ pub fn link_both(case: &LinkCase) -> LinkOutputs; | ||
| 88 | 88 | pub fn diff_macho(ours: &[u8], theirs: &[u8]) -> DiffReport; |
| 89 | 89 | ``` |
| 90 | 90 | |
| 91 | -`DiffReport` categorizes byte differences as `Tolerated` (UUID, timestamp, temp-path hashes) or `Critical` (anything else). Critical diffs fail the test. `link_both` shells out to `ld` via `xcrun -f ld` so it picks up the active toolchain. | |
| 91 | +`DiffReport` categorizes byte differences as `Tolerated` (UUID, timestamp, temp-path hashes) or `Critical` (anything else). Critical diffs fail the test. | |
| 92 | + | |
| 93 | +Closeout note: the current Sprint 0 surface is intentionally synthetic. `diff_macho` exists and is tested, but `link_both` remains a placeholder until afs-ld can emit real linked output. That means the current harness validates diff categorization logic, not end-to-end linker parity yet. | |
| 92 | 94 | |
| 93 | 95 | ### 6. Skeleton CLI and first failing test |
| 94 | 96 | |
| 95 | 97 | - `afs-ld/src/args.rs`: hand-rolled argv parser stub that recognizes `-o`, `-e`, `-arch`, and positional inputs. Unknown flags error loudly with a hint. |
| 96 | 98 | - `afs-ld/tests/reader_empty.rs`: attempts to link `0 inputs → empty output`, expects the diagnostic `"afs-ld: error: no input files"`. Passes today by producing that exact string. |
| 97 | -- `afs-ld/tests/diff_harness_sanity.rs`: runs the harness against a known-identical pair (two copies of the same pre-linked binary produced by `xcrun ld`) and expects zero diffs. Passes. | |
| 98 | -- `afs-ld/tests/diff_harness_finds_critical.rs`: feeds the harness two binaries that differ in a non-tolerated byte range (e.g. different text bytes) and asserts the harness reports `Critical`. Passes. | |
| 99 | +- `afs-ld/tests/diff_harness_sanity.rs`: exercises the diff surface against two identical synthetic byte slices and expects zero diffs. Passes. | |
| 100 | +- `afs-ld/tests/diff_harness_finds_critical.rs`: feeds the harness two synthetic binaries that differ in a non-tolerated byte range and asserts the harness reports `Critical`. Passes. | |
| 99 | 101 | |
| 100 | 102 | ## Testing Strategy |
| 101 | 103 | |
@@ -115,5 +117,5 @@ pub fn diff_macho(ours: &[u8], theirs: &[u8]) -> DiffReport; | ||
| 115 | 117 | - `armfortas/Cargo.toml` lists `afs-ld` in `[workspace] members`. |
| 116 | 118 | - `afs-ld/CLAUDE.md`, `README.md`, `Cargo.toml`, `src/lib.rs`, `src/main.rs`, `src/args.rs` all committed in the new repo. |
| 117 | 119 | - `.refs/ld64/` and `.refs/mold/` cloned. |
| 118 | -- Differential harness runs, correctly reports zero diffs on identical binaries, correctly reports critical diffs on intentionally-different binaries. | |
| 120 | +- Differential harness substrate runs, correctly reports zero diffs on identical byte slices, and correctly reports critical diffs on intentionally different byte slices. | |
| 119 | 121 | - `cargo test --workspace` green. |
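The sprint00.md hunks above describe a harness that sorts byte differences into `Tolerated` and `Critical`. A minimal sketch of that categorization over raw byte slices could look like the following. It is not the code in `tests/common/harness.rs`: the `diff_bytes` name, the explicit `tolerated` ranges parameter, and the fields of `DiffReport` are assumptions made for illustration (the documented `diff_macho` takes only the two slices and would presumably derive tolerated ranges from the Mach-O structure itself).

```rust
use std::ops::Range;

/// Byte offsets that differ, split by severity. Field layout is assumed for
/// this sketch; the real `DiffReport` is not shown in the diff.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct DiffReport {
    pub tolerated: Vec<usize>,
    pub critical: Vec<usize>,
}

/// Compare two byte slices. A differing offset is `tolerated` when it falls
/// inside one of the caller-supplied ranges (for example a UUID payload) and
/// `critical` otherwise.
pub fn diff_bytes(ours: &[u8], theirs: &[u8], tolerated: &[Range<usize>]) -> DiffReport {
    let mut report = DiffReport::default();
    let common = ours.len().min(theirs.len());

    for i in 0..common {
        if ours[i] != theirs[i] {
            if tolerated.iter().any(|r| r.contains(&i)) {
                report.tolerated.push(i);
            } else {
                report.critical.push(i);
            }
        }
    }

    // Sketch policy: every offset past the shorter input counts as critical,
    // even if it lies inside a tolerated range.
    let longest = ours.len().max(theirs.len());
    report.critical.extend(common..longest);
    report
}
```

A test in the style of `diff_harness_finds_critical.rs` would then fail whenever `report.critical` is non-empty.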
.docs/sprints/sprint01.md modified @@ -6,6 +6,12 @@ Sprint 0 — crate, harness, references in place. | ||
| 6 | 6 | ## Goals |
| 7 | 7 | Read a Mach-O relocatable object file: parse the header and every load command afs-as emits. End state: given any `.o` in `afs-as/tests/corpus/`, afs-ld can pretty-print its structure and round-trip-compare it to a golden. |
| 8 | 8 | |
| 9 | +Closeout note: alongside the original unit coverage, `tests/reader_malformed_stress.rs` | |
| 10 | +now runs deterministic truncated/header-corruption cases over real corpus-built | |
| 11 | +objects to defend the "no panics on malformed input" bar, and | |
| 12 | +`tests/reader_tool_parity.rs` checks the `--dump` load-command surface against | |
| 13 | +`otool -lV` across the afs-as corpus. | |
| 14 | + | |
| 9 | 15 | ## Deliverables |
| 10 | 16 | |
| 11 | 17 | ### 1. Mach-O constants |
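The closeout note in the sprint01.md hunk above mentions deterministic truncated and header-corruption cases. The sketch below shows one way such a stress pass can be structured around a bounds-checked header parser. It is not `tests/reader_malformed_stress.rs`; `parse_header`, `HeaderError`, `Header`, and `sample_header_bytes` are hypothetical names. The only facts taken as given are the standard 32-byte `mach_header_64` layout and the `MH_MAGIC_64` value.

```rust
pub const MH_MAGIC_64: u32 = 0xfeed_facf;
pub const HEADER_SIZE: usize = 32;

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated { need: usize, have: usize },
    BadMagic(u32),
    CommandsOverrun,
}

/// Subset of `mach_header_64` fields used by this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub filetype: u32,
    pub ncmds: u32,
    pub sizeofcmds: u32,
}

/// Read a little-endian u32 without ever indexing out of bounds.
fn u32_at(bytes: &[u8], at: usize) -> Result<u32, HeaderError> {
    let truncated = HeaderError::Truncated { need: at + 4, have: bytes.len() };
    let raw = bytes.get(at..at + 4).ok_or(truncated)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(raw);
    Ok(u32::from_le_bytes(buf))
}

/// Parse the fixed header and check that the load commands fit in the input.
pub fn parse_header(bytes: &[u8]) -> Result<Header, HeaderError> {
    let magic = u32_at(bytes, 0)?;
    if magic != MH_MAGIC_64 {
        return Err(HeaderError::BadMagic(magic));
    }
    let filetype = u32_at(bytes, 12)?;
    let ncmds = u32_at(bytes, 16)?;
    let sizeofcmds = u32_at(bytes, 20)?;
    u32_at(bytes, 28)?; // the trailing reserved word must be present too

    let end = HEADER_SIZE as u64 + u64::from(sizeofcmds);
    if end > bytes.len() as u64 {
        return Err(HeaderError::CommandsOverrun);
    }
    Ok(Header { filetype, ncmds, sizeofcmds })
}

/// Minimal well-formed header: arm64 `MH_OBJECT` with zero load commands.
pub fn sample_header_bytes() -> Vec<u8> {
    let words: [u32; 8] = [MH_MAGIC_64, 0x0100_000c, 0, 1, 0, 0, 0, 0];
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}
```

A stress loop then feeds every prefix `&bytes[..n]` and every single-byte corruption to `parse_header` and asserts only that it returns, with `Err` for the truncated prefixes. Because all reads go through `slice::get`, malformed input surfaces as an error value rather than a panic.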
.docs/sprints/sprint02.md modified @@ -6,6 +6,13 @@ Sprint 1 — header + load commands parsed. | ||
| 6 | 6 | ## Goals |
| 7 | 7 | Decode section payloads, the symbol table (nlist_64), and the string table. Expose the full section/symbol/string model that later sprints build on. |
| 8 | 8 | |
| 9 | +Closeout note: `tests/reader_malformed_stress.rs` now also covers malformed | |
| 10 | +symbol/string-table variants derived from real corpus objects so the reader's | |
| 11 | +symbol and string surfaces are exercised under targeted bad-input cases, not | |
| 12 | +just hand-written unit fixtures. `tests/reader_tool_parity.rs` now also checks | |
| 13 | +symbol classification against `nm -a` and raw relocation tables against | |
| 14 | +`otool -r` across the afs-as corpus. | |
| 15 | + | |
| 9 | 16 | ## Deliverables |
| 10 | 17 | |
| 11 | 18 | ### 1. Section attributes and kinds |
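The sprint02.md closeout note above refers to malformed string-table cases. A bounds-checked lookup that tolerates them might be sketched as below. `string_at` is a hypothetical helper, not the `StringTable` API from the crate. It relies only on the standard Mach-O convention that an `n_strx` value is a byte offset to a NUL-terminated name.

```rust
/// Return the NUL-terminated string starting at `offset`, or `None` when the
/// offset is out of range, the string is unterminated, or it is not UTF-8.
/// Treating non-UTF-8 names as absent is a simplification for this sketch.
pub fn string_at(table: &[u8], offset: u32) -> Option<&str> {
    // `get` with an open-ended range rejects offsets past the end.
    let tail = table.get(offset as usize..)?;
    // An unterminated final string is treated as malformed, not truncated.
    let nul = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..nul]).ok()
}
```

Because a lookup is just "read until NUL from this offset", an offset pointing into the middle of a longer entry is valid and yields its suffix, which is what makes suffix-deduplicated tables work without special handling on the read side.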
.docs/sprints/sprint04.md modified @@ -6,6 +6,13 @@ Sprints 1–3 — Mach-O reading complete. | ||
| 6 | 6 | ## Goals |
| 7 | 7 | Read static archives (`.a`) including the BSD, System V, and GNU-thin variants. Support lazy member fetching: a member is only parsed when an undefined symbol names it. This is the mechanism by which `libarmfortas_rt.a` gets pulled in. |
| 8 | 8 | |
| 9 | +Closeout note: the force-load surface landed in the resolver as | |
| 10 | +`resolve::force_load_archive` / `resolve::force_load_all`, and one-level | |
| 11 | +nested archives are expanded through the fetched-member path with provenance | |
| 12 | +chains such as `outer.a(inner.a)(foo.o)`. `--dump-archive` now intentionally | |
| 13 | +prints the same member listing shape as `ar -t`, and parity is checked | |
| 14 | +against both generated archives and `libarmfortas_rt.a` when available. | |
| 15 | + | |
| 9 | 16 | ## Deliverables |
| 10 | 17 | |
| 11 | 18 | ### 1. Archive format recognizer |
@@ -69,7 +76,10 @@ impl<'a> Archive<'a> { | ||
| 69 | 76 | Returns `None` if the archive does not define `name`. Fetching an archive member memoizes: a second lookup for the same member returns a cached handle. The resolution pass (Sprint 8) is the only caller. |
| 70 | 77 | |
| 71 | 78 | ### 6. `-force_load` / `-all_load` support (semantics, not CLI yet) |
| 72 | -Archive has a `force_all(&mut self)` method that pre-fetches every member. Sprint 19 wires the CLI. | |
| 79 | +Implemented via the resolver-level helpers | |
| 80 | +`resolve::force_load_archive` / `resolve::force_load_all`, which pre-fetch | |
| 81 | +archive members against the live linker input registry. Sprint 19 wires the | |
| 82 | +CLI surface. | |
| 73 | 83 | |
| 74 | 84 | ### 7. Archive-of-archives |
| 75 | 85 | Rare but legal: member can be another `.a`. Recurse one level. If a sub-archive defines `name`, the outer `fetch` returns the sub-member's object file and records a provenance chain for diagnostics. |
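The sprint04.md text above describes lazy, memoized member fetching and provenance strings such as `libarmfortas_rt.a(io.o)`. The sketch below illustrates both under heavy simplification. `LazyArchive`, `Member`, and `fetch_defining` are hypothetical names (the crate's own entry point is `fetch_object_defining`), and "parsing" a member is reduced to building its provenance label so that the caching behaviour stays visible.

```rust
use std::collections::HashMap;

/// Stand-in for an archive member: its name and the symbols it defines.
pub struct Member {
    pub name: String,
    pub defines: Vec<String>,
}

pub struct LazyArchive {
    path: String,
    members: Vec<Member>,
    /// Symbol name -> index of the first member defining it.
    index: HashMap<String, usize>,
    /// Cached result per member; `None` until the member is first fetched.
    parsed: Vec<Option<String>>,
    /// How many members were actually "parsed" (for demonstration only).
    pub parse_count: usize,
}

impl LazyArchive {
    pub fn new(path: &str, members: Vec<Member>) -> Self {
        let mut index = HashMap::new();
        for (i, member) in members.iter().enumerate() {
            for sym in &member.defines {
                // Sketch policy: the first definer in archive order wins.
                index.entry(sym.clone()).or_insert(i);
            }
        }
        let parsed = vec![None; members.len()];
        Self { path: path.to_string(), members, index, parsed, parse_count: 0 }
    }

    /// Return the provenance label of the member defining `symbol`, parsing
    /// that member at most once. Returns `None` if no member defines it.
    pub fn fetch_defining(&mut self, symbol: &str) -> Option<&str> {
        let idx = *self.index.get(symbol)?;
        if self.parsed[idx].is_none() {
            self.parse_count += 1;
            let label = format!("{}({})", self.path, self.members[idx].name);
            self.parsed[idx] = Some(label);
        }
        self.parsed[idx].as_deref()
    }
}
```

For a one-level nested archive, the same labelling scheme would append one more parenthesized component, giving `outer.a(inner.a)(foo.o)` as in the closeout note.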
.docs/sprints/sprint08.md modified @@ -6,6 +6,11 @@ Sprint 7 — `SymbolTable` with insertion semantics. | ||
| 6 | 6 | ## Goals |
| 7 | 7 | Drive the symbol table to a fixed point: every undefined reference either resolves to a Defined (from an object), Common (promoted in BSS), DylibImport (from a dylib/TBD), or raises a clear, actionable diagnostic. `-force_load` / `-all_load` / `-undefined <treatment>` all handled. |
| 8 | 8 | |
| 9 | +Closeout note: the implemented entrypoint is | |
| 10 | +`resolve(inputs, table, opts) -> ResolutionReport`. The current library | |
| 11 | +surface applies archive force-loading as archives are encountered in | |
| 12 | +command-line order so left-to-right archive behavior stays explicit. | |
| 13 | + | |
| 9 | 14 | ## Deliverables |
| 10 | 15 | |
| 11 | 16 | ### 1. Resolution algorithm |
@@ -13,13 +18,10 @@ Drive the symbol table to a fixed point: every undefined reference either resolv | ||
| 13 | 18 | |
| 14 | 19 | ```rust |
| 15 | 20 | pub fn resolve(inputs: &mut Inputs, table: &mut SymbolTable, opts: &LinkOptions) |
| 16 | - -> Result<(), Vec<ResolveError>> | |
| 21 | + -> Result<ResolutionReport, ResolutionError> | |
| 17 | 22 | { |
| 18 | - seed_table_with_objects_and_dylib_imports(inputs, table, opts); | |
| 19 | - if opts.all_load { force_load_everything(inputs, table); } | |
| 20 | - for forced in &opts.force_load { force_load_one(inputs, table, forced); } | |
| 21 | - fixed_point_pull_from_archives(inputs, table); | |
| 22 | - classify_unresolved(table, opts); | |
| 23 | + seed_and_resolve_in_link_order(inputs, table, opts); | |
| 24 | + classify_unresolved(table, opts.undefined_treatment); | |
| 23 | 25 | } |
| 24 | 26 | ``` |
| 25 | 27 | |
@@ -43,7 +45,7 @@ Order matters: armfortas's driver currently passes `<objs> <runtime.a> -lSystem` | ||
| 43 | 45 | ### 4. `-force_load` and `-all_load` |
| 44 | 46 | - `-force_load <archive>`: pull every member of that archive before fixed-point. |
| 45 | 47 | - `-all_load`: pull every member of every archive. |
| 46 | -- Both happen before the fixed-point loop so their transitively-pulled symbols feed into the same fixed point. | |
| 48 | +- In the implemented surface these happen when the named archive is encountered in link order, which preserves left-to-right linker semantics while still feeding the same resolution/classification pipeline. | |
| 47 | 49 | |
| 48 | 50 | ### 5. `-undefined <treatment>` |
| 49 | 51 | After the fixed point, any still-Undefined entry is classified by the `-undefined` setting: |
@@ -60,8 +62,8 @@ Undefined errors must cite every referrer input, not just one. Output format: | ||
| 60 | 62 | |
| 61 | 63 | ``` |
| 62 | 64 | afs-ld: error: undefined symbol: _afs_print |
| 63 | - referenced by program.o(text section + 0x34) | |
| 64 | - referenced by runtime.o(text section + 0x120) | |
| 65 | + referenced by program.o(__TEXT,__text + 0x34) | |
| 66 | + referenced by runtime.o(__TEXT,__text + 0x120) | |
| 65 | 67 | (also via 2 relocations in libarmfortas_rt.a(io.o)) |
| 66 | 68 | Hint: did you mean _afs_print_real? (Levenshtein distance 5) |
| 67 | 69 | ``` |
@@ -71,8 +73,8 @@ Did-you-mean uses a basic Levenshtein-3 search over defined symbols. | ||
| 71 | 73 | ### 8. Diagnostics for duplicate strong |
| 72 | 74 | ``` |
| 73 | 75 | afs-ld: error: duplicate symbol _foo |
| 74 | - defined in: a.o (text + 0x0) | |
| 75 | - also in: b.o (text + 0x0) | |
| 76 | + defined in: a.o (__TEXT,__text + 0x0) | |
| 77 | + also in: b.o (__TEXT,__text + 0x0) | |
| 76 | 78 | ``` |
| 77 | 79 | |
| 78 | 80 | No suggestion — two strong defs is a real ambiguity. |
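The sprint08.md hunks above mention a Levenshtein-based did-you-mean hint for undefined symbols. A minimal sketch of such a search follows. `levenshtein` and `did_you_mean` are hypothetical helpers, not the functions in `src/resolve.rs`, and the distance cutoff is left as a parameter rather than assumed.

```rust
/// Classic two-row dynamic-programming edit distance over `char`s.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] = distance between the first i chars of `a` and first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        cur.push(i + 1);
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Pick the closest defined symbol within `max_distance`, if any.
/// Ties are broken by name so the hint stays deterministic.
pub fn did_you_mean<'a>(
    missing: &str,
    defined: &[&'a str],
    max_distance: usize,
) -> Option<&'a str> {
    defined
        .iter()
        .copied()
        .map(|candidate| (levenshtein(missing, candidate), candidate))
        .filter(|&(distance, _)| distance <= max_distance)
        .min_by_key(|&(distance, candidate)| (distance, candidate))
        .map(|(_, candidate)| candidate)
}
```

The explicit tie-break matters for the closeout requirement that user-facing diagnostics stay deterministic and testable: without it, the hint could depend on symbol-table iteration order.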
AGENTS.md added @@ -0,0 +1,349 @@ | ||
| 1 | +# AFS-LD | |
| 2 | + | |
| 3 | +Local working guide for agents in `afs-ld`. Keep this file untracked. | |
| 4 | +`CLAUDE.md` is the tracked, authoritative policy file; this document adds a | |
| 5 | +reality-checked snapshot of the current implementation so we do not confuse the | |
| 6 | +roadmap with shipped code. | |
| 7 | + | |
| 8 | +## Repository Context | |
| 9 | + | |
| 10 | +`afs-ld` is the standalone ARM64 Mach-O linker for the ARMFORTAS toolchain. It | |
| 11 | +sits beside `afs-as` as a submodule in the `armfortas` workspace and is meant | |
| 12 | +to replace Apple's `ld` for binaries produced by armfortas. | |
| 13 | + | |
| 14 | +The project boundary is intentionally clean: | |
| 15 | + | |
| 16 | +- `afs-as` emits `MH_OBJECT`. | |
| 17 | +- `afs-ld` reads `.o`, `.a`, `.dylib`, and `.tbd`. | |
| 18 | +- armfortas should eventually hand final linking to `afs-ld` rather than to | |
| 19 | + the system linker. | |
| 20 | + | |
| 21 | +The project is Mach-O only, macOS only, arm64 only, stdlib only. | |
| 22 | + | |
| 23 | +## Definition Of Done | |
| 24 | + | |
| 25 | +The real finish line is not "parses some objects" or "links hello world once." | |
| 26 | +It is parity with Apple's `ld` for the binaries armfortas and fortsh need: | |
| 27 | + | |
| 28 | +- arm64 Mach-O executables and dylibs | |
| 29 | +- static archive linking | |
| 30 | +- dylib and TBD ingestion | |
| 31 | +- dyld metadata that works on real macOS systems | |
| 32 | +- ad-hoc signing so output executes on Apple Silicon | |
| 33 | +- deterministic output | |
| 34 | +- enough correctness to link fortsh without ARM-specific workarounds | |
| 35 | + | |
| 36 | +## Current Reality | |
| 37 | + | |
| 38 | +This repo is ahead of Sprint 0 scaffolding, but it is not yet a full linker. | |
| 39 | +The roadmap in `.docs/overview.md` and `.docs/sprints/` is broader than the | |
| 40 | +code that exists today. | |
| 41 | + | |
| 42 | +What is implemented now: | |
| 43 | + | |
| 44 | +- hand-rolled CLI parsing for a small flag subset plus dump modes | |
| 45 | +- Mach-O header/load-command/section/symbol/string-table reading | |
| 46 | +- relocation parsing, fusion, validation, and round-trip support | |
| 47 | +- archive parsing and lazy member fetch support | |
| 48 | +- binary dylib parsing and export-trie walking | |
| 49 | +- TAPI TBD v4 parsing, including the custom YAML subset parser | |
| 50 | +- linker-side symbol interning, symbol table modeling, and resolution passes | |
| 51 | +- subsections-via-symbols atomization | |
| 52 | +- `--dump`, `--dump-archive`, `--dump-dylib`, and `--dump-tbd` | |
| 53 | + | |
| 54 | +What is not implemented yet: | |
| 55 | + | |
| 56 | +- real `Linker::run` output production | |
| 57 | +- output layout and Mach-O writing | |
| 58 | +- dyld metadata synthesis | |
| 59 | +- code signing | |
| 60 | +- dead-strip / ICF / thunks | |
| 61 | +- real differential linking against Apple `ld` | |
| 62 | +- driver integration with armfortas | |
| 63 | +- the full `ld`-compatible CLI surface described in Sprint 19 | |
| 64 | + | |
| 65 | +Important practical note: | |
| 66 | + | |
| 67 | +- `src/lib.rs` still returns `LinkError::NotYetImplemented` for real link runs. | |
| 68 | +- `tests/common/harness.rs::link_both` still panics because full end-to-end | |
| 69 | + linker execution has not landed. | |
| 70 | +- `README.md` still describes the crate as "Sprint 0 scaffolding only," which is | |
| 71 | + now too pessimistic for the read-side code but still accurate for the actual | |
| 72 | + link-producing path. | |
| 73 | + | |
| 74 | +As of 2026-04-15 in this checkout, `cargo test -p afs-ld` is green. | |
| 75 | + | |
| 76 | +## Strengths | |
| 77 | + | |
| 78 | +- The read-side core is already substantial and well-tested. | |
| 79 | +- The project has strong bespoke discipline: no `clap`, `serde`, `object`, | |
| 80 | + `goblin`, `byteorder`, or other format-parsing shortcuts. | |
| 81 | +- Raw wire structures are modeled explicitly and usually paired with | |
| 82 | + round-trip-oriented tests. | |
| 83 | +- The type modeling is strong: opaque ids, interned strings, explicit symbol | |
| 84 | + states, explicit atom ownership, explicit relocation referents. | |
| 85 | +- Real-world fixtures are already in play: afs-as corpus objects, | |
| 86 | + `libarmfortas_rt.a`, `libSystem.tbd`, and small clang-built dylibs. | |
| 87 | +- The codebase already separates concerns cleanly enough that writer/layout work | |
| 88 | + can land without tearing up the read-side foundation. | |
| 89 | +- Dump modes make inspection easy and are useful while the full writer does not | |
| 90 | + exist yet. | |
| 91 | + | |
| 92 | +## Weaknesses And Risk Areas | |
| 93 | + | |
| 94 | +- The actual link-producing pipeline does not exist yet, so the hardest parity | |
| 95 | + bugs are still ahead of us. | |
| 96 | +- Some tracked docs are aspirational. `.docs/overview.md` is the intended end | |
| 97 | + state, not a guarantee that every listed module already exists. | |
| 98 | +- `README.md` is stale in the opposite direction: it understates how much | |
| 99 | + read-side work has landed. | |
| 100 | +- The current diagnostics surface is still minimal. `src/diag.rs` only prints | |
| 101 | + `afs-ld: error: ...`; the richer caret diagnostics are planned, not present. | |
| 102 | +- The CLI surface is intentionally tiny right now. Any work that assumes | |
| 103 | + `ld`-compatibility must start by checking `src/args.rs`, not by trusting the | |
| 104 | + sprint plan. | |
| 105 | +- Performance characteristics are mostly unknown because the writer, layout, and | |
| 106 | + full-link path are not in place yet. | |
| 107 | +- The differential harness is only half-built: the diff engine exists, but the | |
| 108 | + "run both linkers" machinery is not wired. | |
| 109 | +- Several future modules named in the roadmap do not exist yet: | |
| 110 | + `layout.rs`, `driver.rs`, `map.rs`, `gc.rs`, `icf.rs`, `synth/`, | |
| 111 | + `macho/writer.rs`, and the code-signing path are all still planned work. | |
| 112 | + | |
| 113 | +## Build And Test | |
| 114 | + | |
| 115 | +Primary commands: | |
| 116 | + | |
| 117 | +```bash | |
| 118 | +cargo build -p afs-ld | |
| 119 | +cargo test -p afs-ld | |
| 120 | +cargo clippy -p afs-ld --all-targets -- -D warnings | |
| 121 | +``` | |
| 122 | + | |
| 123 | +Useful targeted commands: | |
| 124 | + | |
| 125 | +```bash | |
| 126 | +cargo test --lib -p afs-ld | |
| 127 | +cargo test --test reader_corpus_round_trip -p afs-ld | |
| 128 | +cargo test --test archive_runtime -p afs-ld | |
| 129 | +cargo test --test dylib_integration -p afs-ld | |
| 130 | +cargo test --test tbd_integration -p afs-ld | |
| 131 | +cargo test --test resolve_integration -p afs-ld | |
| 132 | +cargo test --test atom_integration -p afs-ld | |
| 133 | +cargo test -p afs-ld -- <substring> | |
| 134 | +``` | |
| 135 | + | |
| 136 | +Environment assumptions: | |
| 137 | + | |
| 138 | +- macOS on Apple Silicon | |
| 139 | +- Xcode command-line tools available through `xcrun` | |
| 140 | +- access to the parent workspace, especially `runtime/` and `.refs/` | |
| 141 | + | |
| 142 | +Integration tests already shell out to system tools in a few places. Do not | |
| 143 | +replace those with fake fixtures if a real toolchain interaction is the thing | |
| 144 | +being tested. | |
| 145 | + | |
| 146 | +## Project Structure | |
| 147 | + | |
| 148 | +Actual source tree today: | |
| 149 | + | |
| 150 | +```text | |
| 151 | +afs-ld/ | |
| 152 | +├── CLAUDE.md | |
| 153 | +├── README.md | |
| 154 | +├── .docs/ | |
| 155 | +│ ├── overview.md | |
| 156 | +│ └── sprints/ | |
| 157 | +├── src/ | |
| 158 | +│ ├── archive.rs | |
| 159 | +│ ├── args.rs | |
| 160 | +│ ├── atom.rs | |
| 161 | +│ ├── diag.rs | |
| 162 | +│ ├── dump.rs | |
| 163 | +│ ├── input.rs | |
| 164 | +│ ├── leb.rs | |
| 165 | +│ ├── lib.rs | |
| 166 | +│ ├── main.rs | |
| 167 | +│ ├── resolve.rs | |
| 168 | +│ ├── section.rs | |
| 169 | +│ ├── string_table.rs | |
| 170 | +│ ├── symbol.rs | |
| 171 | +│ ├── macho/ | |
| 172 | +│ │ ├── constants.rs | |
| 173 | +│ │ ├── dylib.rs | |
| 174 | +│ │ ├── exports.rs | |
| 175 | +│ │ ├── reader.rs | |
| 176 | +│ │ ├── tbd.rs | |
| 177 | +│ │ └── tbd_yaml.rs | |
| 178 | +│ └── reloc/ | |
| 179 | +│ └── mod.rs | |
| 180 | +└── tests/ | |
| 181 | + ├── common/harness.rs | |
| 182 | + ├── archive_runtime.rs | |
| 183 | + ├── atom_integration.rs | |
| 184 | + ├── diff_harness_*.rs | |
| 185 | + ├── dylib_integration.rs | |
| 186 | + ├── reader_*.rs | |
| 187 | + ├── resolve_integration.rs | |
| 188 | + ├── tbd_*.rs | |
| 189 | + └── reader_corpus_round_trip.rs | |
| 190 | +``` | |
| 191 | + | |
| 192 | +Planned future modules listed in the docs should be treated as design intent, | |
| 193 | +not as present-tense implementation. | |
| 194 | + | |
| 195 | +## Implemented Pipeline Vs Planned Pipeline | |
| 196 | + | |
| 197 | +Implemented today: | |
| 198 | + | |
| 199 | +```text | |
| 200 | +argv | |
| 201 | + -> args.rs | |
| 202 | + -> dump/read paths | |
| 203 | + -> archive/object/dylib/TBD ingestion | |
| 204 | + -> symbol/section/reloc decoding | |
| 205 | + -> resolve.rs | |
| 206 | + -> atom.rs | |
| 207 | +``` | |
| 208 | + | |
| 209 | +Current real-link path: | |
| 210 | + | |
| 211 | +```text | |
| 212 | +argv -> args.rs -> Linker::run -> NotYetImplemented | |
| 213 | +``` | |
| 214 | + | |
| 215 | +Planned end-to-end pipeline from the roadmap: | |
| 216 | + | |
| 217 | +```text | |
| 218 | +args -> inputs -> resolve -> atomize -> layout -> apply relocs | |
| 219 | + -> synth sections -> write -> sign | |
| 220 | +``` | |
| 221 | + | |
| 222 | +When you are planning work, always identify which of those stages is real in | |
| 223 | +this checkout and which stage is still only described in docs. | |
| 224 | + | |
| 225 | +## Development Guidance | |
| 226 | + | |
| 227 | +### 1. Trust code and tests over roadmap prose | |
| 228 | + | |
| 229 | +Read these in order before substantial work: | |
| 230 | + | |
| 231 | +1. `CLAUDE.md` | |
| 232 | +2. `.docs/overview.md` | |
| 233 | +3. the relevant sprint file in `.docs/sprints/` | |
| 234 | +4. the actual Rust module you will touch | |
| 235 | +5. the tests covering that module | |
| 236 | + | |
| 237 | +If the docs and the code disagree, treat the code plus tests as the truth about | |
| 238 | +what exists today, then decide whether the docs need to be refreshed. | |
| 239 | + | |
| 240 | +### 2. Keep the bespoke contract intact | |
| 241 | + | |
| 242 | +- Stdlib only unless a dependency discussion happens first. | |
| 243 | +- Do not couple afs-ld to afs-as at a Rust type level. | |
| 244 | +- Duplicate Mach-O constants locally when needed. | |
| 245 | +- Do not hide format details behind clever abstractions that erase wire truth. | |
| 246 | + | |
| 247 | +### 3. Preserve the wire | |
| 248 | + | |
| 249 | +- Keep raw bytes or raw fields accessible when lossless re-emission matters. | |
| 250 | +- Prefer explicit parse and write pairs for on-disk structures. | |
| 251 | +- Avoid converting fixed-size or padded wire data into lossy higher-level forms | |
| 252 | + unless the raw representation is still available somewhere. | |
| 253 | +- If a new decoder lands, pair it with tests that prove it round-trips or at | |
| 254 | + least preserves the exact bytes relevant to the current stage. | |
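A minimal sketch of the parse/write pairing described above, assuming a 16-byte NUL-padded name field of the kind Mach-O uses for segment and section names. `Name16` and its methods are hypothetical names for illustration, not types taken from this crate. The raw bytes stay the source of truth; the string view is only a convenience.

```rust
/// Hypothetical wrapper around a fixed-size, NUL-padded 16-byte name field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Name16 {
    raw: [u8; 16],
}

impl Name16 {
    /// Parse from the front of `bytes`; `None` if fewer than 16 bytes remain.
    pub fn parse(bytes: &[u8]) -> Option<Self> {
        let mut raw = [0u8; 16];
        raw.copy_from_slice(bytes.get(..16)?);
        Some(Self { raw })
    }

    /// Re-emit exactly the bytes that were read, padding included.
    pub fn write(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.raw);
    }

    /// Raw access for callers that need the wire form.
    pub fn raw(&self) -> &[u8; 16] {
        &self.raw
    }

    /// Lossy convenience view: bytes up to the first NUL, if valid UTF-8.
    pub fn as_str(&self) -> Option<&str> {
        let end = self.raw.iter().position(|&b| b == 0).unwrap_or(16);
        std::str::from_utf8(&self.raw[..end]).ok()
    }
}
```

Because `write` emits the stored array rather than rebuilding it from `as_str`, any bytes after the first NUL survive a round trip.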
| 255 | + | |
| 256 | +### 4. Be explicit about incomplete work | |
| 257 | + | |
| 258 | +- Hard errors are better than silent wrong answers. | |
| 259 | +- If something is not implemented, say so directly. | |
| 260 | +- Do not introduce "temporary" behavior that quietly emits malformed Mach-O. | |
| 261 | +- Do not soften a missing feature into a no-op unless the flag or structure is | |
| 262 | + explicitly intended to be ignored. | |
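One way to keep these rules honest is to separate "ignored by design" from "not implemented" in the types. The sketch below is an assumption-laden illustration: `FlagOutcome`, `FlagError`, `classify_flag`, and the `-demo_*` flag names are all hypothetical placeholders, not the contents of `src/args.rs`. Only the `afs-ld: error: ...` message prefix comes from the text above.

```rust
use std::fmt;

/// Hypothetical result of accepting a flag.
#[derive(Debug, PartialEq, Eq)]
pub enum FlagOutcome {
    Accepted,
    /// The flag is deliberately a no-op, and that decision is documented.
    IgnoredByDesign,
}

/// Hypothetical hard errors for flags that cannot be honored.
#[derive(Debug, PartialEq, Eq)]
pub enum FlagError {
    NotYetImplemented(String),
    Unknown(String),
}

impl fmt::Display for FlagError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FlagError::NotYetImplemented(flag) => {
                write!(f, "afs-ld: error: {} is not yet implemented", flag)
            }
            FlagError::Unknown(flag) => write!(f, "afs-ld: error: unknown flag {}", flag),
        }
    }
}

/// Classify a flag. The `-demo_*` names are placeholders for illustration.
pub fn classify_flag(flag: &str) -> Result<FlagOutcome, FlagError> {
    match flag {
        "-demo_supported" => Ok(FlagOutcome::Accepted),
        "-demo_ignored" => Ok(FlagOutcome::IgnoredByDesign),
        // Planned but missing: fail loudly instead of pretending it worked.
        "-demo_planned" => Err(FlagError::NotYetImplemented(flag.to_string())),
        other => Err(FlagError::Unknown(other.to_string())),
    }
}
```

The catch-all arm here is over `&str`, where exhaustive listing is impossible; it still returns an error rather than a silent no-op.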
| 263 | + | |
| 264 | +### 5. Exhaustive matches matter | |
| 265 | + | |
| 266 | +- Prefer enums for wire forms and linker-side states. | |
| 267 | +- Avoid catch-all `_` arms in production matches where adding a variant should | |
| 268 | + fail to compile until every affected match handles it. | |
| 269 | +- When adding a new variant, update every relevant match deliberately. | |
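As a small illustration of the point about catch-all arms, assuming a hypothetical `InputKind` enum loosely modeled on the archive/object/dylib/TBD ingestion stage (the enum and function names are not taken from the crate):

```rust
/// Hypothetical classification of linker inputs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InputKind {
    Object,
    Archive,
    Dylib,
    Tbd,
}

/// Human-readable label for dump output.
pub fn describe(kind: InputKind) -> &'static str {
    // Every variant is listed and there is no `_ =>` arm, so adding a new
    // `InputKind` variant makes this match fail to compile until it is handled.
    match kind {
        InputKind::Object => "relocatable object",
        InputKind::Archive => "static archive",
        InputKind::Dylib => "dynamic library",
        InputKind::Tbd => "text-based stub",
    }
}
```

With a `_ => "unknown"` arm, a fifth variant would compile cleanly and silently take the fallback path, which is the failure mode this rule is meant to prevent.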
| 270 | + | |
| 271 | +### 6. Keep dump surfaces useful | |
| 272 | + | |
| 273 | +- `--dump*` modes are an active debugging tool, not a side feature. | |
| 274 | +- When new reader functionality lands, extend the corresponding dump output. | |
| 275 | +- If you add a new parsed field but the dump cannot show it, the repo loses one | |
| 276 | + of its best inspection surfaces. | |
| 277 | + | |
| 278 | +### 7. Respect deterministic behavior | |
| 279 | + | |
| 280 | +- Avoid nondeterministic iteration (such as over a `HashMap`) when output order matters. | |
| 281 | +- Avoid timestamps, random ids, or unstable hashing in any future write path. | |
| 282 | +- When adding diagnostics, keep them stable and testable. | |
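A minimal sketch of order-stable output, assuming a hypothetical `render_symbols` helper that is not part of the crate. `std::collections::BTreeMap` iterates in key order, so the rendered text does not depend on insertion order or on a per-process hash seed.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: render `name -> address` pairs one per line.
/// Iteration over a `BTreeMap` is sorted by key, so the output is stable.
pub fn render_symbols(symbols: &BTreeMap<String, u64>) -> String {
    let mut out = String::new();
    for (name, addr) in symbols {
        out.push_str(&format!("{:#x} {}\n", addr, name));
    }
    out
}
```

If a `HashMap` is the better fit for lookups, collecting its entries into a `Vec` and sorting before emission gives the same guarantee at the output boundary.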
| 283 | + | |
| 284 | +## Testing Practices | |
| 285 | + | |
| 286 | +- Every bug fix gets a regression test. | |
| 287 | +- New parser behavior should land with unit tests close to the module. | |
| 288 | +- When touching integration behavior, prefer real fixtures over mocked ones. | |
| 289 | +- For archive work, look first at `tests/archive_runtime.rs`. | |
| 290 | +- For dylib and TBD work, look first at `tests/dylib_integration.rs`, | |
| 291 | + `tests/tbd_integration.rs`, and `tests/tbd_smoke.rs`. | |
| 292 | +- For reader invariants, `tests/reader_corpus_round_trip.rs` is a key guardrail. | |
| 293 | +- For resolution and atomization, `tests/resolve_integration.rs` and | |
| 294 | + `tests/atom_integration.rs` should move with the code. | |
| 295 | +- If you add future write-side functionality, extend the differential harness | |
| 296 | + rather than building a parallel ad hoc test path. | |
| 297 | + | |
| 298 | +Run focused tests first, then widen: | |
| 299 | + | |
| 300 | +- module-local or single integration test while developing | |
| 301 | +- `cargo test -p afs-ld` before handing work off | |
| 302 | +- `cargo clippy -p afs-ld --all-targets -- -D warnings` when changing code paths | |
| 303 | + broadly enough to justify it | |
| 304 | + | |
| 305 | +## Documentation Practices | |
| 306 | + | |
| 307 | +- `CLAUDE.md` is policy and development discipline. | |
| 308 | +- `.docs/overview.md` is the intended architecture and scope. | |
| 309 | +- `.docs/sprints/` is the staged roadmap. | |
| 310 | +- `README.md` is user-facing and currently stale relative to the read-side code. | |
| 311 | + | |
| 312 | +When a change materially shifts reality, update the tracked docs that are now | |
| 313 | +misleading. This is especially important in this repo because the roadmap is | |
| 314 | +ambitious and can otherwise create false assumptions for future work. | |
| 315 | + | |
| 316 | +## References | |
| 317 | + | |
| 318 | +Use the parent repository's references to confirm Mach-O or linker behavior | |
| 319 | +instead of reconstructing it from memory: | |
| 320 | + | |
| 321 | +- `.refs/llvm/lld/MachO/` for architecture and pass structure | |
| 322 | +- `.refs/ld64/` for Apple-parity edge cases | |
| 323 | +- `.refs/mold/` for performance ideas and comparative implementation choices | |
| 324 | + | |
| 325 | +Also use Apple's Mach-O and arm64 relocation headers as the numeric source of | |
| 326 | +truth for constants mirrored in `src/macho/constants.rs`. | |
| 327 | + | |
| 328 | +## Working Style For This Repo | |
| 329 | + | |
| 330 | +- Prefer small, reviewable changes. | |
| 331 | +- Keep commit messages terse and imperative. | |
| 332 | +- Do not mention sprint numbers in commit subjects. | |
| 333 | +- Avoid monolithic "land the whole linker" changes; the sprint plan is granular | |
| 334 | + for a reason. | |
| 335 | +- Before implementing a planned module from the roadmap, make sure the crate | |
| 336 | + actually has the prerequisites the sprint assumed. | |
| 337 | +- If you are about to say "the docs say this exists," stop and confirm with | |
| 338 | + `ls`, `rg`, and the tests. | |
| 339 | + | |
| 340 | +## Practical Shortcuts | |
| 341 | + | |
| 342 | +- Use `rg --files` and `rg` first; the repo is small enough that this is fast | |
| 343 | + and keeps context grounded in the actual tree. | |
| 344 | +- For current status, start with `src/lib.rs`, `src/main.rs`, `src/args.rs`, | |
| 345 | + `tests/common/harness.rs`, and `README.md`. | |
| 346 | +- Then, for architectural intent, read `.docs/overview.md` and the relevant | |
| 347 | + sprint file. | |
| 348 | + | |
| 349 | +That order will save a lot of confusion. | |