
AFS-LD

Local working guide for agents in afs-ld. Keep this file untracked. CLAUDE.md is the tracked, authoritative policy file; this document adds a reality-checked snapshot of the current implementation so we do not confuse the roadmap with shipped code.

Repository Context

afs-ld is the standalone ARM64 Mach-O linker for the ARMFORTAS toolchain. It sits beside afs-as as a submodule in the armfortas workspace and is meant to replace Apple's ld for binaries produced by armfortas.

The project boundary is intentionally clean:

  • afs-as emits MH_OBJECT.
  • afs-ld reads .o, .a, .dylib, and .tbd.
  • armfortas should eventually hand final linking to afs-ld rather than to the system linker.

The project is Mach-O only, macOS only, arm64 only, stdlib only.
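
To make the "arm64 MH_OBJECT only" boundary concrete, here is a minimal sketch of a header gate, assuming little-endian input and the constant values published in Apple's Mach-O headers. The names HeaderError, read_u32_le, and check_object_header are hypothetical and are not taken from src/macho/reader.rs.

```rust
// Values mirrored from Apple's <mach-o/loader.h> and <mach/machine.h>.
const MH_MAGIC_64: u32 = 0xfeed_facf;
const CPU_TYPE_ARM64: u32 = 0x0100_000c;
const MH_OBJECT: u32 = 0x1;
// mach_header_64: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds,
// flags, reserved (eight 32-bit fields).
const MACH_HEADER_64_SIZE: usize = 32;

/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
enum HeaderError {
    Truncated,
    BadMagic(u32),
    WrongCpu(u32),
    NotObject(u32),
}

/// Reads a little-endian u32 at `offset`, or `None` if out of bounds.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let raw = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
}

/// Accepts only a 64-bit, arm64, MH_OBJECT header; everything else is a hard error.
fn check_object_header(bytes: &[u8]) -> Result<(), HeaderError> {
    if bytes.len() < MACH_HEADER_64_SIZE {
        return Err(HeaderError::Truncated);
    }
    let magic = read_u32_le(bytes, 0).ok_or(HeaderError::Truncated)?;
    if magic != MH_MAGIC_64 {
        return Err(HeaderError::BadMagic(magic));
    }
    let cputype = read_u32_le(bytes, 4).ok_or(HeaderError::Truncated)?;
    if cputype != CPU_TYPE_ARM64 {
        return Err(HeaderError::WrongCpu(cputype));
    }
    let filetype = read_u32_le(bytes, 12).ok_or(HeaderError::Truncated)?;
    if filetype != MH_OBJECT {
        return Err(HeaderError::NotObject(filetype));
    }
    Ok(())
}
```

Each rejection carries the offending raw value, so a diagnostic can report exactly what was read instead of a generic "bad file".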

Definition Of Done

The real finish line is not "parses some objects" or "links hello world once." It is parity with Apple's ld for the binaries armfortas and fortsh need:

  • arm64 Mach-O executables and dylibs
  • static archive linking
  • dylib and TBD ingestion
  • dyld metadata that works on real macOS systems
  • ad-hoc signing so output executes on Apple Silicon
  • deterministic output
  • enough correctness to link fortsh without ARM-specific workarounds

Current Reality

This repo is ahead of Sprint 0 scaffolding, but it is not yet a full linker. The roadmap in .docs/overview.md and .docs/sprints/ is broader than the code that exists today.

What is implemented now:

  • hand-rolled CLI parsing for a small flag subset plus dump modes
  • Mach-O header/load-command/section/symbol/string-table reading
  • relocation parsing, fusion, validation, and round-trip support
  • archive parsing and lazy member fetch support
  • binary dylib parsing and export-trie walking
  • TAPI TBD v4 parsing, including the custom YAML subset parser
  • linker-side symbol interning, symbol table modeling, and resolution passes
  • subsections-via-symbols atomization
  • --dump, --dump-archive, --dump-dylib, and --dump-tbd
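
Export-trie walking depends on ULEB128 decoding, which is the kind of primitive a file like src/leb.rs would hold. The sketch below is an illustrative decoder under that assumption, not a copy of the crate's code; the function name and the strict overflow policy are hypothetical choices.

```rust
/// Decodes one ULEB128 value from the front of `bytes`.
///
/// Returns the value and the number of bytes consumed. Returns `None` when
/// the input ends before a terminating byte or the encoding would not fit
/// in 64 bits (this sketch also rejects padding beyond the tenth byte).
fn decode_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    let mut shift: u32 = 0;
    for (index, &byte) in bytes.iter().enumerate() {
        let chunk = u64::from(byte & 0x7f);
        // Only one payload bit is left at shift 63; nothing fits beyond it.
        if shift >= 64 || (shift == 63 && chunk > 1) {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            // High bit clear marks the final byte of the value.
            return Some((value, index + 1));
        }
        shift += 7;
    }
    // Ran out of input while the continuation bit was still set.
    None
}
```

Returning the consumed length alongside the value lets a trie walker advance its cursor without re-scanning the bytes.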

What is not implemented yet:

  • real Linker::run output production
  • output layout and Mach-O writing
  • dyld metadata synthesis
  • code signing
  • dead-strip / ICF / thunks
  • real differential linking against Apple ld
  • driver integration with armfortas
  • the full ld-compatible CLI surface described in Sprint 19

Important practical note:

  • src/lib.rs still returns LinkError::NotYetImplemented for real link runs.
  • tests/common/harness.rs::link_both still panics because full end-to-end linker execution has not landed.
  • README.md still describes the crate as "Sprint 0 scaffolding only," which is now too pessimistic for the read-side code but still accurate for the actual link-producing path.

As of 2026-04-15 in this checkout, cargo test -p afs-ld is green.

Strengths

  • The read-side core is already substantial and well-tested.
  • The project has strong bespoke discipline: no clap, serde, object, goblin, byteorder, or other format-parsing shortcuts.
  • Raw wire structures are modeled explicitly and usually paired with round-trip-oriented tests.
  • The type modeling is strong: opaque ids, interned strings, explicit symbol states, explicit atom ownership, explicit relocation referents.
  • Real-world fixtures are already in play: afs-as corpus objects, libarmfortas_rt.a, libSystem.tbd, and small clang-built dylibs.
  • The codebase already separates concerns cleanly enough that writer/layout work can land without tearing up the read-side foundation.
  • Dump modes make inspection easy and are useful while the full writer does not exist yet.

Weaknesses And Risk Areas

  • The actual link-producing pipeline does not exist yet, so the hardest parity bugs are still ahead of us.
  • Some tracked docs are aspirational. .docs/overview.md is the intended end state, not a guarantee that every listed module already exists.
  • README.md is stale in the opposite direction: it understates how much read-side work has landed.
  • The current diagnostics surface is still minimal. src/diag.rs only prints afs-ld: error: ...; the richer caret diagnostics are planned, not present.
  • The CLI surface is intentionally tiny right now. Any work that assumes ld-compatibility must start by checking src/args.rs, not by trusting the sprint plan.
  • Performance characteristics are mostly unknown because the writer, layout, and full-link path are not in place yet.
  • The differential harness is only half-built: the diff engine exists, but the "run both linkers" machinery is not wired up yet.
  • Several future modules named in the roadmap do not exist yet: layout.rs, driver.rs, map.rs, gc.rs, icf.rs, synth/, macho/writer.rs, and the code-signing path are all still planned work.

Build And Test

Primary commands:

cargo build -p afs-ld
cargo test -p afs-ld
cargo clippy -p afs-ld --all-targets -- -D warnings

Useful targeted commands:

cargo test --lib -p afs-ld
cargo test --test reader_corpus_round_trip -p afs-ld
cargo test --test archive_runtime -p afs-ld
cargo test --test dylib_integration -p afs-ld
cargo test --test tbd_integration -p afs-ld
cargo test --test resolve_integration -p afs-ld
cargo test --test atom_integration -p afs-ld
cargo test -p afs-ld -- <substring>

Environment assumptions:

  • macOS on Apple Silicon
  • Xcode command-line tools available through xcrun
  • access to the parent workspace, especially runtime/ and .refs/

Integration tests already shell out to system tools in a few places. Do not replace those with fake fixtures if a real toolchain interaction is the thing being tested.

Project Structure

Actual source tree today:

afs-ld/
├── CLAUDE.md
├── README.md
├── .docs/
│   ├── overview.md
│   └── sprints/
├── src/
│   ├── archive.rs
│   ├── args.rs
│   ├── atom.rs
│   ├── diag.rs
│   ├── dump.rs
│   ├── input.rs
│   ├── leb.rs
│   ├── lib.rs
│   ├── main.rs
│   ├── resolve.rs
│   ├── section.rs
│   ├── string_table.rs
│   ├── symbol.rs
│   ├── macho/
│   │   ├── constants.rs
│   │   ├── dylib.rs
│   │   ├── exports.rs
│   │   ├── reader.rs
│   │   ├── tbd.rs
│   │   └── tbd_yaml.rs
│   └── reloc/
│       └── mod.rs
└── tests/
    ├── common/harness.rs
    ├── archive_runtime.rs
    ├── atom_integration.rs
    ├── diff_harness_*.rs
    ├── dylib_integration.rs
    ├── reader_*.rs
    ├── resolve_integration.rs
    ├── tbd_*.rs
    └── reader_corpus_round_trip.rs

Planned future modules listed in the docs should be treated as design intent, not as present-tense implementation.

Implemented Pipeline Vs Planned Pipeline

Implemented today:

argv
  -> args.rs
  -> dump/read paths
  -> archive/object/dylib/TBD ingestion
  -> symbol/section/reloc decoding
  -> resolve.rs
  -> atom.rs
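
The last stage above, atomization under subsections-via-symbols, can be pictured as cutting a section at every symbol offset. The following is a simplified sketch of that idea only: AtomRange and split_at_symbols are hypothetical names, and the real atom.rs may treat alternate entry points, local labels, or zero-length atoms differently.

```rust
/// Half-open byte range `[start, end)` inside one section.
#[derive(Debug, PartialEq, Eq)]
struct AtomRange {
    start: u64,
    end: u64,
}

/// Cuts a section of `section_size` bytes at each in-range symbol offset.
/// Bytes before the first symbol still get an atom so nothing is orphaned.
fn split_at_symbols(section_size: u64, symbol_offsets: &[u64]) -> Vec<AtomRange> {
    if section_size == 0 {
        return Vec::new();
    }
    let mut cuts: Vec<u64> = symbol_offsets
        .iter()
        .copied()
        .filter(|&offset| offset < section_size)
        .collect();
    cuts.push(0);
    // Sorting and deduplicating keeps the result independent of input order.
    cuts.sort_unstable();
    cuts.dedup();

    let mut atoms = Vec::with_capacity(cuts.len());
    for (index, &start) in cuts.iter().enumerate() {
        // Each atom ends where the next one starts; the last runs to the section end.
        let end = cuts.get(index + 1).copied().unwrap_or(section_size);
        atoms.push(AtomRange { start, end });
    }
    atoms
}
```

Because the cut points are sorted before use, the same symbols always yield the same atoms, which matters for the deterministic-output goal.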

Current real-link path:

argv -> args.rs -> Linker::run -> NotYetImplemented
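
In code, that path might look roughly like the sketch below. Only the names Linker::run and LinkError::NotYetImplemented and the "afs-ld: error:" prefix come from this guide; LinkOptions, its fields, the message text, and report are assumptions made for illustration.

```rust
use std::fmt;

/// Stand-in for the crate's error type; only the variant name is from the guide.
#[derive(Debug, PartialEq, Eq)]
enum LinkError {
    NotYetImplemented,
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Hypothetical wording; the real message may differ.
            LinkError::NotYetImplemented => f.write_str("link output is not yet implemented"),
        }
    }
}

/// Hypothetical options struct for this sketch.
struct LinkOptions {
    inputs: Vec<String>,
    output: String,
}

struct Linker;

impl Linker {
    /// Fails loudly instead of writing a partial or malformed output file.
    fn run(options: &LinkOptions) -> Result<(), LinkError> {
        let _ = (&options.inputs, &options.output);
        Err(LinkError::NotYetImplemented)
    }
}

/// Formats an error in the minimal "afs-ld: error: ..." shape.
fn report(err: &LinkError) -> String {
    format!("afs-ld: error: {}", err)
}
```

A caller such as main can print the report string to stderr and exit non-zero, so scripts never mistake an unimplemented link for a successful one.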

Planned end-to-end pipeline from the roadmap:

args -> inputs -> resolve -> atomize -> layout -> apply relocs
     -> synth sections -> write -> sign

When you are planning work, always identify which of those stages are real in this checkout and which are still only described in docs.

Development Guidance

1. Trust code and tests over roadmap prose

Read these in order before substantial work:

  1. CLAUDE.md
  2. .docs/overview.md
  3. the relevant sprint file in .docs/sprints/
  4. the actual Rust module you will touch
  5. the tests covering that module

If the docs and the code disagree, treat the code plus tests as the truth about what exists today, then decide whether the docs need to be refreshed.

2. Keep the bespoke contract intact

  • Stdlib only unless a dependency discussion happens first.
  • Do not couple afs-ld to afs-as at a Rust type level.
  • Duplicate Mach-O constants locally when needed.
  • Do not hide format details behind clever abstractions that erase wire truth.

3. Preserve the wire

  • Keep raw bytes or raw fields accessible when lossless re-emission matters.
  • Prefer explicit parse and write pairs for on-disk structures.
  • Avoid converting fixed-size or padded wire data into lossy higher-level forms unless the raw representation is still available somewhere.
  • If a new decoder lands, pair it with tests that prove it round-trips or at least preserves the exact bytes relevant to the current stage.
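
As an illustration of an explicit parse and write pair, the sketch below models the 16-byte nlist_64 symbol entry field for field and re-emits it byte for byte. The field layout follows Apple's <mach-o/nlist.h>; the type name RawNlist64 and its method names are hypothetical and may not match the crate.

```rust
/// Size of one `nlist_64` entry on disk.
const NLIST_64_SIZE: usize = 16;

/// Raw `nlist_64`, kept field for field so nothing is lost on re-emission.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RawNlist64 {
    n_strx: u32,
    n_type: u8,
    n_sect: u8,
    n_desc: u16,
    n_value: u64,
}

impl RawNlist64 {
    /// Parses one little-endian entry from the front of `bytes`.
    fn parse(bytes: &[u8]) -> Option<Self> {
        let b = bytes.get(..NLIST_64_SIZE)?;
        let mut value = [0u8; 8];
        value.copy_from_slice(&b[8..16]);
        Some(Self {
            n_strx: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            n_type: b[4],
            n_sect: b[5],
            n_desc: u16::from_le_bytes([b[6], b[7]]),
            n_value: u64::from_le_bytes(value),
        })
    }

    /// Appends the entry in its exact on-disk layout.
    fn write(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.n_strx.to_le_bytes());
        out.push(self.n_type);
        out.push(self.n_sect);
        out.extend_from_slice(&self.n_desc.to_le_bytes());
        out.extend_from_slice(&self.n_value.to_le_bytes());
    }
}
```

A round-trip test for a pair like this is short: parse known bytes, write them back, and assert the output equals the input.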

4. Be explicit about incomplete work

  • Hard errors are better than silent wrong answers.
  • If something is not implemented, say so directly.
  • Do not introduce "temporary" behavior that quietly emits malformed Mach-O.
  • Do not soften a missing feature into a no-op unless the flag or structure is explicitly intended to be ignored.
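
A small sketch of the "hard error over silent no-op" rule applied to argument parsing follows. It is a hypothetical parser written for this guide, not the contents of src/args.rs: ArgError, Options, and parse_args are invented names, and -o and -dead_strip are used only because they are familiar ld flags.

```rust
/// Hypothetical argument errors for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum ArgError {
    MissingValue(String),
    Unsupported(String),
}

/// Hypothetical parsed options for this sketch.
#[derive(Debug, Default, PartialEq, Eq)]
struct Options {
    output: Option<String>,
    inputs: Vec<String>,
}

/// Parses a tiny flag subset and rejects every flag it does not understand.
fn parse_args(args: &[&str]) -> Result<Options, ArgError> {
    let mut options = Options::default();
    let mut iter = args.iter();
    while let Some(&arg) = iter.next() {
        match arg {
            "-o" => {
                let value = iter
                    .next()
                    .ok_or_else(|| ArgError::MissingValue(arg.to_string()))?;
                options.output = Some((*value).to_string());
            }
            // Unknown flags fail the run; they are never skipped quietly.
            flag if flag.starts_with('-') => {
                return Err(ArgError::Unsupported(flag.to_string()));
            }
            path => options.inputs.push(path.to_string()),
        }
    }
    Ok(options)
}
```

With this shape, passing a flag whose feature has not landed (dead-stripping, for example) stops the run with a named error rather than producing output that silently ignores the request.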

5. Exhaustive matches matter

  • Prefer enums for wire forms and linker-side states.
  • Avoid catch-all _ arms in production matches when a new variant should force the compiler to help us.
  • When adding a new variant, update every relevant match deliberately.
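
The sketch below shows the pattern with arm64 relocation kinds. It covers only raw values 0 through 10 as numbered in Apple's <mach-o/arm64/reloc.h>; the enum and method names are hypothetical. Note the asymmetry: decoding from a raw integer needs a fallback because the input domain is open, while the match over the enum itself has no `_` arm.

```rust
/// arm64 relocation kinds 0..=10; hypothetical type name for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Arm64RelocKind {
    Unsigned,
    Subtractor,
    Branch26,
    Page21,
    PageOff12,
    GotLoadPage21,
    GotLoadPageOff12,
    PointerToGot,
    TlvpLoadPage21,
    TlvpLoadPageOff12,
    Addend,
}

impl Arm64RelocKind {
    /// Raw wire values are an open set, so unknown values become `None`.
    fn from_raw(raw: u8) -> Option<Self> {
        Some(match raw {
            0 => Self::Unsigned,
            1 => Self::Subtractor,
            2 => Self::Branch26,
            3 => Self::Page21,
            4 => Self::PageOff12,
            5 => Self::GotLoadPage21,
            6 => Self::GotLoadPageOff12,
            7 => Self::PointerToGot,
            8 => Self::TlvpLoadPage21,
            9 => Self::TlvpLoadPageOff12,
            10 => Self::Addend,
            _ => return None,
        })
    }

    /// No catch-all arm: a new variant will not compile until it is named here.
    fn name(self) -> &'static str {
        match self {
            Self::Unsigned => "ARM64_RELOC_UNSIGNED",
            Self::Subtractor => "ARM64_RELOC_SUBTRACTOR",
            Self::Branch26 => "ARM64_RELOC_BRANCH26",
            Self::Page21 => "ARM64_RELOC_PAGE21",
            Self::PageOff12 => "ARM64_RELOC_PAGEOFF12",
            Self::GotLoadPage21 => "ARM64_RELOC_GOT_LOAD_PAGE21",
            Self::GotLoadPageOff12 => "ARM64_RELOC_GOT_LOAD_PAGEOFF12",
            Self::PointerToGot => "ARM64_RELOC_POINTER_TO_GOT",
            Self::TlvpLoadPage21 => "ARM64_RELOC_TLVP_LOAD_PAGE21",
            Self::TlvpLoadPageOff12 => "ARM64_RELOC_TLVP_LOAD_PAGEOFF12",
            Self::Addend => "ARM64_RELOC_ADDEND",
        }
    }
}
```

If a twelfth variant is added to the enum, name stops compiling until the new case is handled, which is exactly the compiler help the rule is after.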

6. Keep dump surfaces useful

  • --dump* modes are an active debugging tool, not a side feature.
  • When new reader functionality lands, extend the corresponding dump output.
  • If you add a new parsed field but the dump cannot show it, the repo loses one of its best inspection surfaces.

7. Respect deterministic behavior

  • Avoid nondeterministic iteration when output order matters.
  • Avoid timestamps, random ids, or unstable hashing in any future write path.
  • When adding diagnostics, keep them stable and testable.
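
One common way to get stable ordering with the standard library alone is to route anything order-sensitive through a sorted container. The sketch below assumes a hypothetical symbol map keyed by name; stable_symbol_order is an invented helper, not a crate function.

```rust
use std::collections::{BTreeMap, HashMap};

/// Returns `(name, address)` pairs in name order, independent of how the
/// input `HashMap` happens to iterate in this process.
fn stable_symbol_order(symbols: &HashMap<String, u64>) -> Vec<(String, u64)> {
    // BTreeMap iterates in key order, so the output does not depend on the
    // HashMap's per-process random hashing seed.
    let ordered: BTreeMap<&str, u64> = symbols
        .iter()
        .map(|(name, address)| (name.as_str(), *address))
        .collect();
    ordered
        .into_iter()
        .map(|(name, address)| (name.to_string(), address))
        .collect()
}
```

Iterating the HashMap directly would still be fine for lookups; the sorted pass is only needed at points where order reaches output bytes or diagnostics.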

Testing Practices

  • Every bug fix gets a regression test.
  • New parser behavior should land with unit tests close to the module.
  • When touching integration behavior, prefer real fixtures over mocked ones.
  • For archive work, look first at tests/archive_runtime.rs.
  • For dylib and TBD work, look first at tests/dylib_integration.rs, tests/tbd_integration.rs, and tests/tbd_smoke.rs.
  • For reader invariants, tests/reader_corpus_round_trip.rs is a key guardrail.
  • For resolution and atomization, tests/resolve_integration.rs and tests/atom_integration.rs should move with the code.
  • If you add future write-side functionality, extend the differential harness rather than building a parallel ad hoc test path.

Run focused tests first, then widen:

  • module-local or single integration test while developing
  • cargo test -p afs-ld before handing work off
  • cargo clippy -p afs-ld --all-targets -- -D warnings when changing code paths broadly enough to justify it

Documentation Practices

  • CLAUDE.md is policy and development discipline.
  • .docs/overview.md is the intended architecture and scope.
  • .docs/sprints/ is the staged roadmap.
  • README.md is user-facing and currently stale relative to the read-side code.

When a change materially shifts reality, update the tracked docs that are now misleading. This is especially important in this repo because the roadmap is ambitious and can otherwise create false assumptions for future work.

References

Use the parent repository's references when you need to confirm Mach-O or linker behavior instead of inventing from memory:

  • .refs/llvm/lld/MachO/ for architecture and pass structure
  • .refs/ld64/ for Apple-parity edge cases
  • .refs/mold/ for performance ideas and comparative implementation choices

Also use Apple's Mach-O and arm64 relocation headers as the numeric source of truth for constants mirrored in src/macho/constants.rs.

Working Style For This Repo

  • Prefer small, reviewable changes.
  • Keep commit messages terse and imperative.
  • Do not mention sprint numbers in commit subjects.
  • Avoid monolithic "land the whole linker" changes; the sprint plan is granular for a reason.
  • Before implementing a planned module from the roadmap, make sure the crate actually has the prerequisites the sprint assumed.
  • If you are about to say "the docs say this exists," stop and confirm with ls, rg, and the tests.

Practical Shortcuts

  • Use rg --files and rg first; the repo is small enough that this is fast and keeps context grounded in the actual tree.
  • For current status, start with src/lib.rs, src/main.rs, src/args.rs, tests/common/harness.rs, and README.md.
  • For architectural intent, read .docs/overview.md and the relevant sprint file afterward.

That order will save a lot of confusion.
