@@ -0,0 +1,105 @@ |
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Repository Context |
| 6 | + |
| 7 | +`afs-ld` is a **git submodule** of [ARMFORTAS](https://github.com/FortranGoingOnForty/armfortas), a bespoke ARM64 Fortran compiler. It is the standalone ARM64 Mach-O linker: it reads Mach-O relocatable objects (MH_OBJECT) produced by `afs-as`, static archives, binary dylibs, and TAPI TBD text stubs, and emits linked Mach-O executables (MH_EXECUTE) and shared libraries (MH_DYLIB). It knows nothing about Fortran — the boundary with the compiler is the CLI (an `ld`-compatible flag surface). |
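The four input kinds can be told apart by their leading bytes. Below is a minimal sketch of that classification. It assumes thin (non-universal), little-endian arm64 inputs. `InputKind`, `sniff`, and `read_u32_le` are hypothetical names, not the real `input.rs` API. The magic values come from three sources:

- `<mach-o/loader.h>`: `MH_MAGIC_64`, `CPU_TYPE_ARM64`, `MH_OBJECT`, `MH_DYLIB`.
- The `!<arch>\n` / `!<thin>\n` archive headers.
- The `--- !tapi-tbd` line that starts TBD v3 and v4 stubs.

```rust
// Hypothetical input classifier; numeric values mirror <mach-o/loader.h>.
const MH_MAGIC_64: u32 = 0xfeed_facf;
const CPU_TYPE_ARM64: u32 = 0x0100_000c; // CPU_TYPE_ARM | CPU_ARCH_ABI64
const MH_OBJECT: u32 = 0x1;
const MH_DYLIB: u32 = 0x6;

#[derive(Debug, PartialEq, Eq)]
enum InputKind {
    Object,
    Archive,
    Dylib,
    Tbd,
}

/// Reads a little-endian u32 at `offset`, or None if the buffer is too short.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let b = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Classifies an input by its leading bytes. In mach_header_64 the magic is
/// at offset 0, cputype at 4 and filetype at 12.
fn sniff(bytes: &[u8]) -> Result<InputKind, String> {
    if bytes.starts_with(b"!<arch>\n") || bytes.starts_with(b"!<thin>\n") {
        return Ok(InputKind::Archive);
    }
    if bytes.starts_with(b"--- !tapi-tbd") {
        return Ok(InputKind::Tbd);
    }
    let magic = read_u32_le(bytes, 0).ok_or("too short to be a Mach-O file")?;
    if magic != MH_MAGIC_64 {
        return Err(format!("unrecognized magic 0x{magic:08x}"));
    }
    let cputype = read_u32_le(bytes, 4).ok_or("truncated mach_header_64")?;
    if cputype != CPU_TYPE_ARM64 {
        return Err(format!("not an arm64 file (cputype 0x{cputype:08x})"));
    }
    match read_u32_le(bytes, 12).ok_or("truncated mach_header_64")? {
        MH_OBJECT => Ok(InputKind::Object),
        MH_DYLIB => Ok(InputKind::Dylib),
        other => Err(format!("filetype {other} is not a linker input")),
    }
}
```

Checking `cputype` up front lets an x86_64 object fail at input time with a clear diagnostic. Otherwise it would surface later as a confusing relocation error.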
| 8 | + |
| 9 | +The parent `armfortas/CLAUDE.md` describes the broader compiler philosophy (bespoke, no LLVM, no parser generators, no compiler-infrastructure crates) and applies here too. **Rust standard library only** — no `clap`, no `serde`, no `byteorder`, no `object`, no `goblin`, no `memmap2`, no YAML crate. Hand-roll parsers, serializers, and the tiny subset of YAML we need for TBD files. |
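To show how small "the tiny subset of YAML" can be, here is a minimal sketch of a line-oriented TBD v4 reader built only from `str` methods. It is not afs-ld's `tbd.rs`, and `TbdStub`, `parse_tbd`, and `unquote` are hypothetical names. The sketch has these limits:

- It reads only the first YAML document.
- It keeps `install-name` plus every name under any `symbols:` key, re-exports included.
- It ignores `targets`, `weak-symbols`, `objc-classes`, and all other keys.

```rust
#[derive(Debug, Default, PartialEq)]
struct TbdStub {
    install_name: Option<String>,
    symbols: Vec<String>,
}

/// Strips surrounding whitespace and one pair of matching single or double quotes.
fn unquote(s: &str) -> &str {
    let s = s.trim();
    s.strip_prefix('\'')
        .and_then(|t| t.strip_suffix('\''))
        .or_else(|| s.strip_prefix('"').and_then(|t| t.strip_suffix('"')))
        .unwrap_or(s)
}

fn parse_tbd(text: &str) -> Result<TbdStub, String> {
    let mut stub = TbdStub::default();
    let mut lines = text.lines();
    while let Some(line) = lines.next() {
        // Treat "  - key: value" list entries like plain "key: value" lines.
        let trimmed = line.trim_start().trim_start_matches("- ");
        if trimmed == "..." {
            break; // end of the first YAML document
        }
        let (key, value) = match trimmed.split_once(':') {
            Some(pair) => pair,
            None => continue, // e.g. the "--- !tapi-tbd" document header
        };
        match key.trim() {
            "install-name" => stub.install_name = Some(unquote(value).to_string()),
            "symbols" => {
                // Flow sequences may wrap: keep appending lines until the closing ']'.
                let mut list = value.trim().to_string();
                while !list.contains(']') {
                    let next = lines.next().ok_or("unterminated `[ ... ]` list")?;
                    list.push(' ');
                    list.push_str(next.trim());
                }
                let inner = list
                    .strip_prefix('[')
                    .and_then(|rest| rest.split(']').next())
                    .ok_or("expected a `[ ... ]` flow sequence after `symbols:`")?;
                stub.symbols.extend(
                    inner.split(',').map(unquote).filter(|s| !s.is_empty()).map(String::from),
                );
            }
            _ => {} // every other key is out of scope for this sketch
        }
    }
    Ok(stub)
}
```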
| 10 | + |
| 11 | +## Build, Test, Lint |
| 12 | + |
| 13 | +```bash |
| 14 | +cargo build -p afs-ld # build linker crate |
| 15 | +cargo test -p afs-ld # run all afs-ld tests |
| 16 | +cargo clippy -p afs-ld --all-targets -- -D warnings |
| 17 | + |
| 18 | +cargo test --lib -p afs-ld # unit tests only (in src/) |
| 19 | +cargo test --test <name> -p afs-ld # one integration test file |
| 20 | +cargo test --test parity_matrix -p afs-ld # vs Apple `ld` across the corpus |
| 21 | +cargo test --test hello_world -p afs-ld # executable end-to-end |
| 22 | +cargo test --test hello_library -p afs-ld # dylib end-to-end |
| 23 | +cargo test -p afs-ld -- <substring> # filter by test name |
| 24 | +``` |
| 25 | + |
| 26 | +Integration tests shell out to Apple `ld`, `otool`, `nm`, `codesign`, and `xcrun`. They require **macOS on Apple Silicon** and a working Xcode command-line toolchain. Do not stub these out — the differential against the system linker is the entire point of the parity matrix. |
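Those tools are mandatory, so a harness helper should fail loudly when one is missing rather than skip the test. Below is a minimal sketch of such a helper. It assumes integration tests locate tools through `xcrun --find <tool>`, which prints the path of that tool in the active toolchain. `require_tool` is a hypothetical name, not an existing afs-ld function.

```rust
use std::process::Command;

/// Runs `program args...` and returns its trimmed stdout.
/// Panics instead of skipping if the tool cannot be spawned or exits non-zero,
/// so a missing toolchain shows up as a test failure, never a silent pass.
fn require_tool(program: &str, args: &[&str]) -> String {
    let output = Command::new(program)
        .args(args)
        .output()
        .unwrap_or_else(|err| panic!("`{}` is required by the integration tests: {}", program, err));
    if !output.status.success() {
        panic!(
            "`{} {}` failed ({}): {}",
            program,
            args.join(" "),
            output.status,
            String::from_utf8_lossy(&output.stderr).trim()
        );
    }
    String::from_utf8_lossy(&output.stdout).trim().to_string()
}
```

On macOS a test would call `require_tool("xcrun", &["--find", "ld"])` and then run the returned path.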
| 27 | + |
| 28 | +## Architecture |
| 29 | + |
| 30 | +Pipeline, end to end: |
| 31 | + |
| 32 | +``` |
| 33 | +args    → inputs   → resolve    → atomize → layout    → apply relocs   → synth sections → write  → sign |
| 34 | +│         │          │            │         │           │                │                │        │ |
| 35 | +args.rs   input.rs   resolve.rs   atom.rs   layout.rs   reloc/arm64.rs   synth/*.rs       macho/   synth/ |
| 36 | +                     symbol.rs                                                            writer   code_sig |
| 37 | +``` |
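The resolve stage is where archives enter. A member is loaded only when it defines a symbol that is still undefined. Loading it can add new undefined references, so the pass repeats until nothing changes.

Below is a minimal sketch of that archive-driven fixed-point loop. It assumes a single archive and a hypothetical `Member` type that lists only defined and referenced names. It models no weak, common, or dylib symbols and performs no duplicate-definition checks.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an object file or archive member.
struct Member {
    name: &'static str,
    defines: Vec<&'static str>,
    references: Vec<&'static str>,
}

/// Loads every object, then pulls archive members that define a still-undefined
/// symbol until a full round loads nothing new. Returns loaded names in load
/// order, or the sorted list of symbols that nothing defines.
fn resolve(
    objects: &[Member],
    archive: &[Member],
) -> Result<Vec<&'static str>, Vec<&'static str>> {
    let mut defined: BTreeSet<&'static str> = BTreeSet::new();
    let mut referenced: BTreeSet<&'static str> = BTreeSet::new();
    let mut loaded = Vec::new();
    let mut taken = vec![false; archive.len()];
    let mut pending: Vec<&Member> = objects.iter().collect();

    loop {
        for member in pending.drain(..) {
            defined.extend(member.defines.iter().copied());
            referenced.extend(member.references.iter().copied());
            loaded.push(member.name);
        }
        // BTreeSet iterates in sorted order, so results are deterministic.
        let undefined: BTreeSet<&'static str> =
            referenced.difference(&defined).copied().collect();
        for (i, member) in archive.iter().enumerate() {
            if !taken[i] && member.defines.iter().any(|s| undefined.contains(s)) {
                taken[i] = true;
                pending.push(member);
            }
        }
        if pending.is_empty() {
            // Fixed point: no remaining member satisfies anything still undefined.
            return if undefined.is_empty() {
                Ok(loaded)
            } else {
                Err(undefined.into_iter().collect())
            };
        }
    }
}
```

As an example, suppose `main.o` references `_foo`. The archive holds `bar.o` (defines `_bar`) ahead of `foo.o` (defines `_foo`, references `_bar`).

- Round one loads `main.o`, then `foo.o`.
- Round two loads `bar.o`.

A single pass over the archive would miss `bar.o`.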
| 38 | + |
| 39 | +### Module responsibilities |
| 40 | + |
| 41 | +- **`src/args.rs`** — CLI parser. Hand-rolled, no `clap`. Recognizes the `ld`-compatible flag surface (Sprint 19 ships the full set). |
| 42 | +- **`src/macho/`** — Mach-O 64 read/write. `constants.rs` holds the numeric literals (`LC_*`, `MH_*`, `S_*`, `N_*`, `ARM64_RELOC_*`) — duplicated from afs-as rather than cross-crate coupled, keeping each submodule independent. `reader.rs` parses MH_OBJECT; `writer.rs` emits MH_EXECUTE and MH_DYLIB; `dylib.rs` parses binary MH_DYLIB; `tbd.rs` parses TAPI TBD v4 text stubs (minimal YAML subset, not a general parser). |
| 43 | +- **`src/archive.rs`** — BSD + SysV + GNU-thin static archives. Lazy member fetch. |
| 44 | +- **`src/input.rs`** — `InputFile` enum unifying objects, archives, dylibs, TBDs. |
| 45 | +- **`src/symbol.rs`** / **`src/resolve.rs`** — `Symbol` sum type and the name resolution pass. Archive-driven fixed-point loop; weak/common/alias coalescing; diagnostics with did-you-mean. |
| 46 | +- **`src/atom.rs`** — subsections-via-symbols atomization. Atoms are the unit of dead-stripping, ICF, and output layout. |
| 47 | +- **`src/section.rs`** / **`src/layout.rs`** — output segment plan and VM/file-offset assignment. `MH_EXECUTE` and `MH_DYLIB` are both first-class. |
| 48 | +- **`src/reloc/`** — ARM64 reloc application (`arm64.rs`) and LOH relaxation (`loh.rs`). Handles BRANCH26, PAGE21/PAGEOFF12, GOT_LOAD_*, POINTER_TO_GOT, TLVP_LOAD_*, UNSIGNED, SUBTRACTOR, ADDEND. |
| 49 | +- **`src/synth/`** — synthetic sections: `got`, `stubs`, `tlv`, `symtab`, `dyld_info` (classic), `chained` (LC_DYLD_CHAINED_FIXUPS), `unwind`, `eh_frame`, `func_starts`, `data_in_code`, `code_sig` (ad-hoc SHA-256). |
| 50 | +- **`src/gc.rs`** / **`src/icf.rs`** — `-dead_strip` and `-icf=safe` passes. |
| 51 | +- **`src/map.rs`** / **`src/why_live.rs`** — `-map` link map and `-why_live` dead-strip reasoning. |
| 52 | +- **`src/driver.rs`** — orchestrator: args → inputs → resolve → atomize → layout → apply relocs → synth → write → sign. |
| 53 | +- **`src/diag.rs`** — diagnostics. Path + byte offset + caret, matching `afs-as/src/diag*.rs` style. Deterministic output: no wall clock, no pid, no thread-id. |
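Relocation application in `reloc/arm64.rs` comes down to bit-level patching of instruction words. Below is a minimal sketch of two of the listed kinds, BRANCH26 and PAGE21. It follows the A64 encodings of `b`/`bl` and `adrp` from the ARM Architecture Reference Manual. The function names are hypothetical, and errors are reduced to `String` messages.

```rust
// Hypothetical helpers, not afs-ld's reloc/arm64.rs. `insn` is the 32-bit
// instruction word (read little-endian from the section), `pc` is its final
// address and `target` the resolved symbol address plus addend.

/// ARM64_RELOC_BRANCH26 (`b`/`bl`): imm26 = (target - pc) / 4, a signed
/// 26-bit word offset, so the reach is +/-128 MiB.
fn patch_branch26(insn: u32, pc: u64, target: u64) -> Result<u32, String> {
    let delta = target.wrapping_sub(pc) as i64;
    if (delta & 0b11) != 0 {
        return Err(format!("branch target 0x{target:x} is not 4-byte aligned"));
    }
    let imm = delta >> 2;
    if !(-(1i64 << 25)..(1i64 << 25)).contains(&imm) {
        return Err(format!("branch from 0x{pc:x} to 0x{target:x} is out of range"));
    }
    // Keep the opcode bits [31:26], replace imm26 in bits [25:0].
    Ok((insn & 0xfc00_0000) | (imm as u32 & 0x03ff_ffff))
}

/// ARM64_RELOC_PAGE21 (`adrp`): the signed 21-bit count of 4 KiB pages between
/// the page of `pc` and the page of `target`, split into immlo (bits [30:29])
/// and immhi (bits [23:5]); the reach is +/-4 GiB.
fn patch_page21(insn: u32, pc: u64, target: u64) -> Result<u32, String> {
    let pages = ((target & !0xfff).wrapping_sub(pc & !0xfff) as i64) >> 12;
    if !(-(1i64 << 20)..(1i64 << 20)).contains(&pages) {
        return Err(format!("adrp from 0x{pc:x} to 0x{target:x} is out of range"));
    }
    let imm = pages as u32;
    let immlo = (imm & 0x3) << 29;
    let immhi = ((imm >> 2) & 0x7_ffff) << 5;
    // Keep op (bit 31), the fixed bits [28:24] and Rd (bits [4:0]).
    Ok((insn & 0x9f00_001f) | immlo | immhi)
}
```

For example, take a `bl` (opcode word `0x94000000`) at `0x1000` calling `0x2000`. The delta is `0x1000` bytes, or `0x400` words, so the patched word is `0x94000400`.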
| 54 | + |
| 55 | +## Coding Conventions |
| 56 | + |
| 57 | +- **Rust std only.** Any external dependency needs an explicit debate and a CLAUDE.md update. |
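| 58 | +- **`unsafe` only where genuinely required.** Keep blocks small and commented. The one known case is `mmap(2)` for large input files (Sprint 28). Since the std-only rule excludes the `libc` crate, declare that binding by hand in an `extern "C"` block. |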
| 59 | +- **Exhaustive pattern matching** on `Section`, `Symbol`, `Relocation`, `InputFile`, `Fixup`, `LoadCommand` — no catch-all `_` arms outside tests. |
| 60 | +- **Determinism**: no timestamps in output, sorted iteration order, stable hashing, parallelism preserves byte-identical output. |
| 61 | +- **Commit discipline**: terse imperative messages, no co-authors, per-file / per-chunk commits, never monoliths. No sprint-number references in commit messages. |
| 62 | +- **No "stubs pass silently"**: placeholder code that returns wrong answers is worse than code that panics. Tests must catch the stub, not paper over it. |
| 63 | +- **Diagnostics always cite input + offset + caret.** `src/diag.rs` is the one place that constructs these; every error path goes through it. |
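The one expected `unsafe` case, mapping large inputs, needs no crate if the binding is declared by hand. Below is a minimal sketch. It assumes 64-bit Unix targets where `off_t` is 64 bits and `PROT_READ`/`MAP_PRIVATE` are 1 and 2, which holds on macOS and Linux. `Mapped` is a hypothetical type. The hand-written `extern "C"` block stands in for the `libc` crate, which the std-only rule excludes.

```rust
use std::ffi::c_void;
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::AsRawFd;

// Values shared by macOS and Linux; the libc crate is deliberately not used.
const PROT_READ: c_int = 0x1;
const MAP_PRIVATE: c_int = 0x2;

extern "C" {
    // off_t is 64-bit on the 64-bit targets this sketch assumes.
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, offset: i64)
        -> *mut c_void;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

/// A read-only, private mapping of a whole file, unmapped on drop.
struct Mapped {
    ptr: *mut c_void,
    len: usize,
}

impl Mapped {
    fn open(file: &File) -> io::Result<Mapped> {
        let len = file.metadata()?.len() as usize;
        if len == 0 {
            // mmap(2) rejects zero-length mappings.
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "cannot map an empty file"));
        }
        // SAFETY: the fd is open for the duration of the call, and a fresh
        // PROT_READ mapping aliases no Rust-owned memory.
        let ptr = unsafe {
            mmap(std::ptr::null_mut(), len, PROT_READ, MAP_PRIVATE, file.as_raw_fd(), 0)
        };
        if ptr as isize == -1 {
            return Err(io::Error::last_os_error()); // MAP_FAILED
        }
        Ok(Mapped { ptr, len })
    }

    fn bytes(&self) -> &[u8] {
        // SAFETY: ptr/len describe a live mapping owned by `self`. If another
        // process truncates the file meanwhile, reads can fault with SIGBUS.
        unsafe { std::slice::from_raw_parts(self.ptr as *const u8, self.len) }
    }
}

impl Drop for Mapped {
    fn drop(&mut self) {
        // SAFETY: unmaps exactly the region mmap returned.
        unsafe {
            munmap(self.ptr, self.len);
        }
    }
}
```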
| 64 | + |
| 65 | +## Test Architecture |
| 66 | + |
| 67 | +Tests are **layered**, not a single golden path. Each layer catches a different class of regression: |
| 68 | + |
| 69 | +| Test file | What it proves | |
| 70 | +|---|---| |
| 71 | +| `src/**/#[cfg(test)]` | Parser / encoder / resolution unit tests (`cargo test --lib`) | |
| 72 | +| `tests/common/harness.rs` | Differential harness: spawn afs-ld + system ld on the same inputs, diff outputs | |
| 73 | +| `tests/reader_*.rs` | Round-trip Mach-O object reads across the afs-as corpus | |
| 74 | +| `tests/reloc_*.rs` | Golden-file relocation application | |
| 75 | +| `tests/resolve_*.rs` | Symbol resolution matrices (strong vs weak vs common vs dylib vs archive) | |
| 76 | +| `tests/hello_world.rs` | End-to-end: afs-as → afs-ld → runnable PIE executable | |
| 77 | +| `tests/hello_library.rs` | End-to-end: afs-as → afs-ld → `dlopen`able dylib | |
| 78 | +| `tests/parity_matrix.rs` | Full corpus byte-level differential vs Apple `ld` (CI gate) | |
| 79 | +| `tests/armfortas_integration.rs` | Parent's integration suite run under `AFS_LD=1` | |
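Whatever else the differential harness does, it ends by comparing two output files. Below is a minimal sketch of that last step. It assumes both linkers have already run and their outputs were read into memory. `first_difference` and `describe` are hypothetical helpers, not the API of `tests/common/harness.rs`. Reporting the first divergent offset, rather than only "files differ", gives whoever triages a failure a concrete place to start with `otool` or a hex dump.

```rust
/// Hypothetical helper: the first offset at which two outputs differ, with the
/// byte each side has there (None once one output has ended).
fn first_difference(ours: &[u8], theirs: &[u8]) -> Option<(usize, Option<u8>, Option<u8>)> {
    let common = ours.len().min(theirs.len());
    if let Some(i) = (0..common).find(|&i| ours[i] != theirs[i]) {
        return Some((i, Some(ours[i]), Some(theirs[i])));
    }
    if ours.len() != theirs.len() {
        // One output is a strict prefix of the other.
        return Some((common, ours.get(common).copied(), theirs.get(common).copied()));
    }
    None
}

/// One-line summary suitable for an assertion message.
fn describe(ours: &[u8], theirs: &[u8]) -> String {
    match first_difference(ours, theirs) {
        None => "identical".to_string(),
        Some((offset, a, b)) => {
            format!("first difference at offset 0x{offset:x}: afs-ld={a:?} ld={b:?}")
        }
    }
}
```

A test would then write `assert!(ours == theirs, "{}", describe(&ours, &theirs));`.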
| 80 | + |
| 81 | +Corpus fixtures live in `tests/corpus/`. Every new relocation kind, section kind, or CLI flag lands a corpus entry in the same sprint that implements it. |
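One way to make this rule hard to forget is to pair it with the no-catch-all convention. An exhaustive `match` from each relocation kind to its corpus fixture stops compiling when a kind is added without one. Below is a minimal sketch with these assumptions:

- `RelocKind` is a hypothetical enum mirroring the `ARM64_RELOC_*` values 0 to 10 from `<mach-o/arm64/reloc.h>`.
- The fixture file names are hypothetical.
- Type 11 (`ARM64_RELOC_AUTHENTICATED_POINTER`, used by arm64e) is rejected with an error rather than silently ignored.

```rust
/// Hypothetical mirror of ARM64_RELOC_* (r_type values from <mach-o/arm64/reloc.h>).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RelocKind {
    Unsigned,          // 0
    Subtractor,        // 1
    Branch26,          // 2
    Page21,            // 3
    PageOff12,         // 4
    GotLoadPage21,     // 5
    GotLoadPageOff12,  // 6
    PointerToGot,      // 7
    TlvpLoadPage21,    // 8
    TlvpLoadPageOff12, // 9
    Addend,            // 10
}

impl RelocKind {
    /// Unknown or unsupported types are an error, never a silent no-op.
    fn from_r_type(r_type: u8) -> Result<RelocKind, String> {
        Ok(match r_type {
            0 => RelocKind::Unsigned,
            1 => RelocKind::Subtractor,
            2 => RelocKind::Branch26,
            3 => RelocKind::Page21,
            4 => RelocKind::PageOff12,
            5 => RelocKind::GotLoadPage21,
            6 => RelocKind::GotLoadPageOff12,
            7 => RelocKind::PointerToGot,
            8 => RelocKind::TlvpLoadPage21,
            9 => RelocKind::TlvpLoadPageOff12,
            10 => RelocKind::Addend,
            other => return Err(format!("unsupported ARM64 relocation type {other}")),
        })
    }

    /// Exhaustive on purpose (no `_` arm): a new variant will not compile
    /// until it names its corpus fixture.
    fn corpus_fixture(self) -> &'static str {
        match self {
            RelocKind::Unsigned => "reloc_unsigned.s",
            RelocKind::Subtractor => "reloc_subtractor.s",
            RelocKind::Branch26 => "reloc_branch26.s",
            RelocKind::Page21 => "reloc_page21.s",
            RelocKind::PageOff12 => "reloc_pageoff12.s",
            RelocKind::GotLoadPage21 => "reloc_got_load_page21.s",
            RelocKind::GotLoadPageOff12 => "reloc_got_load_pageoff12.s",
            RelocKind::PointerToGot => "reloc_pointer_to_got.s",
            RelocKind::TlvpLoadPage21 => "reloc_tlvp_load_page21.s",
            RelocKind::TlvpLoadPageOff12 => "reloc_tlvp_load_pageoff12.s",
            RelocKind::Addend => "reloc_addend.s",
        }
    }
}
```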
| 82 | + |
| 83 | +## Audit discipline |
| 84 | + |
| 85 | +After each sprint, a brutally honest audit: |
| 86 | + |
| 87 | +- Assume nothing works until proven otherwise. Test every claim. |
| 88 | +- "Placeholder" and "stub" are synonyms for "broken." Wrong answers silently produced are worse than crashes. |
| 89 | +- Check against the Mach-O ABI spec, not just "does it link." Wrong output corrupts the loader's bind/rebase state. |
| 90 | +- Don't soften findings. "Major" means "produces wrong binaries." "Critical" means "silent corruption that dyld will accept." |
| 91 | +- No deferred items unless they genuinely require a later sprint. Fix it now if it can be fixed now. |
| 92 | +- The audit is not a formality. It's the last line of defense before bad linker output gets merged and ships bad binaries downstream. |
| 93 | + |
| 94 | +## Key references |
| 95 | + |
| 96 | +- `.refs/llvm/lld/MachO/` — primary architectural reference. |
| 97 | +- `.refs/ld64/src/` — Apple authoritative. `src/ld/` and `src/mach_o/` cover the whole linker. |
| 98 | +- `.refs/mold/src/` — performance techniques (parallel parsing, string merging, allocator tricks). Mostly ELF-oriented but the performance patterns apply. |
| 99 | +- Apple `<mach-o/loader.h>`, `<mach-o/nlist.h>`, `<mach-o/reloc.h>`, `<mach-o/arm64/reloc.h>` — mirrored numerically in `src/macho/constants.rs`. |
| 100 | +- `dyld` open source — bind/rebase/lazy-bind opcode semantics and chained-fixups format. |
| 101 | +- ARM Architecture Reference Manual (ARMv8-A) — encoding of relocated instructions. |
| 102 | + |
| 103 | +## Sprint roadmap |
| 104 | + |
| 105 | +`.docs/sprints/index.md` — 32 sprints across 10 phases. Each sprint has an individual markdown file with prerequisites, deliverables, testing strategy, and definition of done. |