# AFS-LD

Local working guide for agents in `afs-ld`. Keep this file untracked. `CLAUDE.md` is the tracked, authoritative policy file; this document adds a reality-checked snapshot of the current implementation so we do not confuse the roadmap with shipped code.

## Repository Context

`afs-ld` is the standalone ARM64 Mach-O linker for the ARMFORTAS toolchain. It sits beside `afs-as` as a submodule in the `armfortas` workspace and is meant to replace Apple's `ld` for binaries produced by armfortas.

The project boundary is intentionally clean:

- `afs-as` emits `MH_OBJECT`.
- `afs-ld` reads `.o`, `.a`, `.dylib`, and `.tbd`.
- armfortas should eventually hand final linking to `afs-ld` rather than to the system linker.

The project is Mach-O only, macOS only, arm64 only, stdlib only.

## Definition Of Done

The real finish line is not "parses some objects" or "links hello world once." It is parity with Apple's `ld` for the binaries armfortas and fortsh need:

- arm64 Mach-O executables and dylibs
- static archive linking
- dylib and TBD ingestion
- dyld metadata that works on real macOS systems
- ad-hoc signing so output executes on Apple Silicon
- deterministic output
- enough correctness to link fortsh without ARM-specific workarounds

## Current Reality

This repo is ahead of Sprint 0 scaffolding, but it is not yet a full linker. The roadmap in `.docs/overview.md` and `.docs/sprints/` is broader than the code that exists today.

What is implemented now:

- hand-rolled CLI parsing for a small flag subset plus dump modes
- Mach-O header/load-command/section/symbol/string-table reading
- relocation parsing, fusion, validation, and round-trip support
- archive parsing and lazy member fetch support
- binary dylib parsing and export-trie walking
- TAPI TBD v4 parsing, including the custom YAML subset parser
- linker-side symbol interning, symbol table modeling, and resolution passes
- subsections-via-symbols atomization
- `--dump`, `--dump-archive`, `--dump-dylib`, and `--dump-tbd`

What is not implemented yet:

- real `Linker::run` output production
- output layout and Mach-O writing
- dyld metadata synthesis
- code signing
- dead-strip / ICF / thunks
- real differential linking against Apple `ld`
- driver integration with armfortas
- the full `ld`-compatible CLI surface described in Sprint 19

Important practical notes:

- `src/lib.rs` still returns `LinkError::NotYetImplemented` for real link runs.
- `tests/common/harness.rs::link_both` still panics because full end-to-end linker execution has not landed.
- `README.md` still describes the crate as "Sprint 0 scaffolding only," which is now too pessimistic for the read-side code but still accurate for the actual link-producing path.

As of 2026-04-15 in this checkout, `cargo test -p afs-ld` is green.

## Strengths

- The read-side core is already substantial and well-tested.
- The project has strong bespoke discipline: no `clap`, `serde`, `object`, `goblin`, `byteorder`, or other format-parsing shortcuts.
- Raw wire structures are modeled explicitly and usually paired with round-trip-oriented tests.
- The type modeling is strong: opaque ids, interned strings, explicit symbol states, explicit atom ownership, explicit relocation referents.
- Real-world fixtures are already in play: afs-as corpus objects, `libarmfortas_rt.a`, `libSystem.tbd`, and small clang-built dylibs.
- The codebase already separates concerns cleanly enough that writer/layout work can land without tearing up the read-side foundation.
- Dump modes make inspection easy and are useful while the full writer does not exist yet.
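
To make the "opaque ids, interned strings" point concrete, here is a minimal sketch of what such an interner could look like. It is not the code in `src/symbol.rs`; the names `NameId`, `Interner`, `intern`, and `resolve` are hypothetical and chosen only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical opaque id: callers cannot build one from a raw integer.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NameId(u32);

/// Hypothetical string interner mapping each distinct name to one id.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, NameId>,
    names: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `name`, or assigns the next one.
    pub fn intern(&mut self, name: &str) -> NameId {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = NameId(self.names.len() as u32);
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    /// Looks the original string back up from its id.
    pub fn resolve(&self, id: NameId) -> &str {
        &self.names[id.0 as usize]
    }
}
```

Ids are handed out in insertion order and the map is only used for lookup, never iterated, so the assignment stays deterministic for a given input order.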

## Weaknesses And Risk Areas

- The actual link-producing pipeline does not exist yet, so the hardest parity bugs are still ahead of us.
- Some tracked docs are aspirational. `.docs/overview.md` is the intended end state, not a guarantee that every listed module already exists.
- `README.md` is stale in the opposite direction: it understates how much read-side work has landed.
- The current diagnostics surface is still minimal. `src/diag.rs` only prints `afs-ld: error: ...`; the richer caret diagnostics are planned, not present.
- The CLI surface is intentionally tiny right now. Any work that assumes `ld`-compatibility must start by checking `src/args.rs`, not by trusting the sprint plan.
- Performance characteristics are mostly unknown because the writer, layout, and full-link path are not in place yet.
- The differential harness is only half-built: the diff engine exists, but the "run both linkers" machinery is not wired.
- Several future modules named in the roadmap do not exist yet: `layout.rs`, `driver.rs`, `map.rs`, `gc.rs`, `icf.rs`, `synth/`, `macho/writer.rs`, and the code-signing path are all still planned work.

## Build And Test

Primary commands:

```bash
cargo build -p afs-ld
cargo test -p afs-ld
cargo clippy -p afs-ld --all-targets -- -D warnings
```

Useful targeted commands:

```bash
cargo test --lib -p afs-ld
cargo test --test reader_corpus_round_trip -p afs-ld
cargo test --test archive_runtime -p afs-ld
cargo test --test dylib_integration -p afs-ld
cargo test --test tbd_integration -p afs-ld
cargo test --test resolve_integration -p afs-ld
cargo test --test atom_integration -p afs-ld
cargo test -p afs-ld -- <substring>
```

Environment assumptions:

- macOS on Apple Silicon
- Xcode command-line tools available through `xcrun`
- access to the parent workspace, especially `runtime/` and `.refs/`

Integration tests already shell out to system tools in a few places. Do not replace those with fake fixtures if a real toolchain interaction is the thing being tested.

## Project Structure

Actual source tree today:

```text
afs-ld/
├── CLAUDE.md
├── README.md
├── .docs/
│   ├── overview.md
│   └── sprints/
├── src/
│   ├── archive.rs
│   ├── args.rs
│   ├── atom.rs
│   ├── diag.rs
│   ├── dump.rs
│   ├── input.rs
│   ├── leb.rs
│   ├── lib.rs
│   ├── main.rs
│   ├── resolve.rs
│   ├── section.rs
│   ├── string_table.rs
│   ├── symbol.rs
│   ├── macho/
│   │   ├── constants.rs
│   │   ├── dylib.rs
│   │   ├── exports.rs
│   │   ├── reader.rs
│   │   ├── tbd.rs
│   │   └── tbd_yaml.rs
│   └── reloc/
│       └── mod.rs
└── tests/
    ├── common/harness.rs
    ├── archive_runtime.rs
    ├── atom_integration.rs
    ├── diff_harness_*.rs
    ├── dylib_integration.rs
    ├── reader_*.rs
    ├── resolve_integration.rs
    ├── tbd_*.rs
    └── reader_corpus_round_trip.rs
```

Planned future modules listed in the docs should be treated as design intent, not as present-tense implementation.

## Implemented Pipeline Vs Planned Pipeline

Implemented today:

```text
argv
  -> args.rs
  -> dump/read paths
  -> archive/object/dylib/TBD ingestion
  -> symbol/section/reloc decoding
  -> resolve.rs
  -> atom.rs
```

Current real-link path:

```text
argv -> args.rs -> Linker::run -> NotYetImplemented
```

Planned end-to-end pipeline from the roadmap:

```text
args -> inputs -> resolve -> atomize -> layout -> apply relocs
     -> synth sections -> write -> sign
```

When you are planning work, always identify which of those stages is real in this checkout and which stage is still only described in docs.

## Development Guidance

### 1. Trust code and tests over roadmap prose

Read these in order before substantial work:

1. `CLAUDE.md`
2. `.docs/overview.md`
3. the relevant sprint file in `.docs/sprints/`
4. the actual Rust module you will touch
5. the tests covering that module

If the docs and the code disagree, treat the code plus tests as the truth about what exists today, then decide whether the docs need to be refreshed.

### 2. Keep the bespoke contract intact

- Stdlib only unless a dependency discussion happens first.
- Do not couple afs-ld to afs-as at a Rust type level.
- Duplicate Mach-O constants locally when needed.
- Do not hide format details behind clever abstractions that erase wire truth.

### 3. Preserve the wire

- Keep raw bytes or raw fields accessible when lossless re-emission matters.
- Prefer explicit parse and write pairs for on-disk structures.
- Avoid converting fixed-size or padded wire data into lossy higher-level forms unless the raw representation is still available somewhere.
- If a new decoder lands, pair it with tests that prove it round-trips or at least preserves the exact bytes relevant to the current stage.
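
As an illustration of an explicit parse and write pair, here is a minimal sketch for the 16-byte Mach-O `nlist_64` symbol-table entry, assuming little-endian input as on arm64. The field names follow Apple's `nlist_64` layout; the type name `RawNlist64` and its `parse`/`write` methods are hypothetical and are not taken from `src/symbol.rs`.

```rust
/// Size in bytes of one on-disk `nlist_64` entry.
pub const NLIST64_SIZE: usize = 16;

/// Raw `nlist_64` entry. Every wire field is kept as-is so the
/// entry can be re-emitted without loss.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawNlist64 {
    pub n_strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

impl RawNlist64 {
    /// Decodes one entry from the start of `b`; `None` if `b` is too short.
    pub fn parse(b: &[u8]) -> Option<Self> {
        if b.len() < NLIST64_SIZE {
            return None;
        }
        Some(RawNlist64 {
            n_strx: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            n_type: b[4],
            n_sect: b[5],
            n_desc: u16::from_le_bytes([b[6], b[7]]),
            n_value: u64::from_le_bytes([
                b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15],
            ]),
        })
    }

    /// Encodes the entry back into its exact 16-byte wire form.
    pub fn write(&self) -> [u8; NLIST64_SIZE] {
        let mut out = [0u8; NLIST64_SIZE];
        out[0..4].copy_from_slice(&self.n_strx.to_le_bytes());
        out[4] = self.n_type;
        out[5] = self.n_sect;
        out[6..8].copy_from_slice(&self.n_desc.to_le_bytes());
        out[8..16].copy_from_slice(&self.n_value.to_le_bytes());
        out
    }
}
```

A round-trip test for a decoder like this only needs to assert that `parse` followed by `write` reproduces the input bytes.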

### 4. Be explicit about incomplete work

- Hard errors are better than silent wrong answers.
- If something is not implemented, say so directly.
- Do not introduce "temporary" behavior that quietly emits malformed Mach-O.
- Do not soften a missing feature into a no-op unless the flag or structure is explicitly intended to be ignored.
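
A rough sketch of the difference between a deliberately ignored flag and a missing feature follows. Only the `LinkError::NotYetImplemented` name comes from this guide; its payload, the `Flag` enum, and `apply_flag` are hypothetical and exist purely to illustrate the rule.

```rust
use std::fmt;

/// Hypothetical error shape; the real `LinkError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum LinkError {
    NotYetImplemented(&'static str),
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            LinkError::NotYetImplemented(what) => {
                write!(f, "not yet implemented: {}", what)
            }
        }
    }
}

/// Hypothetical flags used only for this example.
#[derive(Debug, Clone, Copy)]
pub enum Flag {
    /// Explicitly intended to be accepted and ignored.
    IgnoredForCompat,
    /// A feature that has not landed yet.
    DeadStrip,
}

/// Ignored flags succeed; missing features fail loudly instead of
/// silently turning into a no-op.
pub fn apply_flag(flag: Flag) -> Result<(), LinkError> {
    match flag {
        Flag::IgnoredForCompat => Ok(()),
        Flag::DeadStrip => Err(LinkError::NotYetImplemented("dead-strip")),
    }
}
```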

### 5. Exhaustive matches matter

- Prefer enums for wire forms and linker-side states.
- Avoid catch-all `_` arms in production matches when a new variant should force the compiler to help us.
- When adding a new variant, update every relevant match deliberately.
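
The following sketch shows the pattern with a made-up state enum. `SymbolState` and its variants are hypothetical and are not the types in `src/symbol.rs`; the point is only that the match names every variant.

```rust
/// Hypothetical linker-side symbol states, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolState {
    Undefined,
    Defined,
    Common,
    DylibImport,
}

/// Every variant is listed, with no `_` arm. Adding a fifth variant
/// makes this function fail to compile until it is handled here.
pub fn needs_definition(state: SymbolState) -> bool {
    match state {
        SymbolState::Undefined => true,
        SymbolState::Defined => false,
        SymbolState::Common => false,
        SymbolState::DylibImport => false,
    }
}
```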

### 6. Keep dump surfaces useful

- `--dump*` modes are an active debugging tool, not a side feature.
- When new reader functionality lands, extend the corresponding dump output.
- If you add a new parsed field but the dump cannot show it, the repo loses one of its best inspection surfaces.

### 7. Respect deterministic behavior

- Avoid nondeterministic iteration when output order matters.
- Avoid timestamps, random ids, or unstable hashing in any future write path.
- When adding diagnostics, keep them stable and testable.
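
One common way to get stable iteration with only the standard library is to use `BTreeMap` (or a sorted `Vec`) wherever order reaches the output, since `HashMap` iteration order is unspecified. A minimal sketch, with the hypothetical helper name `ordered_symbol_names`:

```rust
use std::collections::BTreeMap;

/// Returns symbol names in sorted order, independent of the order
/// in which the (name, address) pairs were supplied.
pub fn ordered_symbol_names(symbols: &[(&str, u64)]) -> Vec<String> {
    let mut by_name: BTreeMap<&str, u64> = BTreeMap::new();
    for &(name, addr) in symbols {
        by_name.insert(name, addr);
    }
    // BTreeMap iterates in key order, so the result is reproducible.
    by_name.keys().map(|name| name.to_string()).collect()
}
```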

## Testing Practices

- Every bug fix gets a regression test.
- New parser behavior should land with unit tests close to the module.
- When touching integration behavior, prefer real fixtures over mocked ones.
- For archive work, look first at `tests/archive_runtime.rs`.
- For dylib and TBD work, look first at `tests/dylib_integration.rs`, `tests/tbd_integration.rs`, and `tests/tbd_smoke.rs`.
- For reader invariants, `tests/reader_corpus_round_trip.rs` is a key guardrail.
- For resolution and atomization, `tests/resolve_integration.rs` and `tests/atom_integration.rs` should move with the code.
- If you add future write-side functionality, extend the differential harness rather than building a parallel ad hoc test path.

Run focused tests first, then widen:

- module-local or single integration test while developing
- `cargo test -p afs-ld` before handing work off
- `cargo clippy -p afs-ld --all-targets -- -D warnings` when changing code paths broadly enough to justify it

## Documentation Practices

- `CLAUDE.md` is policy and development discipline.
- `.docs/overview.md` is the intended architecture and scope.
- `.docs/sprints/` is the staged roadmap.
- `README.md` is user-facing and currently stale relative to the read-side code.

When a change materially shifts reality, update the tracked docs that are now misleading. This is especially important in this repo because the roadmap is ambitious and can otherwise create false assumptions for future work.

## References

Use the parent repository's references when you need to confirm Mach-O or linker behavior instead of inventing from memory:

- `.refs/llvm/lld/MachO/` for architecture and pass structure
- `.refs/ld64/` for Apple-parity edge cases
- `.refs/mold/` for performance ideas and comparative implementation choices

Also use Apple's Mach-O and arm64 relocation headers as the numeric source of truth for constants mirrored in `src/macho/constants.rs`.

## Working Style For This Repo

- Prefer small, reviewable changes.
- Keep commit messages terse and imperative.
- Do not mention sprint numbers in commit subjects.
- Avoid monolithic "land the whole linker" changes; the sprint plan is granular for a reason.
- Before implementing a planned module from the roadmap, make sure the crate actually has the prerequisites the sprint assumed.
- If you are about to say "the docs say this exists," stop and confirm with `ls`, `rg`, and the tests.

## Practical Shortcuts

- Use `rg --files` and `rg` first; the repo is small enough that this is fast and keeps context grounded in the actual tree.
- For current status, start with `src/lib.rs`, `src/main.rs`, `src/args.rs`, `tests/common/harness.rs`, and `README.md`.
- For architectural intent, read `.docs/overview.md` and the relevant sprint file after that.

Reading status before intent will save a lot of confusion.

View source

  
        1
        # AFS-LD
      
        2
        
        3
        Local working guide for agents in `afs-ld`. Keep this file untracked.
      
        4
        `CLAUDE.md` is the tracked, authoritative policy file; this document adds a
      
        5
        reality-checked snapshot of the current implementation so we do not confuse the
      
        6
        roadmap with shipped code.
      
        7
        
        8
        ## Repository Context
      
        9
        
        10
        `afs-ld` is the standalone ARM64 Mach-O linker for the ARMFORTAS toolchain. It
      
        11
        sits beside `afs-as` as a submodule in the `armfortas` workspace and is meant
      
        12
        to replace Apple's `ld` for binaries produced by armfortas.
      
        13
        
        14
        The project boundary is intentionally clean:
      
        15
        
        16
        - `afs-as` emits `MH_OBJECT`.
      
        17
        - `afs-ld` reads `.o`, `.a`, `.dylib`, and `.tbd`.
      
        18
        - armfortas should eventually hand final linking to `afs-ld` rather than to
      
        19
          the system linker.
      
        20
        
        21
        The project is Mach-O only, macOS only, arm64 only, stdlib only.
      
        22
        
        23
        ## Definition Of Done
      
        24
        
        25
        The real finish line is not "parses some objects" or "links hello world once."
      
        26
        It is parity with Apple's `ld` for the binaries armfortas and fortsh need:
      
        27
        
        28
        - arm64 Mach-O executables and dylibs
      
        29
        - static archive linking
      
        30
        - dylib and TBD ingestion
      
        31
        - dyld metadata that works on real macOS systems
      
        32
        - ad-hoc signing so output executes on Apple Silicon
      
        33
        - deterministic output
      
        34
        - enough correctness to link fortsh without ARM-specific workarounds
      
        35
        
        36
        ## Current Reality
      
        37
        
        38
        This repo is ahead of Sprint 0 scaffolding, but it is not yet a full linker.
      
        39
        The roadmap in `.docs/overview.md` and `.docs/sprints/` is broader than the
      
        40
        code that exists today.
      
        41
        
        42
        What is implemented now:
      
        43
        
        44
        - hand-rolled CLI parsing for a small flag subset plus dump modes
      
        45
        - Mach-O header/load-command/section/symbol/string-table reading
      
        46
        - relocation parsing, fusion, validation, and round-trip support
      
        47
        - archive parsing and lazy member fetch support
      
        48
        - binary dylib parsing and export-trie walking
      
        49
        - TAPI TBD v4 parsing, including the custom YAML subset parser
      
        50
        - linker-side symbol interning, symbol table modeling, and resolution passes
      
        51
        - subsections-via-symbols atomization
      
        52
        - `--dump`, `--dump-archive`, `--dump-dylib`, and `--dump-tbd`
      
        53
        
        54
        What is not implemented yet:
      
        55
        
        56
        - real `Linker::run` output production
      
        57
        - output layout and Mach-O writing
      
        58
        - dyld metadata synthesis
      
        59
        - code signing
      
        60
        - dead-strip / ICF / thunks
      
        61
        - real differential linking against Apple `ld`
      
        62
        - driver integration with armfortas
      
        63
        - the full `ld`-compatible CLI surface described in Sprint 19
      
        64
        
        65
        Important practical note:
      
        66
        
        67
        - `src/lib.rs` still returns `LinkError::NotYetImplemented` for real link runs.
      
        68
        - `tests/common/harness.rs::link_both` still panics because full end-to-end
      
        69
          linker execution has not landed.
      
        70
        - `README.md` still describes the crate as "Sprint 0 scaffolding only," which is
      
        71
          now too pessimistic for the read-side code but still accurate for the actual
      
        72
          link-producing path.
      
        73
        
        74
        As of 2026-04-15 in this checkout, `cargo test -p afs-ld` is green.
      
        75
        
        76
        ## Strengths
      
        77
        
        78
        - The read-side core is already substantial and well-tested.
      
        79
        - The project has strong bespoke discipline: no `clap`, `serde`, `object`,
      
        80
          `goblin`, `byteorder`, or other format-parsing shortcuts.
      
        81
        - Raw wire structures are modeled explicitly and usually paired with
      
        82
          round-trip-oriented tests.
      
        83
        - The type modeling is strong: opaque ids, interned strings, explicit symbol
      
        84
          states, explicit atom ownership, explicit relocation referents.
      
        85
        - Real-world fixtures are already in play: afs-as corpus objects,
      
        86
          `libarmfortas_rt.a`, `libSystem.tbd`, and small clang-built dylibs.
      
        87
        - The codebase already separates concerns cleanly enough that writer/layout work
      
        88
          can land without tearing up the read-side foundation.
      
        89
        - Dump modes make inspection easy and are useful while the full writer does not
      
        90
          exist yet.
      
        91
        
        92
        ## Weaknesses And Risk Areas
      
        93
        
        94
        - The actual link-producing pipeline does not exist yet, so the hardest parity
      
        95
          bugs are still ahead of us.
      
        96
        - Some tracked docs are aspirational. `.docs/overview.md` is the intended end
      
        97
          state, not a guarantee that every listed module already exists.
      
        98
        - `README.md` is stale in the opposite direction: it understates how much
      
        99
          read-side work has landed.
      
        100
        - The current diagnostics surface is still minimal. `src/diag.rs` only prints
      
        101
          `afs-ld: error: ...`; the richer caret diagnostics are planned, not present.
      
        102
        - The CLI surface is intentionally tiny right now. Any work that assumes
      
        103
          `ld`-compatibility must start by checking `src/args.rs`, not by trusting the
      
        104
          sprint plan.
      
        105
        - Performance characteristics are mostly unknown because the writer, layout, and
      
        106
          full-link path are not in place yet.
      
        107
        - The differential harness is only half-built: the diff engine exists, but the
      
        108
          "run both linkers" machinery is not wired.
      
        109
        - Several future modules named in the roadmap do not exist yet:
      
        110
          `layout.rs`, `driver.rs`, `map.rs`, `gc.rs`, `icf.rs`, `synth/`,
      
        111
          `macho/writer.rs`, and the code-signing path are all still planned work.
      
        112
        
        113
        ## Build And Test
      
        114
        
        115
        Primary commands:
      
        116
        
        117
        ```bash
      
        118
        cargo build -p afs-ld
      
        119
        cargo test -p afs-ld
      
        120
        cargo clippy -p afs-ld --all-targets -- -D warnings
      
        121
        ```
      
        122
        
        123
        Useful targeted commands:
      
        124
        
        125
        ```bash
      
        126
        cargo test --lib -p afs-ld
      
        127
        cargo test --test reader_corpus_round_trip -p afs-ld
      
        128
        cargo test --test archive_runtime -p afs-ld
      
        129
        cargo test --test dylib_integration -p afs-ld
      
        130
        cargo test --test tbd_integration -p afs-ld
      
        131
        cargo test --test resolve_integration -p afs-ld
      
        132
        cargo test --test atom_integration -p afs-ld
      
        133
        cargo test -p afs-ld -- <substring>
      
        134
        ```
      
        135
        
        136
        Environment assumptions:
      
        137
        
        138
        - macOS on Apple Silicon
      
        139
        - Xcode command-line tools available through `xcrun`
      
        140
        - access to the parent workspace, especially `runtime/` and `.refs/`
      
        141
        
        142
        Integration tests already shell out to system tools in a few places. Do not
      
        143
        replace those with fake fixtures if a real toolchain interaction is the thing
      
        144
        being tested.
      
        145
        
        146
        ## Project Structure
      
        147
        
        148
        Actual source tree today:
      
        149
        
        150
        ```text
      
        151
        afs-ld/
      
        152
        ├── CLAUDE.md
      
        153
        ├── README.md
      
        154
        ├── .docs/
      
        155
        │   ├── overview.md
      
        156
        │   └── sprints/
      
        157
        ├── src/
      
        158
        │   ├── archive.rs
      
        159
        │   ├── args.rs
      
        160
        │   ├── atom.rs
      
        161
        │   ├── diag.rs
      
        162
        │   ├── dump.rs
      
        163
        │   ├── input.rs
      
        164
        │   ├── leb.rs
      
        165
        │   ├── lib.rs
      
        166
        │   ├── main.rs
      
        167
        │   ├── resolve.rs
      
        168
        │   ├── section.rs
      
        169
        │   ├── string_table.rs
      
        170
        │   ├── symbol.rs
      
        171
        │   ├── macho/
      
        172
        │   │   ├── constants.rs
      
        173
        │   │   ├── dylib.rs
      
        174
        │   │   ├── exports.rs
      
        175
        │   │   ├── reader.rs
      
        176
        │   │   ├── tbd.rs
      
        177
        │   │   └── tbd_yaml.rs
      
        178
        │   └── reloc/
      
        179
        │       └── mod.rs
      
        180
        └── tests/
      
        181
            ├── common/harness.rs
      
        182
            ├── archive_runtime.rs
      
        183
            ├── atom_integration.rs
      
        184
            ├── diff_harness_*.rs
      
        185
            ├── dylib_integration.rs
      
        186
            ├── reader_*.rs
      
        187
            ├── resolve_integration.rs
      
        188
            ├── tbd_*.rs
      
        189
            └── reader_corpus_round_trip.rs
      
        190
        ```
      
        191
        
        192
        Planned future modules listed in the docs should be treated as design intent,
      
        193
        not as present-tense implementation.
      
        194
        
        195
        ## Implemented Pipeline Vs Planned Pipeline
      
        196
        
        197
        Implemented today:
      
        198
        
        199
        ```text
      
        200
        argv
      
        201
          -> args.rs
      
        202
          -> dump/read paths
      
        203
          -> archive/object/dylib/TBD ingestion
      
        204
          -> symbol/section/reloc decoding
      
        205
          -> resolve.rs
      
        206
          -> atom.rs
      
        207
        ```
      
        208
        
        209
        Current real-link path:
      
        210
        
        211
        ```text
      
        212
        argv -> args.rs -> Linker::run -> NotYetImplemented
      
        213
        ```
      
        214
        
        215
        Planned end-to-end pipeline from the roadmap:
      
        216
        
        217
        ```text
      
        218
        args -> inputs -> resolve -> atomize -> layout -> apply relocs
      
        219
             -> synth sections -> write -> sign
      
        220
        ```
      
        221
        
        222
        When you are planning work, always identify which of those stages is real in
      
        223
        this checkout and which stage is still only described in docs.
      
        224
        
        225
        ## Development Guidance
      
        226
        
        227
        ### 1. Trust code and tests over roadmap prose
      
        228
        
        229
        Read these in order before substantial work:
      
        230
        
        231
        1. `CLAUDE.md`
      
        232
        2. `.docs/overview.md`
      
        233
        3. the relevant sprint file in `.docs/sprints/`
      
        234
        4. the actual Rust module you will touch
      
        235
        5. the tests covering that module
      
        236
        
        237
        If the docs and the code disagree, treat the code plus tests as the truth about
      
        238
        what exists today, then decide whether the docs need to be refreshed.
      
        239
        
        240
        ### 2. Keep the bespoke contract intact
      
        241
        
        242
        - Stdlib only unless a dependency discussion happens first.
      
        243
        - Do not couple afs-ld to afs-as at a Rust type level.
      
        244
        - Duplicate Mach-O constants locally when needed.
      
        245
        - Do not hide format details behind clever abstractions that erase wire truth.
      
        246
        
        247
        ### 3. Preserve the wire
      
        248
        
        249
        - Keep raw bytes or raw fields accessible when lossless re-emission matters.
      
        250
        - Prefer explicit parse and write pairs for on-disk structures.
      
        251
        - Avoid converting fixed-size or padded wire data into lossy higher-level forms
      
        252
          unless the raw representation is still available somewhere.
      
        253
        - If a new decoder lands, pair it with tests that prove it round-trips or at
      
        254
          least preserves the exact bytes relevant to the current stage.
      
        255
        
        256
        ### 4. Be explicit about incomplete work
      
        257
        
        258
        - Hard errors are better than silent wrong answers.
      
        259
        - If something is not implemented, say so directly.
      
        260
        - Do not introduce "temporary" behavior that quietly emits malformed Mach-O.
      
        261
        - Do not soften a missing feature into a no-op unless the flag or structure is
      
        262
          explicitly intended to be ignored.
      
        263
        
        264
        ### 5. Exhaustive matches matter
      
        265
        
        266
        - Prefer enums for wire forms and linker-side states.
      
        267
        - Avoid catch-all `_` arms in production matches when a new variant should force
      
        268
          the compiler to help us.
      
        269
        - When adding a new variant, update every relevant match deliberately.
      
        270
        
        271
        ### 6. Keep dump surfaces useful
      
        272
        
        273
        - `--dump*` modes are an active debugging tool, not a side feature.
      
        274
        - When new reader functionality lands, extend the corresponding dump output.
      
        275
        - If you add a new parsed field but the dump cannot show it, the repo loses one
      
        276
          of its best inspection surfaces.
      
        277
        
        278
        ### 7. Respect deterministic behavior
      
        279
        
        280
        - Avoid nondeterministic iteration when output order matters.
      
        281
        - Avoid timestamps, random ids, or unstable hashing in any future write path.
      
        282
        - When adding diagnostics, keep them stable and testable.
      
        283
        
        284
        ## Testing Practices
      
        285
        
        286
        - Every bug fix gets a regression test.
      
        287
        - New parser behavior should land with unit tests close to the module.
      
        288
        - When touching integration behavior, prefer real fixtures over mocked ones.
      
        289
        - For archive work, look first at `tests/archive_runtime.rs`.
      
        290
        - For dylib and TBD work, look first at `tests/dylib_integration.rs`,
      
        291
          `tests/tbd_integration.rs`, and `tests/tbd_smoke.rs`.
      
        292
        - For reader invariants, `tests/reader_corpus_round_trip.rs` is a key guardrail.
      
        293
        - For resolution and atomization, `tests/resolve_integration.rs` and
      
        294
          `tests/atom_integration.rs` should move with the code.
      
        295
        - If you add future write-side functionality, extend the differential harness
      
        296
          rather than building a parallel ad hoc test path.
      
        297
        
        298
        Run focused tests first, then widen:
      
        299
        
        300
        - module-local or single integration test while developing
      
        301
        - `cargo test -p afs-ld` before handing work off
      
        302
        - `cargo clippy -p afs-ld --all-targets -- -D warnings` when changing code paths
      
        303
# AFS-LD

Local working guide for agents in `afs-ld`. Keep this file untracked.
`CLAUDE.md` is the tracked, authoritative policy file; this document adds a
reality-checked snapshot of the current implementation so we do not confuse the
roadmap with shipped code.

## Repository Context

`afs-ld` is the standalone ARM64 Mach-O linker for the ARMFORTAS toolchain. It
sits beside `afs-as` as a submodule in the `armfortas` workspace and is meant
to replace Apple's `ld` for binaries produced by armfortas.

The project boundary is intentionally clean:

- `afs-as` emits `MH_OBJECT`.
- `afs-ld` reads `.o`, `.a`, `.dylib`, and `.tbd`.
- armfortas should eventually hand final linking to `afs-ld` rather than to
  the system linker.

The project is Mach-O only, macOS only, arm64 only, stdlib only.

## Definition Of Done

The real finish line is not "parses some objects" or "links hello world once."
It is parity with Apple's `ld` for the binaries armfortas and fortsh need:

- arm64 Mach-O executables and dylibs
- static archive linking
- dylib and TBD ingestion
- dyld metadata that works on real macOS systems
- ad-hoc signing so output executes on Apple Silicon
- deterministic output
- enough correctness to link fortsh without ARM-specific workarounds

## Current Reality

This repo is ahead of Sprint 0 scaffolding, but it is not yet a full linker.
The roadmap in `.docs/overview.md` and `.docs/sprints/` is broader than the
code that exists today.

What is implemented now:

- hand-rolled CLI parsing for a small flag subset plus dump modes
- Mach-O header/load-command/section/symbol/string-table reading
- relocation parsing, fusion, validation, and round-trip support
- archive parsing and lazy member fetch support
- binary dylib parsing and export-trie walking
- TAPI TBD v4 parsing, including the custom YAML subset parser
- linker-side symbol interning, symbol table modeling, and resolution passes
- subsections-via-symbols atomization
- `--dump`, `--dump-archive`, `--dump-dylib`, and `--dump-tbd`

What is not implemented yet:

- real `Linker::run` output production
- output layout and Mach-O writing
- dyld metadata synthesis
- code signing
- dead-strip / ICF / thunks
- real differential linking against Apple `ld`
- driver integration with armfortas
- the full `ld`-compatible CLI surface described in Sprint 19

Important practical note:

- `src/lib.rs` still returns `LinkError::NotYetImplemented` for real link runs.
- `tests/common/harness.rs::link_both` still panics because full end-to-end
  linker execution has not landed.
- `README.md` still describes the crate as "Sprint 0 scaffolding only," which is
  now too pessimistic for the read-side code but still accurate for the actual
  link-producing path.

As of 2026-04-15 in this checkout, `cargo test -p afs-ld` is green.

## Strengths

- The read-side core is already substantial and well-tested.
- The project has strong bespoke discipline: no `clap`, `serde`, `object`,
  `goblin`, `byteorder`, or other format-parsing shortcuts.
- Raw wire structures are modeled explicitly and usually paired with
  round-trip-oriented tests.
- The type modeling is strong: opaque ids, interned strings, explicit symbol
  states, explicit atom ownership, explicit relocation referents.
- Real-world fixtures are already in play: afs-as corpus objects,
  `libarmfortas_rt.a`, `libSystem.tbd`, and small clang-built dylibs.
- The codebase already separates concerns cleanly enough that writer/layout work
  can land without tearing up the read-side foundation.
- Dump modes make inspection easy and are useful while the full writer does not
  exist yet.

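The "opaque ids, interned strings" point in the list above can be pictured with a minimal sketch. `StrId` and `Interner` are hypothetical names chosen for illustration; the crate's real interner may be shaped differently.

```rust
use std::collections::HashMap;

/// Opaque handle to an interned string (hypothetical; not the crate's type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct StrId(u32);

/// Minimal interner: each distinct string is stored once and named by id.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, StrId>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing id for `s`, or assigns the next free one.
    pub fn intern(&mut self, s: &str) -> StrId {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = StrId(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), id);
        id
    }

    /// Resolves an id back to the string it was created from.
    pub fn resolve(&self, id: StrId) -> &str {
        &self.strings[id.0 as usize]
    }
}
```

Because `StrId` has a private field, callers can compare and hash ids cheaply but cannot fabricate one without going through `intern`.
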
## Weaknesses And Risk Areas

- The actual link-producing pipeline does not exist yet, so the hardest parity
  bugs are still ahead of us.
- Some tracked docs are aspirational. `.docs/overview.md` is the intended end
  state, not a guarantee that every listed module already exists.
- `README.md` is stale in the opposite direction: it understates how much
  read-side work has landed.
- The current diagnostics surface is still minimal. `src/diag.rs` only prints
  `afs-ld: error: ...`; the richer caret diagnostics are planned, not present.
- The CLI surface is intentionally tiny right now. Any work that assumes
  `ld`-compatibility must start by checking `src/args.rs`, not by trusting the
  sprint plan.
- Performance characteristics are mostly unknown because the writer, layout, and
  full-link path are not in place yet.
- The differential harness is only half-built: the diff engine exists, but the
  "run both linkers" machinery is not wired.
- Several future modules named in the roadmap do not exist yet:
  `layout.rs`, `driver.rs`, `map.rs`, `gc.rs`, `icf.rs`, `synth/`,
  `macho/writer.rs`, and the code-signing path are all still planned work.

## Build And Test

Primary commands:

```bash
cargo build -p afs-ld
cargo test -p afs-ld
cargo clippy -p afs-ld --all-targets -- -D warnings
```

Useful targeted commands:

```bash
cargo test --lib -p afs-ld
cargo test --test reader_corpus_round_trip -p afs-ld
cargo test --test archive_runtime -p afs-ld
cargo test --test dylib_integration -p afs-ld
cargo test --test tbd_integration -p afs-ld
cargo test --test resolve_integration -p afs-ld
cargo test --test atom_integration -p afs-ld
cargo test -p afs-ld -- <substring>
```

Environment assumptions:

- macOS on Apple Silicon
- Xcode command-line tools available through `xcrun`
- access to the parent workspace, especially `runtime/` and `.refs/`

Integration tests already shell out to system tools in a few places. Do not
replace those with fake fixtures if a real toolchain interaction is the thing
being tested.

## Project Structure

Actual source tree today:

```text
afs-ld/
├── CLAUDE.md
├── README.md
├── .docs/
│   ├── overview.md
│   └── sprints/
├── src/
│   ├── archive.rs
│   ├── args.rs
│   ├── atom.rs
│   ├── diag.rs
│   ├── dump.rs
│   ├── input.rs
│   ├── leb.rs
│   ├── lib.rs
│   ├── main.rs
│   ├── resolve.rs
│   ├── section.rs
│   ├── string_table.rs
│   ├── symbol.rs
│   ├── macho/
│   │   ├── constants.rs
│   │   ├── dylib.rs
│   │   ├── exports.rs
│   │   ├── reader.rs
│   │   ├── tbd.rs
│   │   └── tbd_yaml.rs
│   └── reloc/
│       └── mod.rs
└── tests/
    ├── common/harness.rs
    ├── archive_runtime.rs
    ├── atom_integration.rs
    ├── diff_harness_*.rs
    ├── dylib_integration.rs
    ├── reader_*.rs
    ├── resolve_integration.rs
    ├── tbd_*.rs
    └── reader_corpus_round_trip.rs
```

Planned future modules listed in the docs should be treated as design intent,
not as present-tense implementation.

## Implemented Pipeline Vs Planned Pipeline

Implemented today:

```text
argv
  -> args.rs
  -> dump/read paths
  -> archive/object/dylib/TBD ingestion
  -> symbol/section/reloc decoding
  -> resolve.rs
  -> atom.rs
```

Current real-link path:

```text
argv -> args.rs -> Linker::run -> NotYetImplemented
```

Planned end-to-end pipeline from the roadmap:

```text
args -> inputs -> resolve -> atomize -> layout -> apply relocs
  -> synth sections -> write -> sign
```

When you are planning work, always identify which of those stages is real in
this checkout and which stage is still only described in docs.

## Development Guidance

### 1. Trust code and tests over roadmap prose

Read these in order before substantial work:

1. `CLAUDE.md`
2. `.docs/overview.md`
3. the relevant sprint file in `.docs/sprints/`
4. the actual Rust module you will touch
5. the tests covering that module

If the docs and the code disagree, treat the code plus tests as the truth about
what exists today, then decide whether the docs need to be refreshed.

### 2. Keep the bespoke contract intact

- Stdlib only unless a dependency discussion happens first.
- Do not couple afs-ld to afs-as at a Rust type level.
- Duplicate Mach-O constants locally when needed.
- Do not hide format details behind clever abstractions that erase wire truth.

### 3. Preserve the wire

- Keep raw bytes or raw fields accessible when lossless re-emission matters.
- Prefer explicit parse and write pairs for on-disk structures.
- Avoid converting fixed-size or padded wire data into lossy higher-level forms
  unless the raw representation is still available somewhere.
- If a new decoder lands, pair it with tests that prove it round-trips or at
  least preserves the exact bytes relevant to the current stage.

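As a minimal sketch of the parse/write pairing described above, consider the 16-byte NUL-padded name field used in Mach-O segment and section headers. `Name16` and its methods are hypothetical names for illustration, not the crate's actual API; the point is that the raw bytes stay available and the string view is only a convenience.

```rust
/// A 16-byte, NUL-padded name field as found in Mach-O segment and section
/// headers. `Name16` is a hypothetical type used only for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Name16 {
    raw: [u8; 16],
}

impl Name16 {
    /// Parse: copy the wire bytes verbatim, with no trimming or conversion.
    /// Returns `None` when fewer than 16 bytes are available.
    pub fn parse(bytes: &[u8]) -> Option<Self> {
        let mut raw = [0u8; 16];
        raw.copy_from_slice(bytes.get(..16)?);
        Some(Name16 { raw })
    }

    /// Write: emit exactly the bytes that were read.
    pub fn write(&self, out: &mut Vec<u8>) {
        out.extend_from_slice(&self.raw);
    }

    /// Lossy convenience view: the bytes before the first NUL, if they are
    /// valid UTF-8. A full 16-byte name has no terminator at all.
    pub fn as_str(&self) -> Option<&str> {
        let end = self.raw.iter().position(|&b| b == 0).unwrap_or(16);
        std::str::from_utf8(&self.raw[..end]).ok()
    }
}
```

Storing the name as a `String` instead would discard any bytes after the first NUL, so a later writer could not reproduce the input exactly.
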
### 4. Be explicit about incomplete work

- Hard errors are better than silent wrong answers.
- If something is not implemented, say so directly.
- Do not introduce "temporary" behavior that quietly emits malformed Mach-O.
- Do not soften a missing feature into a no-op unless the flag or structure is
  explicitly intended to be ignored.

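A rough sketch of what "hard error instead of silent no-op" can look like follows. Only the name `LinkError::NotYetImplemented` comes from this guide; its payload, the `UnknownFlag` variant, the `handle_flag` function, and the choice of which flags are ignored or rejected are all assumptions made for illustration.

```rust
use std::fmt;

/// Assumed shape: the variant name `NotYetImplemented` appears in this guide,
/// but the payload and the second variant are illustrative guesses.
#[derive(Debug, PartialEq, Eq)]
pub enum LinkError {
    NotYetImplemented(&'static str),
    UnknownFlag(String),
}

impl fmt::Display for LinkError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            LinkError::NotYetImplemented(what) => {
                write!(f, "not yet implemented: {}", what)
            }
            LinkError::UnknownFlag(flag) => write!(f, "unknown flag: {}", flag),
        }
    }
}

/// Hypothetical flag handler. A recognized but unsupported feature is a hard
/// error; only a flag that is deliberately ignored returns `Ok`.
pub fn handle_flag(flag: &str) -> Result<(), LinkError> {
    match flag {
        // Assumed for this sketch to be explicitly accepted and ignored.
        "-demangle" => Ok(()),
        // Known feature that is not built yet: say so directly.
        "-dead_strip" => Err(LinkError::NotYetImplemented("dead-strip")),
        other => Err(LinkError::UnknownFlag(other.to_owned())),
    }
}
```
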
### 5. Exhaustive matches matter

- Prefer enums for wire forms and linker-side states.
- Avoid catch-all `_` arms in production matches when a new variant should force
  the compiler to help us.
- When adding a new variant, update every relevant match deliberately.

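The rule above is easiest to see in a small example. `SymbolState` and `is_resolved` are hypothetical stand-ins; the crate's real symbol-state enum may have different variants.

```rust
/// Hypothetical linker-side symbol states; the crate's real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymbolState {
    Undefined,
    Defined,
    Common,
    DylibImport,
}

/// No `_` arm: adding a fifth variant makes this function fail to compile
/// until the new case is handled on purpose.
pub fn is_resolved(state: SymbolState) -> bool {
    match state {
        SymbolState::Undefined => false,
        SymbolState::Defined => true,
        SymbolState::Common => true,
        SymbolState::DylibImport => true,
    }
}
```

With a `_ => true` arm in place of the last three, a newly added variant would silently count as resolved.
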
### 6. Keep dump surfaces useful

- `--dump*` modes are an active debugging tool, not a side feature.
- When new reader functionality lands, extend the corresponding dump output.
- If you add a new parsed field but the dump cannot show it, the repo loses one
  of its best inspection surfaces.

### 7. Respect deterministic behavior

- Avoid nondeterministic iteration when output order matters.
- Avoid timestamps, random ids, or unstable hashing in any future write path.
- When adding diagnostics, keep them stable and testable.

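One common way to avoid nondeterministic iteration in Rust is to collect into an ordered container before emitting anything. The sketch below assumes a hypothetical `ordered_symbols` helper and an (address, name) sort key; neither is taken from the crate.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: returns symbol names ordered by (address, name).
/// `BTreeSet` iterates in key order, so the result does not depend on
/// insertion order or on a per-process hash seed, unlike `HashMap`.
pub fn ordered_symbols(input: &[(&str, u64)]) -> Vec<String> {
    let mut keyed: BTreeSet<(u64, &str)> = BTreeSet::new();
    for &(name, addr) in input {
        keyed.insert((addr, name));
    }
    keyed.iter().map(|&(_, name)| name.to_owned()).collect()
}
```

Two runs that discover the same symbols in a different order still produce the same sequence, which is the property a byte-stable writer needs.
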
## Testing Practices

- Every bug fix gets a regression test.
- New parser behavior should land with unit tests close to the module.
- When touching integration behavior, prefer real fixtures over mocked ones.
- For archive work, look first at `tests/archive_runtime.rs`.
- For dylib and TBD work, look first at `tests/dylib_integration.rs`,
  `tests/tbd_integration.rs`, and `tests/tbd_smoke.rs`.
- For reader invariants, `tests/reader_corpus_round_trip.rs` is a key guardrail.
- For resolution and atomization, `tests/resolve_integration.rs` and
  `tests/atom_integration.rs` should move with the code.
- If you add future write-side functionality, extend the differential harness
  rather than building a parallel ad hoc test path.

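For "unit tests close to the module," the usual Rust shape is a `#[cfg(test)]` module at the bottom of the same file. The sketch below uses ULEB128 as the subject because the tree contains `src/leb.rs`; the function names and signatures are assumptions, not the crate's actual API.

```rust
/// Encodes `value` as ULEB128 (hypothetical helper; names are illustrative).
pub fn write_uleb128(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Decodes one ULEB128 value, returning it with the number of bytes consumed.
/// Truncated or overlong input yields `None` rather than a guessed value.
pub fn read_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * (i as u32);
        let chunk = (byte & 0x7f) as u64;
        // Reject encodings whose payload bits would not fit in a u64.
        if shift >= 64 || (shift == 63 && chunk > 1) {
            return None;
        }
        value |= chunk << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn uleb128_round_trips() {
        for &v in &[0u64, 1, 127, 128, 624_485, u64::MAX] {
            let mut buf = Vec::new();
            write_uleb128(v, &mut buf);
            assert_eq!(read_uleb128(&buf), Some((v, buf.len())));
        }
    }

    #[test]
    fn truncated_input_is_rejected() {
        // Regression-style case: continuation bit set, but no next byte.
        assert_eq!(read_uleb128(&[0x80]), None);
    }
}
```
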
Run focused tests first, then widen:

- module-local or single integration test while developing
- `cargo test -p afs-ld` before handing work off
- `cargo clippy -p afs-ld --all-targets -- -D warnings` when changing code paths
  broadly enough to justify it

## Documentation Practices

- `CLAUDE.md` is policy and development discipline.
- `.docs/overview.md` is the intended architecture and scope.
- `.docs/sprints/` is the staged roadmap.
- `README.md` is user-facing and currently stale relative to the read-side code.

When a change materially shifts reality, update the tracked docs that are now
misleading. This is especially important in this repo because the roadmap is
ambitious and can otherwise create false assumptions for future work.

## References

Use the parent repository's references when you need to confirm Mach-O or linker
behavior instead of inventing from memory:

- `.refs/llvm/lld/MachO/` for architecture and pass structure
- `.refs/ld64/` for Apple-parity edge cases
- `.refs/mold/` for performance ideas and comparative implementation choices

Also use Apple's Mach-O and arm64 relocation headers as the numeric source of
truth for constants mirrored in `src/macho/constants.rs`.

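To illustrate what "mirrored" means here, a few values from Apple's `<mach-o/loader.h>`, `<mach/machine.h>`, and `<mach-o/arm64/reloc.h>` are shown below as Rust constants. The numeric values match those headers; the Rust types, the layout of the real `src/macho/constants.rs`, and the `has_macho64_magic` helper are assumptions for this sketch.

```rust
// Values copied from Apple's headers. How the real constants file organizes
// or types them is an assumption; only the numbers are the point here.
pub const MH_MAGIC_64: u32 = 0xfeed_facf;
pub const CPU_ARCH_ABI64: u32 = 0x0100_0000;
pub const CPU_TYPE_ARM: u32 = 12;
pub const CPU_TYPE_ARM64: u32 = CPU_TYPE_ARM | CPU_ARCH_ABI64;

pub const MH_OBJECT: u32 = 0x1;
pub const MH_EXECUTE: u32 = 0x2;
pub const MH_DYLIB: u32 = 0x6;

pub const ARM64_RELOC_UNSIGNED: u8 = 0;
pub const ARM64_RELOC_BRANCH26: u8 = 2;
pub const ARM64_RELOC_PAGE21: u8 = 3;
pub const ARM64_RELOC_PAGEOFF12: u8 = 4;

/// Hypothetical check: does the buffer start with the 64-bit Mach-O magic,
/// stored little-endian as it is on arm64?
pub fn has_macho64_magic(bytes: &[u8]) -> bool {
    match bytes.get(..4) {
        Some(m) => u32::from_le_bytes([m[0], m[1], m[2], m[3]]) == MH_MAGIC_64,
        None => false,
    }
}
```
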
## Working Style For This Repo

- Prefer small, reviewable changes.
- Keep commit messages terse and imperative.
- Do not mention sprint numbers in commit subjects.
- Avoid monolithic "land the whole linker" changes; the sprint plan is granular
  for a reason.
- Before implementing a planned module from the roadmap, make sure the crate
  actually has the prerequisites the sprint assumed.
- If you are about to say "the docs say this exists," stop and confirm with
  `ls`, `rg`, and the tests.

## Practical Shortcuts

- Use `rg --files` and `rg` first; the repo is small enough that this is fast
  and keeps context grounded in the actual tree.
- For current status, start with `src/lib.rs`, `src/main.rs`, `src/args.rs`,
  `tests/common/harness.rs`, and `README.md`.
- For architectural intent, read `.docs/overview.md` and the relevant sprint
  file after that.

Reading current status before architectural intent will save a lot of confusion.