# Sprint 4: Static Archives (`ar`)

## Prerequisites
Sprints 1–3 — Mach-O reading complete.

## Goals
Read static archives (`.a`), including the BSD, System V, and GNU-thin variants. Support lazy member fetching: a member is parsed only when it defines a symbol that is still undefined. This is the mechanism by which `libarmfortas_rt.a` gets pulled in.

Closeout note: the force-load surface landed in the resolver as
`resolve::force_load_archive` / `resolve::force_load_all`, and one-level
nested archives are expanded through the fetched-member path with provenance
chains such as `outer.a(inner.a)(foo.o)`. `--dump-archive` now intentionally
prints the same member listing shape as `ar -t`, and parity is checked
against both generated archives and `libarmfortas_rt.a` when available.

## Deliverables

### 1. Archive format recognizer
`afs-ld/src/archive.rs`:

```rust
pub struct Archive<'a> {
    pub path: PathBuf,
    pub flavor: Flavor,           // Bsd, Sysv, GnuThin
    pub symdef: SymbolIndex,      // names → member offsets
    pub members: Vec<Member<'a>>,
}

pub enum Flavor { Bsd, Sysv, GnuThin }
```

Detection by magic: `!<arch>\n` for BSD and SysV archives; thin archives use `!<thin>\n` instead. BSD vs SysV is distinguished by the first entry: `#1/<N>` BSD extended filenames vs `//` SysV long-name string table.
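
The magic check can be sketched as follows. This is a minimal illustration, not the code in `archive.rs`: `Container` and `sniff_magic` are hypothetical names, and the BSD vs SysV decision is left to a later step that inspects the first member.

```rust
/// Container kind implied by the 8-byte global magic (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Container {
    /// "!<arch>\n": BSD or SysV, decided later from the first member.
    Regular,
    /// "!<thin>\n": GNU thin archive.
    Thin,
}

/// Hypothetical helper: classify a file by its leading magic.
/// Returns `None` when the input is too short or is not an archive.
pub fn sniff_magic(bytes: &[u8]) -> Option<Container> {
    match bytes.get(..8)? {
        b"!<arch>\n" => Some(Container::Regular),
        b"!<thin>\n" => Some(Container::Thin),
        _ => None,
    }
}
```

Using `get(..8)` rather than indexing keeps a truncated file from panicking, which matches the bounds-checking stance taken for headers.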

### 2. Header parsing
Each member is preceded by a 60-byte `ar_hdr`:
```
char name[16];  // "#1/<N>" on BSD, "/<offset>" into the "//" string table on SysV, or "foo.o/" on SysV short
char date[12];
char uid[6];
char gid[6];
char mode[8];
char size[10];
char fmag[2];   // "`\n"
```

Parse field-by-field with tight bounds checks. `size` is a space-padded decimal ASCII integer, not a C literal: a leading `0` does not mean octal, and there is no `0x` form.
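
A bounds-checked header parse might look like the sketch below. `RawHeader`, `HeaderError`, and `parse_header` are illustrative names, not the real afs-ld types, and only the `name` and `size` fields are extracted; the field offsets follow from the widths listed above (16 + 12 + 6 + 6 + 8 = 48, so `size` occupies bytes 48..58).

```rust
/// Fields of one 60-byte `ar_hdr`, borrowed from the archive bytes (hypothetical type).
#[derive(Debug)]
pub struct RawHeader<'a> {
    pub name: &'a [u8], // 16 bytes, still space-padded
    pub size: u64,      // member body length in bytes
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    Truncated,
    BadTerminator,
    BadSize,
}

/// Parse a space-padded decimal ASCII field. Digits only: no sign, no prefix.
fn parse_decimal(field: &[u8]) -> Option<u64> {
    let text = std::str::from_utf8(field).ok()?.trim_end_matches(' ');
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    text.parse::<u64>().ok()
}

/// Hypothetical helper: parse the header that starts at `offset`.
pub fn parse_header(data: &[u8], offset: usize) -> Result<RawHeader<'_>, HeaderError> {
    let end = offset.checked_add(60).ok_or(HeaderError::Truncated)?;
    let hdr = data.get(offset..end).ok_or(HeaderError::Truncated)?;
    // fmag: the last two bytes must be a backtick followed by a newline.
    if !hdr.ends_with(b"`\n") {
        return Err(HeaderError::BadTerminator);
    }
    let size = parse_decimal(&hdr[48..58]).ok_or(HeaderError::BadSize)?;
    Ok(RawHeader { name: &hdr[..16], size })
}
```

Rejecting anything but ASCII digits in `parse_decimal` is what keeps a field such as `0x2a` from being accepted as a size.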

### 3. Name decoding
- BSD: name field `#1/<N>`, real name is the first N bytes of the member body (body shrinks accordingly).
- SysV: name field holds a byte offset into the `//` string table.
- SysV short: `foo.o/`, slash-terminated and space-padded.
- GNU-thin: member body is zero bytes; the name encodes a path relative to the archive. afs-ld `mmap`s the external file.

Names are stored in canonical form (NUL-stripped, slash-stripped).
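
The decoding rules above can be sketched as a classifier over the 16-byte name field. `NameRef`, `decode_name_field`, and `split_bsd_body` are hypothetical names for illustration. The sketch assumes the SysV long-name reference is written as `/` followed by a decimal offset, and it keeps the special member names `/` and `//` verbatim. Thin-archive path resolution is out of scope here.

```rust
/// Where a member's real name lives, decoded from the name field (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum NameRef {
    /// BSD `#1/<N>`: the name is the first N bytes of the member body.
    BsdInline(usize),
    /// SysV `/<offset>`: the name starts at this offset in the `//` table.
    SysvTable(usize),
    /// Name stored directly in the field, already canonicalized.
    Short(String),
}

pub fn decode_name_field(field: &[u8]) -> Option<NameRef> {
    let text = std::str::from_utf8(field).ok()?.trim_end_matches(' ');
    if let Some(n) = text.strip_prefix("#1/") {
        return n.parse::<usize>().ok().map(NameRef::BsdInline);
    }
    if let Some(off) = text.strip_prefix('/') {
        if !off.is_empty() && off.bytes().all(|b| b.is_ascii_digit()) {
            return off.parse::<usize>().ok().map(NameRef::SysvTable);
        }
        // "/" and "//" name the SysV index and string table; keep them verbatim.
        return Some(NameRef::Short(text.to_string()));
    }
    // SysV short names end in one slash; BSD short names have none.
    Some(NameRef::Short(text.strip_suffix('/').unwrap_or(text).to_string()))
}

/// Split a BSD member body into (name, payload). The inline name is NUL-padded,
/// and the payload is what remains after the first `n` bytes.
pub fn split_bsd_body(body: &[u8], n: usize) -> Option<(&str, &[u8])> {
    if n > body.len() {
        return None;
    }
    let (raw, payload) = body.split_at(n);
    let name = std::str::from_utf8(raw).ok()?.trim_end_matches('\0');
    Some((name, payload))
}
```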
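
### 4. Symbol index
The index lives in the SysV `/` member or the BSD `__.SYMDEF` / `__.SYMDEF SORTED` member. BSD layout:
```
uint32 ranlib_bytes      // byte size of the ranlib array; entry count = ranlib_bytes / 8
ranlib[ranlib_bytes / 8] { uint32 strx; uint32 offset; }
uint32 stringsize
char   strings[stringsize]
```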

SysV: big-endian `nsyms: u32`, then `nsyms` big-endian `u32` offsets, then packed null-terminated strings.
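
Both index layouts can be decoded with a few bounds-checked reads. The sketch below uses hypothetical names (`IndexEntry`, `parse_bsd_symdef`, `parse_sysv_index`) and rests on two assumptions that should be checked against real fixtures: the BSD words are little-endian, and the leading BSD word is the byte length of the ranlib array (8 bytes per entry), which is how Apple's tools are understood to write `__.SYMDEF`.

```rust
/// One decoded index entry: a symbol name and the archive offset of the
/// header of the member that defines it (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct IndexEntry {
    pub name: String,
    pub member_offset: u32,
}

/// Bounds-checked 4-byte read at `at`.
fn word(body: &[u8], at: usize) -> Option<[u8; 4]> {
    let b = body.get(at..at.checked_add(4)?)?;
    Some([b[0], b[1], b[2], b[3]])
}

/// NUL-terminated string starting at `at` inside `strings`.
fn c_str(strings: &[u8], at: usize) -> Option<&str> {
    let tail = strings.get(at..)?;
    let nul = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..nul]).ok()
}

/// BSD `__.SYMDEF`. Assumptions: little-endian words, and the leading word is
/// the ranlib array's size in bytes.
pub fn parse_bsd_symdef(body: &[u8]) -> Option<Vec<IndexEntry>> {
    let ranlib_bytes = u32::from_le_bytes(word(body, 0)?) as usize;
    if ranlib_bytes % 8 != 0 {
        return None;
    }
    let strsize_at = ranlib_bytes.checked_add(4)?;
    let stringsize = u32::from_le_bytes(word(body, strsize_at)?) as usize;
    let str_start = strsize_at.checked_add(4)?;
    let strings = body.get(str_start..str_start.checked_add(stringsize)?)?;
    (0..ranlib_bytes / 8)
        .map(|i| {
            let strx = u32::from_le_bytes(word(body, 4 + 8 * i)?) as usize;
            let member_offset = u32::from_le_bytes(word(body, 8 + 8 * i)?);
            Some(IndexEntry { name: c_str(strings, strx)?.to_string(), member_offset })
        })
        .collect()
}

/// SysV `/` member: big-endian count, big-endian offsets, packed strings.
pub fn parse_sysv_index(body: &[u8]) -> Option<Vec<IndexEntry>> {
    let nsyms = u32::from_be_bytes(word(body, 0)?) as usize;
    let mut strings = body.get(nsyms.checked_mul(4)?.checked_add(4)?..)?;
    let mut out = Vec::new();
    for i in 0..nsyms {
        let member_offset = u32::from_be_bytes(word(body, 4 + 4 * i)?);
        let name = c_str(strings, 0)?;
        out.push(IndexEntry { name: name.to_string(), member_offset });
        strings = &strings[name.len() + 1..]; // skip the name and its NUL
    }
    Some(out)
}
```

Both parsers return `None` on any malformed or truncated input instead of panicking, so a corrupt index degrades to "no index" rather than a crash.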

`SymbolIndex` exposes `fn members_defining(name: &str) -> impl Iterator<Item = MemberRef>`.
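
One way such a lookup could be backed is a map from symbol name to the members that define it, kept in archive order so the first hit is the earliest member. The field layout of `MemberRef`, the `by_name` map, and the `insert` method are assumptions made for this sketch only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in: identifies a member by the offset of its header.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemberRef {
    pub header_offset: u32,
}

#[derive(Default)]
pub struct SymbolIndex {
    by_name: HashMap<String, Vec<MemberRef>>,
}

impl SymbolIndex {
    /// Record that `member` defines `name`. Each list stays sorted by header
    /// offset (archive order) and free of duplicates.
    pub fn insert(&mut self, name: &str, member: MemberRef) {
        let members = self.by_name.entry(name.to_string()).or_default();
        if let Err(pos) = members.binary_search_by_key(&member.header_offset, |m| m.header_offset) {
            members.insert(pos, member);
        }
    }

    /// Members defining `name`, earliest member first; empty if undefined.
    pub fn members_defining(&self, name: &str) -> impl Iterator<Item = MemberRef> + '_ {
        self.by_name.get(name).into_iter().flatten().copied()
    }
}
```

With this shape, a caller that wants the first-member-wins behaviour simply takes `.next()` from the iterator.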

### 5. Lazy fetch API
```rust
impl<'a> Archive<'a> {
    pub fn fetch(&mut self, name: &str) -> Option<ObjectFile>;
}
```

Returns `None` if the archive does not define `name`. Fetches are memoized: a second lookup that resolves to the same member returns the cached handle. The resolution pass (Sprint 8) is the only caller.
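
The memoization can be illustrated with a simplified stand-in. Everything here is hypothetical: `LazyArchive` replaces the real `Archive`, `ObjectFile` is a placeholder that only records its origin, the cached handle is modelled as an `Rc`, and the `parses` counter exists only to make the caching observable.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Placeholder for the real parsed object; only records where it came from.
#[derive(Debug)]
pub struct ObjectFile {
    pub member_offset: u32,
}

pub struct LazyArchive {
    index: HashMap<String, u32>,         // symbol name -> first defining member
    cache: HashMap<u32, Rc<ObjectFile>>, // member offset -> parsed object
    pub parses: usize,                   // how many members were actually parsed
}

impl LazyArchive {
    pub fn new(index: HashMap<String, u32>) -> Self {
        Self { index, cache: HashMap::new(), parses: 0 }
    }

    /// Returns `None` when no member defines `name`; otherwise parses the
    /// member at most once and hands out shared handles afterwards.
    pub fn fetch(&mut self, name: &str) -> Option<Rc<ObjectFile>> {
        let offset = *self.index.get(name)?;
        if let Some(hit) = self.cache.get(&offset) {
            return Some(Rc::clone(hit));
        }
        self.parses += 1; // real code would parse the Mach-O member here
        let obj = Rc::new(ObjectFile { member_offset: offset });
        self.cache.insert(offset, Rc::clone(&obj));
        Some(obj)
    }
}
```

Keying the cache by member offset rather than by symbol name means two symbols defined in the same member share one parse.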

### 6. `-force_load` / `-all_load` support (semantics, not CLI yet)
Implemented via the resolver-level helpers
`resolve::force_load_archive` / `resolve::force_load_all`, which pre-fetch
archive members against the live linker input registry. Sprint 19 wires the
CLI surface.

### 7. Archive-of-archives
Rare but legal: a member can be another `.a`. Recurse one level. If a sub-archive defines `name`, the outer `fetch` returns the sub-member's object file and records a provenance chain for diagnostics.
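
A provenance chain of the shape `outer.a(inner.a)(foo.o)` could be represented and printed as in this sketch. The `Provenance` type and its fields are hypothetical; only the printed format comes from the closeout note.

```rust
use std::fmt;

/// Path of containers leading to an object, outermost archive first
/// (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Provenance {
    pub archive: String,
    pub members: Vec<String>, // e.g. ["inner.a", "foo.o"]
}

impl fmt::Display for Provenance {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.archive)?;
        // Each nesting step is appended as a parenthesized member name.
        for member in &self.members {
            write!(f, "({})", member)?;
        }
        Ok(())
    }
}
```

A member fetched directly from the outer archive would carry a single entry in `members`, and one fetched through a nested archive would carry two.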

## Testing Strategy
- Fixtures in `tests/corpus/archives/`:
  - `libbsd.a` made by Apple `ar` (BSD flavor, extended filenames).
  - `libsysv.a` made by GNU `ar` on Linux (for cross-check).
  - `libthin.a` made by `ar --thin` (GNU-thin).
  - `libmulti.a` containing several members each defining one or more symbols.
- `cargo test -p afs-ld test_archive_bsd` verifies BSD index → correct member for each name.
- Symbol defined by two members: the archive picks the member that comes first (ld's traditional rule).
- Missing-symbol lookup returns `None`, does not error.
- Thin-archive member file missing on disk produces a path-qualified diagnostic.

## Definition of Done
- All three archive flavors read.
- `libarmfortas_rt.a` (built by parent workspace) parses and every runtime symbol is findable by name.
- Archive-of-archives works one level deep.
- Differential: `ar -t libarmfortas_rt.a` output matches our `--dump-archive` output.