# Sprint 4: Static Archives (`ar`)

## Prerequisites

Sprints 1–3 — Mach-O reading complete.

## Goals

Read static archives (`.a`) in the BSD, System V, and GNU-thin variants. Support lazy member fetching: a member is parsed only when it defines a symbol that is still undefined. This is the mechanism by which `libarmfortas_rt.a` gets pulled in.

## Deliverables

### 1. Archive format recognizer

`afs-ld/src/archive.rs`:

```rust
use std::path::PathBuf;

pub struct Archive<'a> {
    pub path: PathBuf,
    pub flavor: Flavor,         // Bsd, Sysv, GnuThin
    pub symdef: SymbolIndex,    // names → member offsets
    pub members: Vec<Member<'a>>,
}

pub enum Flavor { Bsd, Sysv, GnuThin }
```

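Detection by magic: `!<arch>\n` for regular archives (BSD and SysV); thin archives use `!<thin>\n`. BSD and SysV are told apart by the leading member headers: a `#1/<N>` extended filename or a `__.SYMDEF` index marks BSD, while a `/` symbol index or a `//` long-name string table marks SysV.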
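The detection rule can be sketched as a pure function over the leading bytes of the file. This is a minimal illustration, not the planned `archive.rs` code: `detect_flavor` is a hypothetical helper name, and classifying a member-less or slash-less archive as BSD is an assumption made here only so the function is total.

```rust
/// Archive flavors named in this sprint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Flavor { Bsd, Sysv, GnuThin }

const MAGIC: &[u8] = b"!<arch>\n";
const THIN_MAGIC: &[u8] = b"!<thin>\n";

/// Hypothetical helper: classify an archive from its leading bytes.
/// Returns `None` when the buffer does not start with an archive magic.
pub fn detect_flavor(data: &[u8]) -> Option<Flavor> {
    let magic = data.get(..8)?;
    if magic == THIN_MAGIC {
        return Some(Flavor::GnuThin);
    }
    if magic != MAGIC {
        return None;
    }
    // Name field of the first member header: up to 16 bytes after the magic.
    let rest = &data[8..];
    let name = &rest[..rest.len().min(16)];
    // Drop the space padding before inspecting the name.
    let len = name.iter().rposition(|&b| b != b' ').map_or(0, |i| i + 1);
    let name = &name[..len];
    if name.starts_with(b"#1/") || name.starts_with(b"__.SYMDEF") {
        Some(Flavor::Bsd)
    } else if name.starts_with(b"/") || name.ends_with(b"/") {
        // "/" symbol index, "//" string table, or a short "foo.o/" name.
        Some(Flavor::Sysv)
    } else {
        // Assumption: an empty archive or a slash-less short name is BSD.
        Some(Flavor::Bsd)
    }
}
```

Only the first header is inspected, matching the rule above; a stricter recognizer could scan further members before committing to a flavor.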

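### 2. Header parsing

Each member is preceded by a 60-byte `ar_hdr`:

```
char name[16];   // "#1/<N>" on BSD, "/<offset>" into the "//" string table on SysV, or "foo.o/" on SysV short
char date[12];
char uid[6];
char gid[6];
char mode[8];
char size[10];
char fmag[2];    // "`\n"
```

Parse field-by-field with tight bounds checks. Every field is space-padded ASCII: `size` is a decimal integer (not a C literal, so a leading `0` does not mean octal) and `mode` is octal. Member bodies are padded with a single `\n` to an even offset, so the next header always starts on a 2-byte boundary.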
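A bounds-checked header parse might look like the sketch below. `ArHeader`, `parse_header`, and `next_header_offset` are illustrative names, not the planned API, and only the `name`, `size`, and `fmag` fields are decoded. The next-header arithmetic assumes a regular archive; in a thin archive the body of an ordinary member is not stored inline.

```rust
pub const AR_HDR_LEN: usize = 60;

/// The two header fields this sketch needs; the names are illustrative.
#[derive(Debug, PartialEq, Eq)]
pub struct ArHeader<'a> {
    pub raw_name: &'a [u8], // undecoded 16-byte name field
    pub size: usize,        // body size in bytes
}

/// Parse a space-padded decimal ASCII field. Digits only: no sign, no prefix.
fn parse_decimal(field: &[u8]) -> Option<usize> {
    let text = std::str::from_utf8(field).ok()?.trim_end_matches(' ');
    if text.is_empty() || !text.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    text.parse().ok()
}

/// Parse the header at `offset`; `None` on truncation, bad `fmag`, or bad size.
pub fn parse_header<'a>(data: &'a [u8], offset: usize) -> Option<ArHeader<'a>> {
    let end = offset.checked_add(AR_HDR_LEN)?;
    let hdr = data.get(offset..end)?;
    if !hdr.ends_with(b"`\n") {
        return None;
    }
    // Field layout: name 0..16, date 16..28, uid 28..34, gid 34..40,
    // mode 40..48, size 48..58, fmag 58..60.
    Some(ArHeader { raw_name: &hdr[..16], size: parse_decimal(&hdr[48..58])? })
}

/// Offset of the following header: header, body, then padding to an even offset.
pub fn next_header_offset(offset: usize, size: usize) -> Option<usize> {
    let end = offset.checked_add(AR_HDR_LEN)?.checked_add(size)?;
    end.checked_add(end % 2)
}
```

Every arithmetic step uses `checked_add` and every slice goes through `get`, so a hostile `size` field yields `None` rather than a panic.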

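### 3. Name decoding

- BSD: name field `#1/<N>`; the real name is the first N bytes of the member body, possibly NUL-padded, and the body shrinks by N accordingly.
- SysV: name field is `/<offset>`, a decimal byte offset into the `//` string table, where each entry ends with `/\n`.
- SysV short: `foo.o/`, slash-terminated and space-padded.
- GNU-thin: the archive stores no member body (the size field still describes the external file); the name encodes a path relative to the archive's directory. afs-ld `mmap`s the external file.

Names are stored in canonical form: NUL padding and the trailing SysV slash are stripped.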
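The decoding rules above can be sketched as a classifier over the raw 16-byte name field plus two small lookups. The `NameRef` enum and all three function names are hypothetical and chosen for illustration; thin-archive path resolution is left out.

```rust
/// Where a member's real name lives, decoded from the raw name field.
#[derive(Debug, PartialEq, Eq)]
pub enum NameRef {
    /// Short name stored inline, already canonical.
    Inline(String),
    /// BSD `#1/<N>`: the name is the first N bytes of the member body.
    BsdExtended(usize),
    /// SysV `/<offset>`: the name starts at this offset in the `//` table.
    SysvTable(usize),
    /// SysV special members: `/` (symbol index) or `//` (string table).
    Special,
}

pub fn classify_name(raw: &[u8]) -> Option<NameRef> {
    let text = std::str::from_utf8(raw).ok()?.trim_end_matches(' ');
    if let Some(n) = text.strip_prefix("#1/") {
        return n.parse::<usize>().ok().map(NameRef::BsdExtended);
    }
    if text == "/" || text == "//" {
        return Some(NameRef::Special);
    }
    if let Some(off) = text.strip_prefix('/') {
        return off.parse::<usize>().ok().map(NameRef::SysvTable);
    }
    Some(NameRef::Inline(text.trim_end_matches('/').to_string()))
}

/// Split a BSD body into (canonical name, remaining body).
pub fn bsd_extended_name(body: &[u8], n: usize) -> Option<(String, &[u8])> {
    let (name, rest) = (body.get(..n)?, body.get(n..)?);
    // The name may be NUL-padded up to N bytes; keep what precedes the first NUL.
    let end = name.iter().position(|&b| b == 0).unwrap_or(name.len());
    Some((String::from_utf8_lossy(&name[..end]).into_owned(), rest))
}

/// Look up a SysV long name; entries in the `//` table end with "/\n".
pub fn sysv_table_name(table: &[u8], offset: usize) -> Option<String> {
    let tail = table.get(offset..)?;
    let end = tail.iter().position(|&b| b == b'\n')?;
    let mut entry = &tail[..end];
    if entry.last() == Some(&b'/') {
        entry = &entry[..entry.len() - 1];
    }
    Some(String::from_utf8_lossy(entry).into_owned())
}
```

Keeping classification separate from lookup means the header parser never needs the member body or the string table in hand.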

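### 4. Symbol index

SysV `/` member or BSD `__.SYMDEF` / `__.SYMDEF SORTED` member. BSD layout (integers in the byte order of the host that built the archive, little-endian on arm64 and x86_64):

```
uint32 ranlib_bytes      // byte size of the ranlib array, not an entry count
ranlib[ranlib_bytes / 8] { uint32 strx; uint32 offset; }
uint32 stringsize
char strings[stringsize]
```

`strx` is a byte offset into `strings`; `offset` is the archive offset of the defining member's header.

SysV: big-endian `nsyms: u32`, then `nsyms` big-endian `u32` offsets, then packed null-terminated strings; the i-th string names the symbol defined at the i-th offset.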
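Both index layouts reduce to a list of (symbol name, member header offset) pairs. The sketch below parses each one under stated assumptions: the function names are hypothetical, the BSD parser assumes little-endian integers and a leading word that holds the byte size of the `ranlib` array, and `body` is the index member's payload with any `#1/<N>` name prefix already removed.

```rust
/// One index entry: symbol name and archive offset of the defining member.
pub type IndexEntry = (String, u32);

fn read_u32(data: &[u8], at: usize, big_endian: bool) -> Option<u32> {
    let b = data.get(at..at.checked_add(4)?)?;
    let arr = [b[0], b[1], b[2], b[3]];
    Some(if big_endian { u32::from_be_bytes(arr) } else { u32::from_le_bytes(arr) })
}

/// Bytes of the NUL-terminated string starting at `at`, without the NUL.
fn c_str(data: &[u8], at: usize) -> Option<&[u8]> {
    let tail = data.get(at..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    Some(&tail[..end])
}

fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// SysV `/` member: big-endian count, offsets, then names in the same order.
pub fn parse_sysv_index(body: &[u8]) -> Option<Vec<IndexEntry>> {
    let nsyms = read_u32(body, 0, true)? as usize;
    let mut str_at = nsyms.checked_mul(4)?.checked_add(4)?;
    let mut out = Vec::new();
    for i in 0..nsyms {
        let offset = read_u32(body, 4 + i * 4, true)?;
        let name = c_str(body, str_at)?;
        str_at += name.len() + 1;
        out.push((lossy(name), offset));
    }
    Some(out)
}

/// BSD `__.SYMDEF` member (little-endian assumed).
pub fn parse_bsd_index(body: &[u8]) -> Option<Vec<IndexEntry>> {
    let ranlib_bytes = read_u32(body, 0, false)? as usize;
    let strsize_at = ranlib_bytes.checked_add(4)?;
    let strsize = read_u32(body, strsize_at, false)? as usize;
    let strings_at = strsize_at + 4;
    let strings = body.get(strings_at..strings_at.checked_add(strsize)?)?;
    let mut out = Vec::new();
    for i in 0..ranlib_bytes / 8 {
        let strx = read_u32(body, 4 + i * 8, false)? as usize;
        let offset = read_u32(body, 8 + i * 8, false)?;
        out.push((lossy(c_str(strings, strx)?), offset));
    }
    Some(out)
}
```

Neither parser preallocates from the untrusted count, so a corrupt `nsyms` or `ranlib_bytes` fails on the first out-of-range read instead of reserving memory.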

`SymbolIndex` exposes `fn members_defining(&self, name: &str) -> impl Iterator<Item = MemberRef>`.
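One possible shape for that lookup is a hash map from symbol name to the members defining it, kept in archive order so that the first result is the earliest member. This is a sketch under assumptions: `MemberRef` is modelled here as a bare header offset and `from_entries` is a hypothetical constructor; the plan does not define either.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the plan's `MemberRef`: a member header offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MemberRef(pub u32);

#[derive(Default)]
pub struct SymbolIndex {
    by_name: HashMap<String, Vec<MemberRef>>,
}

impl SymbolIndex {
    /// Build the index from (symbol name, member header offset) pairs.
    pub fn from_entries<I>(entries: I) -> Self
    where
        I: IntoIterator<Item = (String, u32)>,
    {
        let mut by_name: HashMap<String, Vec<MemberRef>> = HashMap::new();
        for (name, offset) in entries {
            by_name.entry(name).or_default().push(MemberRef(offset));
        }
        for refs in by_name.values_mut() {
            // Ascending header offset is archive order, so the first
            // element is the member that appears first in the file.
            refs.sort_by_key(|m| m.0);
            refs.dedup();
        }
        SymbolIndex { by_name }
    }

    /// All members defining `name`, earliest member first; empty if none.
    pub fn members_defining<'s>(&'s self, name: &str) -> impl Iterator<Item = MemberRef> + 's {
        self.by_name.get(name).into_iter().flatten().copied()
    }
}
```

A caller that wants the traditional first-member-wins behaviour takes `.next()` on the iterator; a missing symbol simply yields an empty iterator.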

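### 5. Lazy fetch API

```rust
impl<'a> Archive<'a> {
    /// Returns the object file of the member defining `name`, if any.
    pub fn fetch(&mut self, name: &str) -> Option<ObjectFile> {
        todo!("signature only; the body is this sprint's work")
    }
}
```

Returns `None` if the archive does not define `name`. Fetches are memoized: a second lookup that resolves to the same member returns the cached handle instead of reparsing. The resolution pass (Sprint 8) is the only caller.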
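The memoization can be illustrated with a cache keyed by member header offset. Everything in this sketch is hypothetical: `LazyArchive`, the `parses` counter, and the use of `Rc<ObjectFile>` as the cached handle are assumptions for the sake of a runnable example, and the "parse" is a stand-in for the Mach-O reader from Sprints 1-3.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical parsed object; the real type comes from the Mach-O reader.
#[derive(Debug)]
pub struct ObjectFile {
    pub member_offset: u32,
}

pub struct LazyArchive {
    /// Symbol name -> header offset of the first defining member.
    index: HashMap<String, u32>,
    /// Members parsed so far, keyed by header offset.
    cache: HashMap<u32, Rc<ObjectFile>>,
    /// Number of real parses, exposed so the memoization is observable.
    pub parses: usize,
}

impl LazyArchive {
    pub fn new(index: HashMap<String, u32>) -> Self {
        LazyArchive { index, cache: HashMap::new(), parses: 0 }
    }

    /// `None` if no member defines `name`; otherwise a shared handle.
    pub fn fetch(&mut self, name: &str) -> Option<Rc<ObjectFile>> {
        let offset = *self.index.get(name)?;
        if let Some(hit) = self.cache.get(&offset) {
            return Some(Rc::clone(hit));
        }
        // Stand-in for parsing the member body as an object file.
        self.parses += 1;
        let obj = Rc::new(ObjectFile { member_offset: offset });
        self.cache.insert(offset, Rc::clone(&obj));
        Some(obj)
    }
}
```

Keying the cache by member rather than by symbol matters: two different symbols defined by the same member must share one parse.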

### 6. `-force_load` / `-all_load` support (semantics, not CLI yet)

`Archive` has a `force_all(&mut self)` method that pre-fetches every member. Sprint 19 wires the CLI.

### 7. Archive-of-archives

Rare but legal: a member can itself be another `.a`. Recurse one level only. If a sub-archive defines `name`, the outer `fetch` returns the sub-member's object file and records a provenance chain (outer archive, then sub-archive) for diagnostics.
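The depth limit and the provenance chain can be sketched on a small in-memory model. The types `Member`, `ArchiveModel`, and `Found`, and the function `fetch_nested`, are all hypothetical and exist only to show the recursion; real members would be parsed from bytes rather than listed with their symbols.

```rust
/// Hypothetical model: a member is an object file or a nested archive.
pub enum Member {
    Object { name: String, defines: Vec<String> },
    Archive(ArchiveModel),
}

pub struct ArchiveModel {
    pub path: String,
    pub members: Vec<Member>,
}

/// Defining member plus the archives it was reached through, outermost first.
#[derive(Debug, PartialEq, Eq)]
pub struct Found {
    pub member: String,
    pub provenance: Vec<String>,
}

/// Search `ar` for `symbol`, descending into at most `depth_left` nested levels.
pub fn fetch_nested(ar: &ArchiveModel, symbol: &str, depth_left: u32) -> Option<Found> {
    for m in &ar.members {
        match m {
            Member::Object { name, defines } => {
                if defines.iter().any(|d| d.as_str() == symbol) {
                    return Some(Found {
                        member: name.clone(),
                        provenance: vec![ar.path.clone()],
                    });
                }
            }
            Member::Archive(sub) => {
                if depth_left == 0 {
                    continue; // deeper nesting is deliberately not searched
                }
                if let Some(mut found) = fetch_nested(sub, symbol, depth_left - 1) {
                    // Prepend so the chain reads outermost archive first.
                    found.provenance.insert(0, ar.path.clone());
                    return Some(found);
                }
            }
        }
    }
    None
}
```

Calling `fetch_nested(&outer, name, 1)` gives the one-level behaviour described above; a symbol that lives two archives deep is reported as not found.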

## Testing Strategy

- Fixtures in `tests/corpus/archives/`:
  - `libbsd.a` made by Apple `ar` (BSD flavor, extended filenames).
  - `libsysv.a` made by GNU `ar` on Linux (for cross-check).
  - `libthin.a` made by `ar --thin` (GNU-thin).
  - `libmulti.a` containing several members, each defining one or more symbols.
- `cargo test -p afs-ld test_archive_bsd` verifies BSD index → correct member for each name.
- Symbol defined by two members: the archive picks the member that comes first (ld's traditional rule).
- Missing-symbol lookup returns `None` and does not error.
- A thin-archive member file that is missing on disk produces a path-qualified diagnostic.

## Definition of Done

- All three archive flavors read.
- `libarmfortas_rt.a` (built by parent workspace) parses and every runtime symbol is findable by name.
- Archive-of-archives works one level deep.
- Differential: `ar -t libarmfortas_rt.a` output matches our `--dump-archive` output.