Sprint 7: Symbol Model & Table
Prerequisites
Sprints 2, 4, 5, 6 — object, archive, dylib, TBD readers in place.
Goals
A uniform symbol table that fuses definitions from every input kind. Establishes the invariants Sprint 8's resolution pass will preserve.
Deliverables
1. Symbol sum type
afs-ld/src/symbol.rs:
pub enum Symbol {
Undefined { name: Istr, origin: InputId, weak_ref: bool },
Defined { name: Istr, origin: InputId, atom: AtomId, value: u64,
weak: bool, private_extern: bool, no_dead_strip: bool },
Common { name: Istr, origin: InputId, size: u64, align_pow2: u8 },
DylibImport { name: Istr, dylib: DylibId, ordinal: u16, weak_import: bool },
LazyArchive { name: Istr, archive: ArchiveId, member: MemberId },
LazyObject { name: Istr, origin: InputId }, // --start-lib / --end-lib
Alias { name: Istr, aliased: Istr }, // N_INDR
}
Istr = interned string handle. Interning happens once when a name first enters the table; all comparisons are handle-equality.
2. SymbolTable
pub struct SymbolTable {
names: StringInterner,
by_name: HashMap<Istr, SymbolId>,
symbols: Vec<Symbol>,
// replacement log for diagnostics + -why_live
transitions: Vec<Transition>,
}
pub struct Transition { pub at: SymbolId, pub from: SymbolKindTag, pub to: SymbolKindTag, pub cause: Cause }
HashMap is fine for Sprint 7; Sprint 28 may swap in a custom open-addressing table.
3. Insertion semantics
SymbolTable::insert(sym: Symbol) runs the resolution rules inline:
| Existing \ New | Undefined | Defined | Common | DylibImport | LazyArchive | LazyObject |
|---|---|---|---|---|---|---|
| vacant | insert | insert | insert | insert | insert | insert |
| Undefined | keep | replace | replace | replace | replace | replace |
| Defined (strong) | keep | error if both strong and same kind | keep | keep | keep | keep |
| Defined (weak) | keep | replace if new strong | keep | keep | keep | keep |
| Common | keep | replace (common → Defined) | pick larger size / stricter align | keep | keep | keep |
| DylibImport | keep | replace (definition shadows import) | keep | keep | keep | keep |
| LazyArchive | fetch | replace | replace | replace | keep first | replace |
| LazyObject | fetch | replace | replace | replace | replace | keep |
"Fetch" means: load the member/object, enqueue its symbols, mark this entry's transition.
4. Weak coalescing rules
weak_def+weak_def→ first wins.weak_def+ strong → strong wins.- Strong + strong → hard error, diagnostic cites both input paths.
weak_refwithout a definition is not an error; the reference resolves to address 0 (handled in relocation pass).
5. Aliases (N_INDR)
Flattened on insertion: an Alias(name → aliased) is resolved by looking up aliased. If aliased is itself an alias, walk until a non-alias is found; cycle detection with a depth cap.
6. Transition log
Every insert records the old/new kind + input path + (for lazy fetches) the reason the fetch happened. The -why_live diagnostic introduced in Sprint 19 reads this log.
7. Tombstoned symbols
Common → Defined promotion preserves the common size and alignment (so the BSS slot is large enough). Dead-stripping (Sprint 23) can tombstone a Defined without removing it from the table.
Testing Strategy
- Unit tests for every cell in the resolution matrix. Each combination has a named test.
- Synthetic inputs: two
.os both defining_foostrong → error; one strong + one weak → strong wins; two weak → first wins; common + strong → common replaced. - Alias-chain cycles detected with a diagnostic, not a stack overflow.
- Interner stress test: 100K unique names, membership queries are O(1) average.
Definition of Done
- Every matrix cell has a passing test.
- Weak coalescing matches
ldon a corpus of 20+ scenarios (differential test: both linkers produce the samenmoutput). - Alias flattening correct and cycle-safe.
- Transition log surfaces replacement causes for
-why_live.
View source
| 1 | # Sprint 7: Symbol Model & Table |
| 2 | |
| 3 | ## Prerequisites |
| 4 | Sprints 2, 4, 5, 6 — object, archive, dylib, TBD readers in place. |
| 5 | |
| 6 | ## Goals |
| 7 | A uniform symbol table that fuses definitions from every input kind. Establishes the invariants Sprint 8's resolution pass will preserve. |
| 8 | |
| 9 | ## Deliverables |
| 10 | |
| 11 | ### 1. `Symbol` sum type |
| 12 | `afs-ld/src/symbol.rs`: |
| 13 | |
| 14 | ```rust |
| 15 | pub enum Symbol { |
| 16 | Undefined { name: Istr, origin: InputId, weak_ref: bool }, |
| 17 | Defined { name: Istr, origin: InputId, atom: AtomId, value: u64, |
| 18 | weak: bool, private_extern: bool, no_dead_strip: bool }, |
| 19 | Common { name: Istr, origin: InputId, size: u64, align_pow2: u8 }, |
| 20 | DylibImport { name: Istr, dylib: DylibId, ordinal: u16, weak_import: bool }, |
| 21 | LazyArchive { name: Istr, archive: ArchiveId, member: MemberId }, |
| 22 | LazyObject { name: Istr, origin: InputId }, // --start-lib / --end-lib |
| 23 | Alias { name: Istr, aliased: Istr }, // N_INDR |
| 24 | } |
| 25 | ``` |
| 26 | |
| 27 | `Istr` = interned string handle. Interning happens once when a name first enters the table; all comparisons are handle-equality. |
| 28 | |
| 29 | ### 2. `SymbolTable` |
| 30 | ```rust |
| 31 | pub struct SymbolTable { |
| 32 | names: StringInterner, |
| 33 | by_name: HashMap<Istr, SymbolId>, |
| 34 | symbols: Vec<Symbol>, |
| 35 | // replacement log for diagnostics + -why_live |
| 36 | transitions: Vec<Transition>, |
| 37 | } |
| 38 | |
| 39 | pub struct Transition { pub at: SymbolId, pub from: SymbolKindTag, pub to: SymbolKindTag, pub cause: Cause } |
| 40 | ``` |
| 41 | |
| 42 | `HashMap` is fine for Sprint 7; Sprint 28 may swap in a custom open-addressing table. |
| 43 | |
| 44 | ### 3. Insertion semantics |
| 45 | `SymbolTable::insert(sym: Symbol)` runs the resolution rules inline: |
| 46 | |
| 47 | | Existing \ New | Undefined | Defined | Common | DylibImport | LazyArchive | LazyObject | |
| 48 | |----------------------|-----------|---------|--------|-------------|-------------|------------| |
| 49 | | *vacant* | insert | insert | insert | insert | insert | insert | |
| 50 | | Undefined | keep | replace | replace| replace | replace | replace | |
| 51 | | Defined (strong) | keep | **error if both strong and same kind** | keep | keep | keep | keep | |
| 52 | | Defined (weak) | keep | replace if new strong | keep | keep | keep | keep | |
| 53 | | Common | keep | replace (common → Defined) | pick larger size / stricter align | keep | keep | keep | |
| 54 | | DylibImport | keep | replace (definition shadows import) | keep | keep | keep | keep | |
| 55 | | LazyArchive | **fetch** | replace | replace | replace | keep first | replace | |
| 56 | | LazyObject | **fetch** | replace | replace | replace | replace | keep | |
| 57 | |
| 58 | "Fetch" means: load the member/object, enqueue its symbols, mark this entry's transition. |
| 59 | |
| 60 | ### 4. Weak coalescing rules |
| 61 | - `weak_def` + `weak_def` → first wins. |
| 62 | - `weak_def` + strong → strong wins. |
| 63 | - Strong + strong → hard error, diagnostic cites both input paths. |
| 64 | - `weak_ref` without a definition is not an error; the reference resolves to address 0 (handled in relocation pass). |
| 65 | |
| 66 | ### 5. Aliases (N_INDR) |
| 67 | Flattened on insertion: an `Alias(name → aliased)` is resolved by looking up `aliased`. If `aliased` is itself an alias, walk until a non-alias is found; cycle detection with a depth cap. |
| 68 | |
| 69 | ### 6. Transition log |
| 70 | Every `insert` records the old/new kind + input path + (for lazy fetches) the reason the fetch happened. The `-why_live` diagnostic introduced in Sprint 19 reads this log. |
| 71 | |
| 72 | ### 7. Tombstoned symbols |
| 73 | Common → Defined promotion preserves the common size and alignment (so the BSS slot is large enough). Dead-stripping (Sprint 23) can tombstone a Defined without removing it from the table. |
| 74 | |
| 75 | ## Testing Strategy |
| 76 | - Unit tests for every cell in the resolution matrix. Each combination has a named test. |
| 77 | - Synthetic inputs: two `.o`s both defining `_foo` strong → error; one strong + one weak → strong wins; two weak → first wins; common + strong → common replaced. |
| 78 | - Alias-chain cycles detected with a diagnostic, not a stack overflow. |
| 79 | - Interner stress test: 100K unique names, membership queries are O(1) average. |
| 80 | |
| 81 | ## Definition of Done |
| 82 | - Every matrix cell has a passing test. |
| 83 | - Weak coalescing matches `ld` on a corpus of 20+ scenarios (differential test: both linkers produce the same `nm` output). |
| 84 | - Alias flattening correct and cycle-safe. |
| 85 | - Transition log surfaces replacement causes for `-why_live`. |