markdown · 4351 bytes Raw Blame History

Sprint 7: Symbol Model & Table

Prerequisites

Sprints 2, 4, 5, 6 — object, archive, dylib, TBD readers in place.

Goals

A uniform symbol table that fuses definitions from every input kind. Establishes the invariants Sprint 8's resolution pass will preserve.

Deliverables

1. Symbol sum type

afs-ld/src/symbol.rs:

pub enum Symbol {
    Undefined   { name: Istr, origin: InputId,        weak_ref: bool },
    Defined     { name: Istr, origin: InputId, atom:  AtomId, value: u64,
                  weak: bool, private_extern: bool, no_dead_strip: bool },
    Common      { name: Istr, origin: InputId, size: u64, align_pow2: u8 },
    DylibImport { name: Istr, dylib: DylibId, ordinal: u16, weak_import: bool },
    LazyArchive { name: Istr, archive: ArchiveId, member: MemberId },
    LazyObject  { name: Istr, origin: InputId },        // --start-lib / --end-lib
    Alias       { name: Istr, aliased: Istr },          // N_INDR
}

Istr = interned string handle. Interning happens once when a name first enters the table; all comparisons are handle-equality.

2. SymbolTable

pub struct SymbolTable {
    names: StringInterner,
    by_name: HashMap<Istr, SymbolId>,
    symbols: Vec<Symbol>,
    // replacement log for diagnostics + -why_live
    transitions: Vec<Transition>,
}

pub struct Transition { pub at: SymbolId, pub from: SymbolKindTag, pub to: SymbolKindTag, pub cause: Cause }

HashMap is fine for Sprint 7; Sprint 28 may swap in a custom open-addressing table.

3. Insertion semantics

SymbolTable::insert(sym: Symbol) runs the resolution rules inline:

Existing \ New Undefined Defined Common DylibImport LazyArchive LazyObject
vacant insert insert insert insert insert insert
Undefined keep replace replace replace replace replace
Defined (strong) keep error if both strong and same kind keep keep keep keep
Defined (weak) keep replace if new strong keep keep keep keep
Common keep replace (common → Defined) pick larger size / stricter align keep keep keep
DylibImport keep replace (definition shadows import) keep keep keep keep
LazyArchive fetch replace replace replace keep first replace
LazyObject fetch replace replace replace replace keep

"Fetch" means: load the member/object, enqueue its symbols, mark this entry's transition.

4. Weak coalescing rules

  • weak_def + weak_def → first wins.
  • weak_def + strong → strong wins.
  • Strong + strong → hard error, diagnostic cites both input paths.
  • weak_ref without a definition is not an error; the reference resolves to address 0 (handled in relocation pass).

5. Aliases (N_INDR)

Flattened on insertion: an Alias(name → aliased) is resolved by looking up aliased. If aliased is itself an alias, walk until a non-alias is found; cycle detection with a depth cap.

6. Transition log

Every insert records the old/new kind + input path + (for lazy fetches) the reason the fetch happened. The -why_live diagnostic introduced in Sprint 19 reads this log.

7. Tombstoned symbols

Common → Defined promotion preserves the common size and alignment (so the BSS slot is large enough). Dead-stripping (Sprint 23) can tombstone a Defined without removing it from the table.

Testing Strategy

  • Unit tests for every cell in the resolution matrix. Each combination has a named test.
  • Synthetic inputs: two .os both defining _foo strong → error; one strong + one weak → strong wins; two weak → first wins; common + strong → common replaced.
  • Alias-chain cycles detected with a diagnostic, not a stack overflow.
  • Interner stress test: 100K unique names, membership queries are O(1) average.

Definition of Done

  • Every matrix cell has a passing test.
  • Weak coalescing matches ld on a corpus of 20+ scenarios (differential test: both linkers produce the same nm output).
  • Alias flattening correct and cycle-safe.
  • Transition log surfaces replacement causes for -why_live.
View source
1 # Sprint 7: Symbol Model & Table
2
3 ## Prerequisites
4 Sprints 2, 4, 5, 6 — object, archive, dylib, TBD readers in place.
5
6 ## Goals
7 A uniform symbol table that fuses definitions from every input kind. Establishes the invariants Sprint 8's resolution pass will preserve.
8
9 ## Deliverables
10
11 ### 1. `Symbol` sum type
12 `afs-ld/src/symbol.rs`:
13
14 ```rust
15 pub enum Symbol {
16 Undefined { name: Istr, origin: InputId, weak_ref: bool },
17 Defined { name: Istr, origin: InputId, atom: AtomId, value: u64,
18 weak: bool, private_extern: bool, no_dead_strip: bool },
19 Common { name: Istr, origin: InputId, size: u64, align_pow2: u8 },
20 DylibImport { name: Istr, dylib: DylibId, ordinal: u16, weak_import: bool },
21 LazyArchive { name: Istr, archive: ArchiveId, member: MemberId },
22 LazyObject { name: Istr, origin: InputId }, // --start-lib / --end-lib
23 Alias { name: Istr, aliased: Istr }, // N_INDR
24 }
25 ```
26
27 `Istr` = interned string handle. Interning happens once when a name first enters the table; all comparisons are handle-equality.
28
29 ### 2. `SymbolTable`
30 ```rust
31 pub struct SymbolTable {
32 names: StringInterner,
33 by_name: HashMap<Istr, SymbolId>,
34 symbols: Vec<Symbol>,
35 // replacement log for diagnostics + -why_live
36 transitions: Vec<Transition>,
37 }
38
39 pub struct Transition { pub at: SymbolId, pub from: SymbolKindTag, pub to: SymbolKindTag, pub cause: Cause }
40 ```
41
42 `HashMap` is fine for Sprint 7; Sprint 28 may swap in a custom open-addressing table.
43
44 ### 3. Insertion semantics
45 `SymbolTable::insert(sym: Symbol)` runs the resolution rules inline:
46
47 | Existing \ New | Undefined | Defined | Common | DylibImport | LazyArchive | LazyObject |
48 |----------------------|-----------|---------|--------|-------------|-------------|------------|
49 | *vacant* | insert | insert | insert | insert | insert | insert |
50 | Undefined | keep | replace | replace| replace | replace | replace |
51 | Defined (strong) | keep | **error if both strong and same kind** | keep | keep | keep | keep |
52 | Defined (weak) | keep | replace if new strong | keep | keep | keep | keep |
53 | Common | keep | replace (common → Defined) | pick larger size / stricter align | keep | keep | keep |
54 | DylibImport | keep | replace (definition shadows import) | keep | keep | keep | keep |
55 | LazyArchive | **fetch** | replace | replace | replace | keep first | replace |
56 | LazyObject | **fetch** | replace | replace | replace | replace | keep |
57
58 "Fetch" means: load the member/object, enqueue its symbols, mark this entry's transition.
59
60 ### 4. Weak coalescing rules
61 - `weak_def` + `weak_def` → first wins.
62 - `weak_def` + strong → strong wins.
63 - Strong + strong → hard error, diagnostic cites both input paths.
64 - `weak_ref` without a definition is not an error; the reference resolves to address 0 (handled in relocation pass).
65
66 ### 5. Aliases (N_INDR)
67 Flattened on insertion: an `Alias(name → aliased)` is resolved by looking up `aliased`. If `aliased` is itself an alias, walk until a non-alias is found; cycle detection with a depth cap.
68
69 ### 6. Transition log
70 Every `insert` records the old/new kind + input path + (for lazy fetches) the reason the fetch happened. The `-why_live` diagnostic introduced in Sprint 19 reads this log.
71
72 ### 7. Tombstoned symbols
73 Common → Defined promotion preserves the common size and alignment (so the BSS slot is large enough). Dead-stripping (Sprint 23) can tombstone a Defined without removing it from the table.
74
75 ## Testing Strategy
76 - Unit tests for every cell in the resolution matrix. Each combination has a named test.
77 - Synthetic inputs: two `.o`s both defining `_foo` strong → error; one strong + one weak → strong wins; two weak → first wins; common + strong → common replaced.
78 - Alias-chain cycles detected with a diagnostic, not a stack overflow.
79 - Interner stress test: 100K unique names, membership queries are O(1) average.
80
81 ## Definition of Done
82 - Every matrix cell has a passing test.
83 - Weak coalescing matches `ld` on a corpus of 20+ scenarios (differential test: both linkers produce the same `nm` output).
84 - Alias flattening correct and cycle-safe.
85 - Transition log surfaces replacement causes for `-why_live`.