# Sprint 14: LC_SYMTAB / LC_DYSYMTAB / String Table

## Prerequisites
Sprints 10–13 — layout, relocs, GOT/stubs/TLV all emitted.

## Goals
Build the final symbol table, string table, and `LC_DYSYMTAB` partitioning expected by dyld, matching `ld`'s layout byte-for-byte on simple inputs.

## Deliverables

### 1. Symbol table partitioning
dyld requires symbols in this order inside `LC_SYMTAB`:

1. **Locals** (ilocalsym..ilocalsym+nlocalsym): non-external Defined symbols, `N_PEXT` private-external symbols, and debug stabs (`N_STAB` entries; afs-as emits none today).
2. **External defined** (iextdefsym..iextdefsym+nextdefsym): `N_EXT` Defined symbols, sorted by name for dyld lookups.
3. **Undefined** (iundefsym..iundefsym+nundefsym): dylib imports, sorted by name.

`LC_DYSYMTAB` records each partition's start and count.
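
Here is a minimal Python sketch of the partitioning step. It assumes a hypothetical `Sym` record that already knows its bucket. `Kind`, `DysymtabRanges`, and `partition` are illustrative names, not afs-ld's API. Keeping locals in input order is also an assumption. The two external groups are sorted by raw name bytes, which matches `strcmp` order.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    LOCAL = 0    # non-external, N_PEXT private-external, or N_STAB entry
    EXT_DEF = 1  # N_EXT Defined
    UNDEF = 2    # dylib import


@dataclass
class Sym:
    name: str
    kind: Kind


@dataclass
class DysymtabRanges:
    ilocalsym: int
    nlocalsym: int
    iextdefsym: int
    nextdefsym: int
    iundefsym: int
    nundefsym: int


def partition(symbols: list[Sym]) -> tuple[list[Sym], DysymtabRanges]:
    """Order symbols locals / extdefs / undefs and compute the LC_DYSYMTAB ranges."""
    def by_name(s: Sym) -> bytes:
        return s.name.encode()  # byte order, like strcmp

    locals_ = [s for s in symbols if s.kind is Kind.LOCAL]  # input order kept
    extdefs = sorted((s for s in symbols if s.kind is Kind.EXT_DEF), key=by_name)
    undefs = sorted((s for s in symbols if s.kind is Kind.UNDEF), key=by_name)
    ranges = DysymtabRanges(
        ilocalsym=0,
        nlocalsym=len(locals_),
        iextdefsym=len(locals_),
        nextdefsym=len(extdefs),
        iundefsym=len(locals_) + len(extdefs),
        nundefsym=len(undefs),
    )
    return locals_ + extdefs + undefs, ranges
```

A symbol's final `nlist_64` index is its position in the returned list. The indirect symbol table later keys off those positions.
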

### 2. Symbol entry construction
Per output symbol, emit an `nlist_64`:

```
n_strx:  offset into the string table
n_type:  N_SECT | N_EXT   for external Defined;
         N_SECT | N_PEXT  for private-external Defined;
         N_SECT           for other local Defined;
         N_UNDF | N_EXT   for undefined / dylib import;
         N_ABS            for absolute
n_sect:  1-based index in section-header order; NO_SECT (0) for UNDF/ABS
n_desc:  for UNDF: high 8 bits = library ordinal (1-based) or a special ordinal
         (SELF_LIBRARY_ORDINAL 0, DYNAMIC_LOOKUP_ORDINAL 0xfe, EXECUTABLE_ORDINAL 0xff);
         low 8 bits: N_WEAK_REF | N_WEAK_DEF | N_NO_DEAD_STRIP as appropriate
n_value: Defined's VM address; UNDF = 0
```
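
The entry layout above maps onto a fixed 16-byte little-endian record. The Python sketch below uses `struct`, with constant values copied from `<mach-o/nlist.h>`. `pack_nlist_64` and `defined_n_type` are hypothetical helper names.

```python
import struct

# Constants from <mach-o/nlist.h>.
N_UNDF = 0x0
N_EXT = 0x01
N_ABS = 0x2
N_SECT = 0xE
N_PEXT = 0x10
NO_SECT = 0

# struct nlist_64 { uint32_t n_strx; uint8_t n_type; uint8_t n_sect;
#                   uint16_t n_desc; uint64_t n_value; }
# "<" = little-endian with no padding, so the record is exactly 16 bytes.
NLIST_64 = struct.Struct("<IBBHQ")


def defined_n_type(external: bool, private_extern: bool) -> int:
    """n_type for a symbol defined in a section."""
    if private_extern:
        return N_SECT | N_PEXT  # N_EXT cleared: no longer exported after linking
    if external:
        return N_SECT | N_EXT
    return N_SECT               # plain local


def pack_nlist_64(n_strx: int, n_type: int, n_sect: int,
                  n_desc: int, n_value: int) -> bytes:
    return NLIST_64.pack(n_strx, n_type, n_sect, n_desc, n_value)
```

An import from the first dylib would be `pack_nlist_64(strx, N_UNDF | N_EXT, NO_SECT, 0x0100, 0)`, with the ordinal in the high byte of `n_desc`.
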
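
Two-level namespace: every dylib-imported symbol carries the 1-based ordinal of its DylibFile in the high 8 bits of `n_desc`, via the `SET_LIBRARY_ORDINAL` / `GET_LIBRARY_ORDINAL` macros in `<mach-o/nlist.h>`. Flat lookup uses `DYNAMIC_LOOKUP_ORDINAL` (0xfe). Ordinal 0 is `SELF_LIBRARY_ORDINAL`. The remaining special ordinals are defined in the same header.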
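
The sketch below handles the ordinal bookkeeping, with the `GET_LIBRARY_ORDINAL` / `SET_LIBRARY_ORDINAL` macros transcribed to Python. `assign_ordinals` and `undefined_n_desc` are hypothetical. They take a plain list of dylib install names in load-command order. An import from an unlisted dylib raises an exception, which is the hard error the testing strategy calls for.

```python
# From <mach-o/nlist.h>.
SELF_LIBRARY_ORDINAL = 0x00
MAX_LIBRARY_ORDINAL = 0xFD
DYNAMIC_LOOKUP_ORDINAL = 0xFE
EXECUTABLE_ORDINAL = 0xFF
N_WEAK_REF = 0x0040


def get_library_ordinal(n_desc: int) -> int:
    return (n_desc >> 8) & 0xFF


def set_library_ordinal(n_desc: int, ordinal: int) -> int:
    return (n_desc & 0x00FF) | ((ordinal & 0xFF) << 8)


def assign_ordinals(dylibs_in_load_order: list[str]) -> dict[str, int]:
    """1-based ordinals following the order of the dylib load commands."""
    if len(dylibs_in_load_order) > MAX_LIBRARY_ORDINAL:
        # This sketch simply rejects more dylibs than nlist can address.
        raise ValueError("too many dylibs for 8-bit library ordinals")
    return {name: i for i, name in enumerate(dylibs_in_load_order, start=1)}


def undefined_n_desc(dylib: str, ordinals: dict[str, int], flags: int = 0) -> int:
    """n_desc for an import: ordinal in the high byte, flags in the low byte."""
    if dylib not in ordinals:
        raise LookupError(f"import from {dylib!r}, which has no load command")
    return set_library_ordinal(flags, ordinals[dylib])
```
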

### 3. String table
- Starts with a NUL byte at offset 0 (dyld-required).
- All symbol names follow, each NUL-terminated.
- Suffix dedup, as in afs-as: sort names in reverse-lexicographic (suffix) order and let a name reuse the trailing bytes of a longer name that ends with it. Cheap space win, and it preserves the style contract with afs-as.
- Zero-padded to an 8-byte boundary at the end.
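
The suffix dedup could be implemented as in the Python sketch below. The names are hypothetical, and afs-as's actual routine may differ. Sorting the reversed names in descending order means that if a name is a suffix of any other name, the entry just before it also ends with it. Comparing against the previous entry is therefore enough.

```python
def build_string_table(names: list[str]) -> tuple[bytes, dict[str, int]]:
    """Return (string table bytes, name -> n_strx), sharing common suffixes."""
    buf = bytearray(b"\0")  # offset 0 is the empty name
    offsets: dict[str, int] = {}
    # Sort by reversed bytes, descending: a name that is a suffix of another
    # name directly follows an entry that ends with it.
    unique = sorted({n.encode() for n in names if n},
                    key=lambda b: b[::-1], reverse=True)
    prev, prev_off = b"", 0
    for name in unique:
        if prev.endswith(name):
            off = prev_off + len(prev) - len(name)  # reuse prev's tail bytes
        else:
            off = len(buf)
            buf += name + b"\0"
        offsets[name.decode()] = off
        prev, prev_off = name, off
    buf += b"\0" * (-len(buf) % 8)  # zero-pad to an 8-byte boundary
    return bytes(buf), offsets
```

With `_afs_array_sum` and `_array_sum`, the shorter name's offset points 4 bytes into the longer name's bytes, so it costs nothing extra.
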

### 4. Indirect symbol table
Already populated by Sprint 12 via GOT/stubs/lazy pointers. Lives in `__LINKEDIT`, pointed at by `LC_DYSYMTAB.indirectsymoff` / `nindirectsyms`. Each entry is a u32:
- Symbol-table index for symbols in `__stubs`, `__la_symbol_ptr`, `__got`.
- Special sentinel `INDIRECT_SYMBOL_LOCAL=0x80000000` for entries pointing at local symbols (not exported).
- Special sentinel `INDIRECT_SYMBOL_ABS=0x40000000` for absolute symbols.

The stored indices are final `nlist_64` positions, so they are resolved only after the partitioning in deliverable 1.
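
The Python sketch below shows the final encoding pass. It assumes a hypothetical `Slot` record per stub/pointer entry and an `nlist_index` map built after partitioning. It also computes each section's `reserved1` value. In `section_64`, that field holds the index of the section's first entry in the indirect symbol table.

```python
import struct
from dataclasses import dataclass

INDIRECT_SYMBOL_LOCAL = 0x80000000
INDIRECT_SYMBOL_ABS = 0x40000000


@dataclass
class Slot:
    name: str
    exported: bool = True   # False: slot targets a local, non-exported symbol
    absolute: bool = False  # True: slot targets an absolute symbol


def encode_indirect_table(slots: list[Slot], nlist_index: dict[str, int]) -> bytes:
    """Emit one little-endian u32 per slot, in section order."""
    out = bytearray()
    for slot in slots:
        if slot.absolute:
            value = INDIRECT_SYMBOL_ABS
        elif not slot.exported:
            value = INDIRECT_SYMBOL_LOCAL
        else:
            value = nlist_index[slot.name]  # final index after partitioning
        out += struct.pack("<I", value)
    return bytes(out)


def reserved1_starts(sections: list[tuple[str, int]]) -> dict[str, int]:
    """Map each (section name, slot count), in emission order, to its first slot index."""
    starts, cursor = {}, 0
    for name, count in sections:
        starts[name] = cursor
        cursor += count
    return starts
```
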

### 5. Local-symbol stripping (`-x`)
`ld` supports `-x` to strip local symbols from the output. We record the flag (Sprint 19 wires up the CLI) and drop locals from the symbol table at emission time. Relocations are unaffected, since any that remain reference only external symbols, and so is debug info. Without `-x`, all locals are emitted.
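
The Python sketch below applies the `-x` filter. It reads "debug info is unaffected" as meaning that `N_STAB` entries survive stripping, which is an assumption. The filter runs before partitioning, so `nlocalsym` (and every later `LC_DYSYMTAB` start index) reflects the shorter local range. `Sym` and `apply_strip` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sym:
    name: str
    is_local: bool
    is_stab: bool = False  # N_STAB debug entry (also counted as local)


def apply_strip(symbols: list[Sym], strip_locals: bool) -> tuple[list[Sym], int]:
    """Drop non-stab locals when -x is set; return kept symbols and nlocalsym."""
    kept = [
        s for s in symbols
        if not strip_locals or not s.is_local or s.is_stab
    ]
    nlocalsym = sum(1 for s in kept if s.is_local)
    return kept, nlocalsym
```
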

### 6. Relocations in the output
For `MH_EXECUTE` and `MH_DYLIB`, dyld-era outputs don't emit per-section relocations; `LC_DYLD_INFO` (or chained fixups) does that job. afs-ld writes zero `nreloc`/`reloff` on output sections. (For `MH_OBJECT`, which we're not emitting, it would matter.)
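
For reference, the Python sketch below packs a `section_64` header (80 bytes, layout from `<mach-o/loader.h>`) with `reloff` and `nreloc` pinned to zero. The helper name and the example values in the test are illustrative only.

```python
import struct

# struct section_64: sectname[16], segname[16], addr, size (u64),
# offset, align, reloff, nreloc, flags, reserved1, reserved2, reserved3 (u32)
SECTION_64 = struct.Struct("<16s16sQQIIIIIIII")


def pack_section_64(sectname: str, segname: str, addr: int, size: int,
                    offset: int, align: int, flags: int,
                    reserved1: int = 0, reserved2: int = 0) -> bytes:
    reloff = 0  # linked MH_EXECUTE / MH_DYLIB output: no per-section relocations
    nreloc = 0
    return SECTION_64.pack(
        sectname.encode(), segname.encode(),  # "16s" NUL-pads short names
        addr, size, offset, align, reloff, nreloc,
        flags, reserved1, reserved2, 0,       # reserved3 left zero
    )
```
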

### 7. File-offset sequencing in `__LINKEDIT`
`__LINKEDIT` data layout order `ld` uses (we match it to ease differential testing):
1. Chained fixups blob (if present) or dyld-info opcode streams.
2. Function starts blob.
3. Data-in-code blob.
4. Symbol table (`nsyms * 16` bytes).
5. Indirect symbol table.
6. String table.
7. Code signature.

Each block is aligned to 8 bytes, and `__LINKEDIT` itself is page-aligned. `LC_SYMTAB`, `LC_DYSYMTAB`, `LC_FUNCTION_STARTS`, `LC_DATA_IN_CODE`, `LC_DYLD_INFO_ONLY` (or `LC_DYLD_CHAINED_FIXUPS`), and `LC_CODE_SIGNATURE` all point into this region with file offsets.
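
The Python sketch below assigns the offsets by applying the rule above uniformly. It page-aligns the start of `__LINKEDIT`, then places each block at the next 8-byte boundary in the listed order. The 16 KiB default page size (arm64) and the block names are assumptions. The sketch does not model any block-specific alignment beyond the 8-byte rule.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


def layout_linkedit(prev_segment_end: int,
                    blocks: list[tuple[str, int]],
                    page_size: int = 0x4000) -> tuple[int, dict[str, tuple[int, int]], int]:
    """Return (__LINKEDIT fileoff, {block: (fileoff, size)}, __LINKEDIT filesize)."""
    linkedit_off = align_up(prev_segment_end, page_size)
    cursor = linkedit_off
    placed: dict[str, tuple[int, int]] = {}
    for name, size in blocks:
        cursor = align_up(cursor, 8)
        placed[name] = (cursor, size)
        cursor += size
    return linkedit_off, placed, cursor - linkedit_off
```

Values such as `LC_SYMTAB.symoff`, `stroff`, and `LC_DYSYMTAB.indirectsymoff` can then be read straight out of `placed`. For example, a preceding segment that ends at file offset 0x8123 puts `__LINKEDIT` at 0xC000.
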

## Testing Strategy
- Build a fixture with one local, one external Defined, and one undefined dylib import. Verify that the `LC_SYMTAB` / `LC_DYSYMTAB` partitions match `ld`'s output exactly.
- String-table dedup: the symbols `_afs_array_sum` and `_array_sum` share suffix bytes.
- Two-level namespace: ordinals are assigned in load-command order, and an import whose dylib isn't listed is a hard error.
- Differential: byte-level symbol-table match for every staging fixture.
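
The differential harness could run a check like the Python sketch below on each output. It takes hypothetical `(name, n_type)` pairs already parsed from the symbol table, plus the six `LC_DYSYMTAB` fields. It classifies each entry from its `n_type` bits and reports any partition that is non-contiguous, mis-populated, or unsorted. `classify` and `check_partitions` are illustrative names.

```python
N_STAB = 0xE0
N_TYPE = 0x0E
N_EXT = 0x01
N_UNDF = 0x0


def classify(n_type: int) -> str:
    if (n_type & N_STAB) or not (n_type & N_EXT):
        return "local"  # stabs, plain locals, and N_PEXT private externs
    return "undef" if (n_type & N_TYPE) == N_UNDF else "extdef"


def check_partitions(entries: list[tuple[str, int]],
                     ilocalsym: int, nlocalsym: int,
                     iextdefsym: int, nextdefsym: int,
                     iundefsym: int, nundefsym: int) -> list[str]:
    """Return a list of problems; an empty list means the partitions look right."""
    problems = []
    if ilocalsym != 0 or iextdefsym != nlocalsym or iundefsym != iextdefsym + nextdefsym:
        problems.append("partitions are not contiguous")
    if iundefsym + nundefsym != len(entries):
        problems.append("partitions do not cover the symbol table")
    ranges = {"local": (ilocalsym, nlocalsym),
              "extdef": (iextdefsym, nextdefsym),
              "undef": (iundefsym, nundefsym)}
    for kind, (start, count) in ranges.items():
        chunk = entries[start:start + count]
        for name, n_type in chunk:
            if classify(n_type) != kind:
                problems.append(f"{name}: expected {kind}, got {classify(n_type)}")
        if kind != "local":
            names = [name.encode() for name, _ in chunk]
            if names != sorted(names):
                problems.append(f"{kind} range is not sorted by name")
    return problems
```
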

## Definition of Done
- `nm -v` output identical between afs-ld and `ld` on all staging fixtures (modulo the address offsets the differential harness allows).
- `LC_DYSYMTAB` partition boundaries exact.
- Indirect symbol table entries point to the correct nlist indices for stubs, lazy pointers, and GOT.
- String-table byte length within 5% of `ld`'s (suffix-dedup parity).