# Sprint 2: Sections, Symbols, String Tables

## Prerequisites
Sprint 1: header and load commands parsed.

## Goals
Decode section payloads, the symbol table (nlist_64), and the string table. Expose the full section/symbol/string model that later sprints build on.

Closeout note: `tests/reader_malformed_stress.rs` now also covers malformed
symbol/string-table variants derived from real corpus objects, so the reader's
symbol and string surfaces are exercised under targeted bad-input cases, not
just hand-written unit fixtures. `tests/reader_tool_parity.rs` now also checks
symbol classification against `nm -a` and raw relocation tables against
`otool -r` across the afs-as corpus.

## Deliverables

### 1. Section attributes and kinds
`afs-ld/src/section.rs`: `SectionKind` mirrors afs-as's but is richer on the reader side (we receive inputs with flags already set):

```rust
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ConstData, Data, ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    CompactUnwind, EhFrame, Coalesced,
    Regular, Unknown,
}

pub fn kind_from_flags(flags: u32) -> SectionKind; // S_* attribute bits
```

Respect all `S_ATTR_*` flags and the section-type byte (`flags & 0xff`).
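
A minimal sketch of how `kind_from_flags` could decode the type byte. The `S_*` values are the ones defined in `<mach-o/loader.h>`; the enum here is a trimmed copy holding only the variants that can be derived from `flags` alone. Mapping `ThreadLocalInitPointers` to `S_THREAD_LOCAL_INIT_FUNCTION_POINTERS`, and treating `S_REGULAR` plus `S_ATTR_PURE_INSTRUCTIONS` as `Text`, are assumptions made for illustration, not confirmed afs-ld behavior.

```rust
// Values from <mach-o/loader.h>. The section type is the low byte of `flags`.
const SECTION_TYPE: u32 = 0x0000_00ff;
const S_REGULAR: u32 = 0x00;
const S_ZEROFILL: u32 = 0x01;
const S_CSTRING_LITERALS: u32 = 0x02;
const S_4BYTE_LITERALS: u32 = 0x03;
const S_8BYTE_LITERALS: u32 = 0x04;
const S_COALESCED: u32 = 0x0b;
const S_GB_ZEROFILL: u32 = 0x0c;
const S_16BYTE_LITERALS: u32 = 0x0e;
const S_THREAD_LOCAL_REGULAR: u32 = 0x11;
const S_THREAD_LOCAL_ZEROFILL: u32 = 0x12;
const S_THREAD_LOCAL_VARIABLES: u32 = 0x13;
const S_THREAD_LOCAL_INIT_FUNCTION_POINTERS: u32 = 0x15;
const S_ATTR_PURE_INSTRUCTIONS: u32 = 0x8000_0000;

/// Trimmed copy of `SectionKind`: only variants derivable from `flags` alone.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    Coalesced, Regular, Unknown,
}

pub fn kind_from_flags(flags: u32) -> SectionKind {
    match flags & SECTION_TYPE {
        // Assumption: a regular section holding only instructions is text.
        S_REGULAR if flags & S_ATTR_PURE_INSTRUCTIONS != 0 => SectionKind::Text,
        S_REGULAR => SectionKind::Regular,
        S_ZEROFILL => SectionKind::ZeroFill,
        S_CSTRING_LITERALS => SectionKind::CStringLiterals,
        S_4BYTE_LITERALS => SectionKind::Literal4,
        S_8BYTE_LITERALS => SectionKind::Literal8,
        S_16BYTE_LITERALS => SectionKind::Literal16,
        S_COALESCED => SectionKind::Coalesced,
        S_GB_ZEROFILL => SectionKind::GbZeroFill,
        S_THREAD_LOCAL_REGULAR => SectionKind::ThreadLocalRegular,
        S_THREAD_LOCAL_ZEROFILL => SectionKind::ThreadLocalZerofill,
        S_THREAD_LOCAL_VARIABLES => SectionKind::ThreadLocalVariables,
        S_THREAD_LOCAL_INIT_FUNCTION_POINTERS => SectionKind::ThreadLocalInitPointers,
        // Types this sketch does not model fall through to Unknown.
        _ => SectionKind::Unknown,
    }
}
```

Kinds such as `ConstData`, `CompactUnwind`, and `EhFrame` cannot be told apart by flags alone, so a real classifier would presumably also consult the segment and section names.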

### 2. Section content slicing
`InputSection` struct: segment, name, kind, addr, size, align (log2), flags, raw `data: &[u8]` borrowed from the mmap'd input, plus the raw relocation entries as `&[u8]` (decoded in Sprint 3). For `S_ZEROFILL` / `S_THREAD_LOCAL_ZEROFILL`, `data` is empty; size is virtual.
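
The slicing rule above can be sketched as a bounds-checked helper. The function name `section_data`, the `SliceError` type, and the parameter list are hypothetical; including `S_GB_ZEROFILL` among the contentless types is also an assumption beyond the two types named above.

```rust
// Section types with no bytes in the file (values from <mach-o/loader.h>).
const S_ZEROFILL: u32 = 0x01;
const S_GB_ZEROFILL: u32 = 0x0c; // assumption: treated like S_ZEROFILL here
const S_THREAD_LOCAL_ZEROFILL: u32 = 0x12;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SliceError {
    OutOfBounds { offset: u64, size: u64, file_len: u64 },
}

/// Borrow a section's bytes from the mapped file without copying.
/// Zero-fill sections yield an empty slice; `size` stays a virtual size.
pub fn section_data(
    file: &[u8],
    offset: u32,
    size: u64,
    flags: u32,
) -> Result<&[u8], SliceError> {
    if matches!(
        flags & 0xff,
        S_ZEROFILL | S_GB_ZEROFILL | S_THREAD_LOCAL_ZEROFILL
    ) {
        return Ok(&file[..0]);
    }
    let start = u64::from(offset);
    let file_len = file.len() as u64;
    // checked_add guards against offset + size overflowing u64.
    let end = start
        .checked_add(size)
        .filter(|&e| e <= file_len)
        .ok_or(SliceError::OutOfBounds { offset: start, size, file_len })?;
    Ok(&file[start as usize..end as usize])
}
```

Returning an error instead of indexing directly keeps a truncated or hostile object file from panicking the reader.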

### 3. nlist_64 and symbol flags
`afs-ld/src/symbol.rs`:

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_PEXT: u8 = 0x10;
pub const N_TYPE: u8 = 0x0e; // mask
pub const N_EXT:  u8 = 0x01;

pub const N_UNDF: u8 = 0x0;
pub const N_ABS:  u8 = 0x2;
pub const N_SECT: u8 = 0xe;
pub const N_INDR: u8 = 0xa;

pub const N_NO_DEAD_STRIP:   u16 = 0x0020;
pub const N_WEAK_REF:        u16 = 0x0040;
pub const N_WEAK_DEF:        u16 = 0x0080;
pub const N_ARM_THUMB_DEF:   u16 = 0x0008;
pub const N_SYMBOL_RESOLVER: u16 = 0x0100;

pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8, pub n_sect: u8, pub n_desc: u16,
    pub n_value: u64,
}

pub struct InputSymbol<'a> {
    pub name: &'a str,
    pub kind: SymKind,  // Undef, Abs, SectLocal, SectExt, PExt, Indirect
    pub weak_ref: bool, pub weak_def: bool,
    pub no_dead_strip: bool, pub private_extern: bool,
    pub sect_idx: u8, pub value: u64,
    pub common_align_pow2: Option<u8>,  // from n_desc bits 8..15 when UNDF + value != 0
}
```

Common symbols are detected the way afs-as emits them: `N_UNDF | N_EXT` with a nonzero `n_value` encoding the size and `n_desc >> 8` encoding the alignment as a power-of-two exponent.
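
A hedged sketch of decoding one 16-byte little-endian `nlist_64` record and applying the common-symbol rule above. The helper names `parse_nlist` and `common_size_align` are hypothetical. Masking the alignment with `0x0f` mirrors the `GET_COMM_ALIGN` macro in `<mach-o/nlist.h>`; it is an assumption that this matches what afs-as emits.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_EXT: u8 = 0x01;
pub const N_UNDF: u8 = 0x0;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

/// Decode one little-endian nlist_64 record (4 + 1 + 1 + 2 + 8 bytes).
pub fn parse_nlist(b: &[u8; 16]) -> RawNlist {
    RawNlist {
        strx: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        n_type: b[4],
        n_sect: b[5],
        n_desc: u16::from_le_bytes([b[6], b[7]]),
        n_value: u64::from_le_bytes([
            b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15],
        ]),
    }
}

/// Returns `(size, log2 alignment)` when the entry is a tentative (common)
/// definition: undefined, external, not a stab, and nonzero `n_value`.
pub fn common_size_align(n: &RawNlist) -> Option<(u64, u8)> {
    let is_stab = (n.n_type & N_STAB) != 0;
    let undef_ext = (n.n_type & N_TYPE) == N_UNDF && (n.n_type & N_EXT) != 0;
    if !is_stab && undef_ext && n.n_value != 0 {
        // Same masking as GET_COMM_ALIGN: low four bits of the high byte.
        Some((n.n_value, ((n.n_desc >> 8) & 0x0f) as u8))
    } else {
        None
    }
}
```

With this rule, an ordinary undefined reference (`n_value == 0`) and a common symbol share the same `n_type` bits and differ only in `n_value`.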

### 4. Indirect (N_INDR) pass-through
For alias symbols, `n_value` is a string-table index (`strx`) naming the aliased symbol; record that name. Resolution lives in Sprint 7; this sprint only surfaces the data.

### 5. String table reader
`StringTable` wraps the raw bytes of the `__LINKEDIT` string table, exposes `name_at(strx: u32) -> &str`, validates NUL termination, and handles the suffix-dedup trick afs-as uses: a `strx` for `_foo` may point into the middle of a later `"_bar_foo\0"` instead of at its own `"_foo\0"` entry.
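
A minimal sketch of such a reader, assuming the table borrows its bytes. Unlike the signature above, this version returns a `Result` so that bad indices become diagnostics; the `StrError` type and its variants are hypothetical.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum StrError {
    OutOfBounds(u32),
    Unterminated(u32),
    NotUtf8(u32),
}

pub struct StringTable<'a> {
    bytes: &'a [u8],
}

impl<'a> StringTable<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes }
    }

    /// Read the NUL-terminated name starting at `strx`.
    /// Any in-bounds offset is accepted, so an index into the middle of a
    /// longer string (suffix dedup) resolves to that string's tail.
    pub fn name_at(&self, strx: u32) -> Result<&'a str, StrError> {
        let bytes: &'a [u8] = self.bytes;
        let tail = bytes
            .get(strx as usize..)
            .ok_or(StrError::OutOfBounds(strx))?;
        let len = tail
            .iter()
            .position(|&b| b == 0)
            .ok_or(StrError::Unterminated(strx))?;
        std::str::from_utf8(&tail[..len]).map_err(|_| StrError::NotUtf8(strx))
    }
}
```

Because lookup never assumes that `strx` lands on an entry boundary, suffix-shared names need no special case. The same lookup would serve `N_INDR` entries, whose `n_value` is also a string-table index.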

### 6. DYSYMTAB partitioning
Decode the partition `(ilocalsym, nlocalsym)`, `(iextdefsym, nextdefsym)`, `(iundefsym, nundefsym)`. Record `toc`, `modtab`, `extrefsym`, `indirectsymoff/nindirectsyms`, `extreloff`, `locreloff` offsets for later phases (most are for dylibs).
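
One way the three index/count pairs could be turned into validated symbol-table ranges. The `DysymtabRanges` and `Partition` types and the `partition` function are hypothetical names for this sketch; the real `DysymtabView` may be shaped differently.

```rust
use std::ops::Range;

/// The three (index, count) pairs from LC_DYSYMTAB.
pub struct DysymtabRanges {
    pub ilocalsym: u32,
    pub nlocalsym: u32,
    pub iextdefsym: u32,
    pub nextdefsym: u32,
    pub iundefsym: u32,
    pub nundefsym: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Partition {
    pub locals: Range<usize>,
    pub ext_defs: Range<usize>,
    pub undefs: Range<usize>,
}

/// Convert each pair into an index range over the symbol table, returning
/// `None` if any range overflows or runs past `nsyms`.
pub fn partition(d: &DysymtabRanges, nsyms: u32) -> Option<Partition> {
    let range = |start: u32, count: u32| -> Option<Range<usize>> {
        let end = start.checked_add(count)?;
        if end <= nsyms {
            Some(start as usize..end as usize)
        } else {
            None
        }
    };
    Some(Partition {
        locals: range(d.ilocalsym, d.nlocalsym)?,
        ext_defs: range(d.iextdefsym, d.nextdefsym)?,
        undefs: range(d.iundefsym, d.nundefsym)?,
    })
}
```

The resulting ranges can index directly into the decoded symbol vector, for example `&symbols[p.undefs.clone()]`.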

### 7. Input file model
`afs-ld/src/input.rs`:
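
```rust
pub struct ObjectFile<'a> {
    pub path: PathBuf,
    pub header: MachHeader64,
    pub commands: Vec<LoadCommand>,
    pub sections: Vec<InputSection<'a>>, // section data borrows from the mmap'd input
    pub symbols: Vec<InputSymbol<'a>>,   // InputSymbol carries a lifetime parameter
    pub strings: StringTable,
    pub dysymtab: DysymtabView,
}
```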

## Testing Strategy
- Round-trip: parse every section/symbol/string from the afs-as corpus; re-emit; match bytes.
- Diffing against `nm -a` and `otool -r` for symbols and relocation offsets (relocation bodies come in Sprint 3).
- Edge cases: empty `__bss`, tentative common with 16-byte alignment, weak-def with `N_NO_DEAD_STRIP`, indirect symbol chains.
- Fuzz: malformed nlist entries (strx out of bounds, n_sect out of range, invalid n_type bits) produce sourced diagnostics, never panics.
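
The fuzz requirement in the list above could be met by a validation pass along these lines. This is a sketch only: `check_nlist` and `NlistError` are hypothetical names, and the error carries just the symbol index as its source. Accepting only the four `N_TYPE` values listed in this document (so the obsolete `N_PBUD` is rejected) and skipping type checks for stab entries are assumptions.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_UNDF: u8 = 0x0;
pub const N_ABS: u8 = 0x2;
pub const N_INDR: u8 = 0xa;
pub const N_SECT: u8 = 0xe;

pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

/// Hypothetical diagnostic; `index` identifies the offending symbol.
#[derive(Debug, PartialEq, Eq)]
pub enum NlistError {
    StrxOutOfBounds { index: usize, strx: u32 },
    SectOutOfRange { index: usize, n_sect: u8 },
    InvalidType { index: usize, n_type: u8 },
}

/// Validate one entry against the string table length and section count.
/// Every failure path returns an error value; nothing here can panic.
pub fn check_nlist(
    index: usize,
    n: &RawNlist,
    strtab_len: u32,
    nsects: u8,
) -> Result<(), NlistError> {
    if n.strx >= strtab_len {
        return Err(NlistError::StrxOutOfBounds { index, strx: n.strx });
    }
    // Assumption: stab (debug) entries use a different n_type encoding,
    // so their type bits are not checked here.
    if (n.n_type & N_STAB) != 0 {
        return Ok(());
    }
    match n.n_type & N_TYPE {
        N_UNDF | N_ABS | N_INDR => Ok(()),
        // Section ordinals are 1-based; 0 means "no section".
        N_SECT if (1..=nsects).contains(&n.n_sect) => Ok(()),
        N_SECT => Err(NlistError::SectOutOfRange { index, n_sect: n.n_sect }),
        _ => Err(NlistError::InvalidType { index, n_type: n.n_type }),
    }
}
```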

## Definition of Done
- Every symbol attribute afs-as can emit is recognized and round-trips.
- Common symbols surface with correct size and alignment.
- String table reader handles suffix-dedup overlaps correctly.
- Corpus-wide symbol and section parity against `nm -a` / `otool -v`.