# Sprint 2: Sections, Symbols, String Tables

## Prerequisites
Sprint 1: header + load commands parsed.

## Goals
Decode section payloads, the symbol table (`nlist_64`), and the string table. Expose the full section/symbol/string model that later sprints build on.

## Deliverables

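### 1. Section attributes and kinds
`afs-ld/src/section.rs`: `SectionKind` mirrors afs-as's enum but is richer on the reader side, since inputs arrive with their flags already set:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ConstData, Data, ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    CompactUnwind, EhFrame, Coalesced,
    Regular, Unknown,
}

// Section-type byte + S_ATTR_* bits, plus the segment/section names needed
// for kinds that flags alone cannot identify.
pub fn kind_from_flags(segname: &str, sectname: &str, flags: u32) -> SectionKind;
```

Respect all `S_ATTR_*` flags (`flags & SECTION_ATTRIBUTES`, i.e. `0xffffff00`) and the section-type byte (`flags & SECTION_TYPE`, i.e. `flags & 0xff`). Several kinds cannot be derived from flags alone: `__TEXT,__eh_frame` is typed `S_COALESCED`, `__LD,__compact_unwind` is `S_REGULAR` with `S_ATTR_DEBUG`, and `__const` and `__data` are both `S_REGULAR`. The classifier therefore also takes the segment and section names.
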
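A minimal sketch of the classifier, assuming the segment and section names are passed alongside `flags` (flags alone cannot separate `__eh_frame` from other coalesced sections, or `__const` from `__data`). The constant values are the ones in `<mach-o/loader.h>`. The name rules for `__const`/`__data` and mapping every unlisted type (pointer sections, stubs, init/term arrays) to `Unknown` are assumptions of this sketch.

```rust
// Section-type values (low byte of `flags`) and attribute bits, as defined
// in <mach-o/loader.h>.
const SECTION_TYPE: u32 = 0x0000_00ff;
const S_REGULAR: u32 = 0x00;
const S_ZEROFILL: u32 = 0x01;
const S_CSTRING_LITERALS: u32 = 0x02;
const S_4BYTE_LITERALS: u32 = 0x03;
const S_8BYTE_LITERALS: u32 = 0x04;
const S_COALESCED: u32 = 0x0b;
const S_GB_ZEROFILL: u32 = 0x0c;
const S_16BYTE_LITERALS: u32 = 0x0e;
const S_THREAD_LOCAL_REGULAR: u32 = 0x11;
const S_THREAD_LOCAL_ZEROFILL: u32 = 0x12;
const S_THREAD_LOCAL_VARIABLES: u32 = 0x13;
const S_THREAD_LOCAL_INIT_FUNCTION_POINTERS: u32 = 0x15;
const S_ATTR_PURE_INSTRUCTIONS: u32 = 0x8000_0000;
const S_ATTR_SOME_INSTRUCTIONS: u32 = 0x0000_0400;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ConstData, Data, ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    CompactUnwind, EhFrame, Coalesced,
    Regular, Unknown,
}

pub fn kind_from_flags(segname: &str, sectname: &str, flags: u32) -> SectionKind {
    // Kinds identified by name: their type byte is shared with other sections.
    match (segname, sectname) {
        ("__LD", "__compact_unwind") => return SectionKind::CompactUnwind,
        ("__TEXT", "__eh_frame") => return SectionKind::EhFrame,
        _ => {}
    }
    match flags & SECTION_TYPE {
        S_ZEROFILL => SectionKind::ZeroFill,
        S_GB_ZEROFILL => SectionKind::GbZeroFill,
        S_CSTRING_LITERALS => SectionKind::CStringLiterals,
        S_4BYTE_LITERALS => SectionKind::Literal4,
        S_8BYTE_LITERALS => SectionKind::Literal8,
        S_16BYTE_LITERALS => SectionKind::Literal16,
        S_COALESCED => SectionKind::Coalesced,
        S_THREAD_LOCAL_REGULAR => SectionKind::ThreadLocalRegular,
        S_THREAD_LOCAL_ZEROFILL => SectionKind::ThreadLocalZerofill,
        S_THREAD_LOCAL_VARIABLES => SectionKind::ThreadLocalVariables,
        S_THREAD_LOCAL_INIT_FUNCTION_POINTERS => SectionKind::ThreadLocalInitPointers,
        // Regular sections: instruction attributes mark code, names split data.
        S_REGULAR if (flags & (S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS)) != 0 => {
            SectionKind::Text
        }
        S_REGULAR if sectname == "__const" => SectionKind::ConstData,
        S_REGULAR if sectname == "__data" => SectionKind::Data,
        S_REGULAR => SectionKind::Regular,
        // Pointer sections, stubs, init/term arrays, etc. are not modelled here.
        _ => SectionKind::Unknown,
    }
}
```

Checking names before the type byte keeps the override list small and explicit; everything else falls through to the flag decode.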
### 2. Section content slicing
`InputSection<'a>` struct: segment, name, kind, addr, size, align (log2), flags, raw `data: &'a [u8]` borrowed from the mmap'd input, plus the raw relocation entries as `&'a [u8]` (decoded in Sprint 3). For `S_ZEROFILL`, `S_GB_ZEROFILL`, and `S_THREAD_LOCAL_ZEROFILL`, `data` is empty and `size` is virtual: these sections occupy no bytes in the file.

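A sketch of the slicing step. It assumes a little-endian `section_64` header of 80 bytes: `sectname[16]`, `segname[16]`, `addr`, `size`, then `offset`, `align`, `reloff`, `nreloc`, `flags` and three reserved words. It also assumes 8-byte `relocation_info` records. `parse_section` is a hypothetical name. `kind` is left out here because it would come from the Deliverable 1 classifier. Every extent is bounds-checked, so a malformed header yields an error instead of a panic.

```rust
use std::convert::{TryFrom, TryInto};

const SECTION_64_SIZE: usize = 80; // sizeof(struct section_64)
const RELOC_SIZE: usize = 8; // sizeof(struct relocation_info)
const SECTION_TYPE: u32 = 0xff;
const S_ZEROFILL: u32 = 0x01;
const S_GB_ZEROFILL: u32 = 0x0c;
const S_THREAD_LOCAL_ZEROFILL: u32 = 0x12;

#[derive(Debug)]
pub struct InputSection<'a> {
    pub segment: String,
    pub name: String,
    pub addr: u64,
    pub size: u64,            // virtual size for zerofill sections
    pub align_pow2: u32,      // section_64.align is already log2
    pub flags: u32,
    pub data: &'a [u8],       // borrowed from the mapped file
    pub raw_relocs: &'a [u8], // nreloc * 8 bytes, decoded in Sprint 3
}

fn u32_at(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes(b[off..off + 4].try_into().unwrap())
}

fn u64_at(b: &[u8], off: usize) -> u64 {
    u64::from_le_bytes(b[off..off + 8].try_into().unwrap())
}

// 16-byte names are NUL-padded, but a full-length name has no NUL at all.
fn fixed_name(b: &[u8]) -> String {
    let end = b.iter().position(|&c| c == 0).unwrap_or(b.len());
    String::from_utf8_lossy(&b[..end]).into_owned()
}

pub fn parse_section<'a>(file: &'a [u8], hdr_off: usize) -> Result<InputSection<'a>, String> {
    let hdr_end = hdr_off.checked_add(SECTION_64_SIZE).ok_or("header offset overflows")?;
    let hdr = file.get(hdr_off..hdr_end).ok_or("section_64 header truncated")?;
    let size = u64_at(hdr, 40);
    let offset = u32_at(hdr, 48) as usize;
    let reloff = u32_at(hdr, 56) as usize;
    let nreloc = u32_at(hdr, 60) as usize;
    let flags = u32_at(hdr, 64);

    let zerofill = matches!(
        flags & SECTION_TYPE,
        S_ZEROFILL | S_GB_ZEROFILL | S_THREAD_LOCAL_ZEROFILL
    );
    let data: &'a [u8] = if zerofill {
        &[] // no file bytes; `size` is virtual
    } else {
        let len = usize::try_from(size).map_err(|_| "section size too large")?;
        let end = offset.checked_add(len).ok_or("section extent overflows")?;
        file.get(offset..end).ok_or("section data out of bounds")?
    };

    let rel_len = nreloc.checked_mul(RELOC_SIZE).ok_or("relocation count overflows")?;
    let rel_end = reloff.checked_add(rel_len).ok_or("relocation extent overflows")?;
    let raw_relocs = file.get(reloff..rel_end).ok_or("relocations out of bounds")?;

    Ok(InputSection {
        segment: fixed_name(&hdr[16..32]),
        name: fixed_name(&hdr[0..16]),
        addr: u64_at(hdr, 32),
        size,
        align_pow2: u32_at(hdr, 52),
        flags,
        data,
        raw_relocs,
    })
}
```

Because `data` and `raw_relocs` are sub-slices of `file`, nothing is copied; the borrow checker ties every section to the lifetime of the mapping.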
### 3. nlist_64 and symbol flags
`afs-ld/src/symbol.rs`:

```rust
// n_type bit fields
pub const N_STAB: u8 = 0xe0; // any of these bits set: debugger (stab) entry
pub const N_PEXT: u8 = 0x10; // private external
pub const N_TYPE: u8 = 0x0e; // mask for the type values below
pub const N_EXT: u8 = 0x01;  // external

// N_TYPE values
pub const N_UNDF: u8 = 0x0;
pub const N_ABS: u8 = 0x2;
pub const N_SECT: u8 = 0xe;
pub const N_INDR: u8 = 0xa;

// n_desc flags
pub const N_NO_DEAD_STRIP: u16 = 0x0020;
pub const N_WEAK_REF: u16 = 0x0040;
pub const N_WEAK_DEF: u16 = 0x0080; // on an undefined symbol this bit is N_REF_TO_WEAK
pub const N_ARM_THUMB_DEF: u16 = 0x0008;
pub const N_SYMBOL_RESOLVER: u16 = 0x0100;

// On-disk struct nlist_64 (16 bytes)
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8, pub n_sect: u8, pub n_desc: u16,
    pub n_value: u64,
}

pub struct InputSymbol<'a> {
    pub name: &'a str,
    pub kind: SymKind, // Undef, Abs, SectLocal, SectExt, PExt, Indirect
    pub weak_ref: bool, pub weak_def: bool,
    pub no_dead_strip: bool, pub private_extern: bool,
    pub sect_idx: u8, pub value: u64,
    pub common_align_pow2: Option<u8>, // (n_desc >> 8) & 0x0f when N_UNDF | N_EXT and value != 0
}
```

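Decoding the records is a fixed-layout read. The sketch below assumes little-endian input and that `symoff`/`nsyms` come from the already-parsed `LC_SYMTAB`. `read_symtab` is a hypothetical name. The 16-byte layout follows `struct nlist_64`: `n_strx` (4 bytes), `n_type` (1), `n_sect` (1), `n_desc` (2), `n_value` (8).

```rust
use std::convert::TryInto;

pub const NLIST_64_SIZE: usize = 16; // sizeof(struct nlist_64)

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

impl RawNlist {
    /// Decodes one little-endian nlist_64 record.
    pub fn from_bytes(b: &[u8; NLIST_64_SIZE]) -> RawNlist {
        RawNlist {
            strx: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            n_type: b[4],
            n_sect: b[5],
            n_desc: u16::from_le_bytes([b[6], b[7]]),
            n_value: u64::from_le_bytes(b[8..16].try_into().unwrap()),
        }
    }
}

/// Reads `nsyms` records starting at file offset `symoff` (both from LC_SYMTAB).
pub fn read_symtab(file: &[u8], symoff: u32, nsyms: u32) -> Result<Vec<RawNlist>, String> {
    let start = symoff as usize;
    let len = (nsyms as usize)
        .checked_mul(NLIST_64_SIZE)
        .ok_or("nsyms overflows")?;
    let end = start.checked_add(len).ok_or("symbol table extent overflows")?;
    let bytes = file.get(start..end).ok_or_else(|| {
        format!("symbol table {:#x}..{:#x} runs past end of file ({} bytes)", start, end, file.len())
    })?;
    Ok(bytes
        .chunks_exact(NLIST_64_SIZE)
        .map(|c| RawNlist::from_bytes(c.try_into().unwrap()))
        .collect())
}
```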
Common symbols are detected the way afs-as emits them: `N_UNDF | N_EXT` with a nonzero `n_value` giving the size and `(n_desc >> 8) & 0x0f` (the `GET_COMM_ALIGN` encoding) giving the log2 alignment.

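A sketch of mapping `n_type` onto the `SymKind` categories named in the `InputSymbol` comment, plus the common-symbol check. Several choices here belong to this sketch rather than the spec: `N_SECT | N_PEXT` is treated as `PExt`, stab entries return `None` (on the assumption that they are handled separately), and `N_PBUD` is rejected. `sym_kind`, `common_info`, and `Common` are hypothetical names.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_PEXT: u8 = 0x10;
pub const N_TYPE: u8 = 0x0e;
pub const N_EXT: u8 = 0x01;
pub const N_UNDF: u8 = 0x0;
pub const N_ABS: u8 = 0x2;
pub const N_INDR: u8 = 0xa;
pub const N_SECT: u8 = 0xe;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymKind { Undef, Abs, SectLocal, SectExt, PExt, Indirect }

#[derive(Debug, PartialEq, Eq)]
pub struct Common {
    pub size: u64,      // from n_value
    pub align_pow2: u8, // GET_COMM_ALIGN(n_desc)
}

/// Classifies an n_type byte; `None` for stabs and unsupported type values.
pub fn sym_kind(n_type: u8) -> Option<SymKind> {
    if (n_type & N_STAB) != 0 {
        return None; // debugger entry: handled separately in this sketch
    }
    let ext = (n_type & N_EXT) != 0;
    let pext = (n_type & N_PEXT) != 0;
    match n_type & N_TYPE {
        N_UNDF => Some(SymKind::Undef),
        N_ABS => Some(SymKind::Abs),
        N_INDR => Some(SymKind::Indirect),
        N_SECT if pext => Some(SymKind::PExt),
        N_SECT if ext => Some(SymKind::SectExt),
        N_SECT => Some(SymKind::SectLocal),
        _ => None, // N_PBUD (0xc) or an unassigned value
    }
}

/// A common (tentative) definition is N_UNDF | N_EXT with a nonzero n_value.
pub fn common_info(n_type: u8, n_desc: u16, n_value: u64) -> Option<Common> {
    let undef_ext = (n_type & N_STAB) == 0
        && (n_type & N_TYPE) == N_UNDF
        && (n_type & N_EXT) != 0;
    if undef_ext && n_value != 0 {
        Some(Common { size: n_value, align_pow2: ((n_desc >> 8) & 0x0f) as u8 })
    } else {
        None
    }
}
```

For example, a 64-byte common with 16-byte alignment arrives as `n_type = 0x01`, `n_desc = 4 << 8`, `n_value = 64`.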
### 4. Indirect (N_INDR) pass-through
For alias symbols (`N_INDR`), `n_value` is not an address but a string-table index (strx) naming the aliased symbol; record that name. Resolution lives in Sprint 7; this sprint only surfaces the data.

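A minimal sketch of surfacing the alias target, assuming the raw string-table bytes are at hand. `indirect_target` is a hypothetical helper. Chains, where an alias targets another alias, are deliberately left for Sprint 7.

```rust
use std::convert::TryFrom;

pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_INDR: u8 = 0x0a;

/// For an N_INDR entry, returns the aliased symbol's name (n_value is a strx).
/// Non-N_INDR entries yield Ok(None).
pub fn indirect_target(strtab: &[u8], n_type: u8, n_value: u64) -> Result<Option<&str>, String> {
    if (n_type & N_STAB) != 0 || (n_type & N_TYPE) != N_INDR {
        return Ok(None);
    }
    let strx = usize::try_from(n_value).map_err(|_| "N_INDR n_value does not fit in usize")?;
    let tail = strtab
        .get(strx..)
        .ok_or_else(|| format!("N_INDR strx {} past end of string table", strx))?;
    let len = tail
        .iter()
        .position(|&b| b == 0)
        .ok_or("N_INDR target name is not NUL-terminated")?;
    std::str::from_utf8(&tail[..len])
        .map(Some)
        .map_err(|e| format!("N_INDR target name is not UTF-8: {}", e))
}
```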
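### 5. String table reader
`StringTable` wraps the string-table bytes located by `LC_SYMTAB` (`stroff`, `strsize`). `__LINKEDIT` exists only in linked images; in `.o` inputs the table is found purely by file offset. The reader exposes `name_at(strx: u32)`, which returns the `&str` name or a diagnostic. It validates null termination and handles the suffix-dedup trick afs-as uses: the strx for `"_foo"` may point into the middle of a stored `"_bar_foo\0"` and share its tail. An out-of-bounds strx, a missing terminator, or non-UTF-8 bytes must surface as a diagnostic, not a panic.
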
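A sketch of the reader. It assumes the table borrows from the mapped file and that errors are a small hypothetical `StrError` enum. Suffix sharing needs no special code: any in-bounds `strx` reads up to the next NUL, so an offset into the middle of `_bar_foo\0` yields `_foo`. Index 0 maps to the empty name, following the `<mach-o/nlist.h>` convention that `n_strx == 0` denotes a null name.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StrError {
    OutOfBounds(u32),  // strx >= strsize
    Unterminated(u32), // no NUL between strx and the end of the table
    NotUtf8(u32),
}

/// String table borrowed from the input file (LC_SYMTAB stroff..stroff+strsize).
pub struct StringTable<'a> {
    bytes: &'a [u8],
}

impl<'a> StringTable<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        StringTable { bytes }
    }

    /// Reads the NUL-terminated name starting at `strx`. Any in-bounds offset is
    /// accepted, including one inside a longer string, which is all that
    /// suffix dedup (`_foo` stored as the tail of `_bar_foo\0`) requires.
    pub fn name_at(&self, strx: u32) -> Result<&'a str, StrError> {
        if strx == 0 {
            return Ok(""); // n_strx == 0 means the null name
        }
        let start = strx as usize;
        if start >= self.bytes.len() {
            return Err(StrError::OutOfBounds(strx));
        }
        let tail = &self.bytes[start..];
        let len = tail.iter().position(|&b| b == 0).ok_or(StrError::Unterminated(strx))?;
        std::str::from_utf8(&tail[..len]).map_err(|_| StrError::NotUtf8(strx))
    }
}
```

Returning `Result` puts the no-panic requirement into the signature, so every caller has to decide how to report a bad strx.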
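### 6. DYSYMTAB partitioning
Decode the symbol-table partitions `(ilocalsym, nlocalsym)`, `(iextdefsym, nextdefsym)`, and `(iundefsym, nundefsym)`. Record the remaining offset/count pairs (`tocoff/ntoc`, `modtaboff/nmodtab`, `extrefsymoff/nextrefsyms`, `indirectsymoff/nindirectsyms`, `extreloff/nextrel`, `locreloff/nlocrel`) for later phases; most are only meaningful for dylibs.
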
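A sketch of decoding `LC_DYSYMTAB` and checking the three partitions against `LC_SYMTAB.nsyms`. It assumes a little-endian, 80-byte `dysymtab_command`: `cmd`, `cmdsize`, then 18 `u32` fields in `struct dysymtab_command` order. The field layout of `DysymtabView` and the names `parse` and `partitions` are assumptions. The sketch checks only that each range fits inside the symbol table, not that the partitions are ordered or disjoint.

```rust
use std::convert::TryInto;
use std::ops::Range;

pub const LC_DYSYMTAB: u32 = 0x0b;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DysymtabView {
    pub ilocalsym: u32, pub nlocalsym: u32,
    pub iextdefsym: u32, pub nextdefsym: u32,
    pub iundefsym: u32, pub nundefsym: u32,
    pub tocoff: u32, pub ntoc: u32,
    pub modtaboff: u32, pub nmodtab: u32,
    pub extrefsymoff: u32, pub nextrefsyms: u32,
    pub indirectsymoff: u32, pub nindirectsyms: u32,
    pub extreloff: u32, pub nextrel: u32,
    pub locreloff: u32, pub nlocrel: u32,
}

impl DysymtabView {
    /// Decodes an 80-byte dysymtab_command (little-endian).
    pub fn parse(cmd: &[u8]) -> Result<Self, String> {
        if cmd.len() < 80 {
            return Err(format!("LC_DYSYMTAB too short: {} bytes", cmd.len()));
        }
        let w: Vec<u32> = cmd[..80]
            .chunks_exact(4)
            .map(|c| u32::from_le_bytes(c.try_into().unwrap()))
            .collect();
        if w[0] != LC_DYSYMTAB {
            return Err(format!("expected LC_DYSYMTAB, got {:#x}", w[0]));
        }
        Ok(DysymtabView {
            ilocalsym: w[2], nlocalsym: w[3], iextdefsym: w[4], nextdefsym: w[5],
            iundefsym: w[6], nundefsym: w[7], tocoff: w[8], ntoc: w[9],
            modtaboff: w[10], nmodtab: w[11], extrefsymoff: w[12], nextrefsyms: w[13],
            indirectsymoff: w[14], nindirectsyms: w[15], extreloff: w[16], nextrel: w[17],
            locreloff: w[18], nlocrel: w[19],
        })
    }

    /// Returns the local, extdef, and undef index ranges if all fit in `nsyms`.
    pub fn partitions(&self, nsyms: u32) -> Result<[Range<u32>; 3], String> {
        let part = |name: &str, start: u32, count: u32| -> Result<Range<u32>, String> {
            match start.checked_add(count) {
                Some(end) if end <= nsyms => Ok(start..end),
                _ => Err(format!("{} partition {}+{} exceeds nsyms {}", name, start, count, nsyms)),
            }
        };
        Ok([
            part("local", self.ilocalsym, self.nlocalsym)?,
            part("extdef", self.iextdefsym, self.nextdefsym)?,
            part("undef", self.iundefsym, self.nundefsym)?,
        ])
    }
}
```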
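### 7. Input file model
`afs-ld/src/input.rs`:

```rust
use std::path::PathBuf;

// Sections, symbol names, and the string table borrow from the mmap'd input,
// so the whole model carries that mapping's lifetime `'a`.
pub struct ObjectFile<'a> {
    pub path: PathBuf,
    pub header: MachHeader64,
    pub commands: Vec<LoadCommand>,
    pub sections: Vec<InputSection<'a>>,
    pub symbols: Vec<InputSymbol<'a>>,
    pub strings: StringTable<'a>,
    pub dysymtab: DysymtabView,
}
```
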
## Testing Strategy
- Round-trip: parse every section/symbol/string from the afs-as corpus; re-emit; match bytes.
- Diffing against `nm -a` and `otool -r` for symbols and relocation offsets (relocation bodies come in Sprint 3).
- Edge cases: empty `__bss`, tentative common with 16-byte alignment, weak-def with `N_NO_DEAD_STRIP`, indirect symbol chains.
- Fuzz: malformed nlist entries (strx out of bounds, n_sect out of range, invalid n_type bits) produce diagnostics that name the source file and entry, and never panic.

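For the fuzz item, one way to keep diagnostics sourced is to validate each raw entry before building an `InputSymbol` and prefix every error with the file path and symbol index. In the sketch below, `check_nlist` is a hypothetical name. It assumes `N_PBUD` and the unassigned `N_TYPE` values are invalid in object-file input, and it skips type checks for stab entries because their `n_type` byte has a different meaning.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_UNDF: u8 = 0x0;
pub const N_ABS: u8 = 0x2;
pub const N_INDR: u8 = 0xa;
pub const N_SECT: u8 = 0xe;
pub const NO_SECT: u8 = 0;

#[derive(Debug, Clone, Copy)]
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

/// Validates one entry; every error names the input file and symbol index.
pub fn check_nlist(path: &str, idx: usize, s: &RawNlist, nsects: usize, strsize: u32) -> Result<(), String> {
    let diag = |msg: String| format!("{}: symbol #{}: {}", path, idx, msg);
    if s.strx != 0 && s.strx >= strsize {
        return Err(diag(format!("strx {} out of bounds (strsize {})", s.strx, strsize)));
    }
    if (s.n_type & N_STAB) != 0 {
        return Ok(()); // stab n_type/n_sect values follow different rules
    }
    let ty = s.n_type & N_TYPE;
    if !matches!(ty, N_UNDF | N_ABS | N_INDR | N_SECT) {
        return Err(diag(format!("invalid N_TYPE value {:#x}", ty)));
    }
    // Section ordinals are 1-based; NO_SECT (0) is not a valid home for N_SECT.
    if ty == N_SECT && (s.n_sect == NO_SECT || usize::from(s.n_sect) > nsects) {
        return Err(diag(format!("n_sect {} out of range 1..={}", s.n_sect, nsects)));
    }
    Ok(())
}
```

A fuzz harness can then feed arbitrary 16-byte records through this check and assert that it returns `Err` rather than unwinding.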
## Definition of Done
- Every symbol attribute afs-as can emit is recognized and round-trips.
- Common symbols surface with correct size and alignment.
- String table reader handles suffix-dedup overlaps correctly.
- Corpus-wide symbol and section parity against `nm -a` / `otool -lv`.