# Sprint 2: Sections, Symbols, String Tables

## Prerequisites
Sprint 1: header and load commands parsed.

## Goals
Decode section payloads, the symbol table (`nlist_64`), and the string table. Expose the full section/symbol/string model that later sprints build on.

Closeout note: `tests/reader_malformed_stress.rs` now also covers malformed
symbol/string-table variants derived from real corpus objects so the reader's
symbol and string surfaces are exercised under targeted bad-input cases, not
just hand-written unit fixtures. `tests/reader_tool_parity.rs` now also checks
symbol classification against `nm -a` and raw relocation tables against
`otool -r` across the afs-as corpus.

## Deliverables

### 1. Section attributes and kinds
`afs-ld/src/section.rs`: `SectionKind` mirrors afs-as's enum but is richer on the reader side (we receive inputs with flags already set):

```rust
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ConstData, Data, ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    CompactUnwind, EhFrame, Coalesced,
    Regular, Unknown,
}

pub fn kind_from_flags(flags: u32) -> SectionKind; // S_* type byte + S_ATTR_* attribute bits
```

Respect all `S_ATTR_*` flags (the upper 24 bits) and the section-type byte (`flags & 0xff`).
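
A minimal sketch of how `kind_from_flags` might split the flags word, assuming the constant values from `<mach-o/loader.h>`. The trimmed enum and the rule that `S_REGULAR` plus `S_ATTR_PURE_INSTRUCTIONS` means `Text` are illustrative assumptions, not the afs-ld implementation. Kinds such as `ConstData`, `Data`, `CompactUnwind`, and `EhFrame` probably need the segment/section name as well, so they are left out here.

```rust
// Masks and type values as defined in <mach-o/loader.h>.
const SECTION_TYPE: u32 = 0x0000_00ff;
const SECTION_ATTRIBUTES: u32 = 0xffff_ff00;

const S_REGULAR: u32 = 0x00;
const S_ZEROFILL: u32 = 0x01;
const S_CSTRING_LITERALS: u32 = 0x02;
const S_4BYTE_LITERALS: u32 = 0x03;
const S_8BYTE_LITERALS: u32 = 0x04;
const S_COALESCED: u32 = 0x0b;
const S_GB_ZEROFILL: u32 = 0x0c;
const S_16BYTE_LITERALS: u32 = 0x0e;
const S_THREAD_LOCAL_REGULAR: u32 = 0x11;
const S_THREAD_LOCAL_ZEROFILL: u32 = 0x12;
const S_THREAD_LOCAL_VARIABLES: u32 = 0x13;
const S_THREAD_LOCAL_INIT_FUNCTION_POINTERS: u32 = 0x15;
const S_ATTR_PURE_INSTRUCTIONS: u32 = 0x8000_0000;

// Trimmed to the kinds that can be derived from flags alone (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ZeroFill, GbZeroFill, Coalesced,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    Regular, Unknown,
}

pub fn kind_from_flags(flags: u32) -> SectionKind {
    let attrs = flags & SECTION_ATTRIBUTES;
    match flags & SECTION_TYPE {
        // A regular section holding only machine instructions is treated as text.
        S_REGULAR if attrs & S_ATTR_PURE_INSTRUCTIONS != 0 => SectionKind::Text,
        S_REGULAR => SectionKind::Regular,
        S_ZEROFILL => SectionKind::ZeroFill,
        S_CSTRING_LITERALS => SectionKind::CStringLiterals,
        S_4BYTE_LITERALS => SectionKind::Literal4,
        S_8BYTE_LITERALS => SectionKind::Literal8,
        S_16BYTE_LITERALS => SectionKind::Literal16,
        S_COALESCED => SectionKind::Coalesced,
        S_GB_ZEROFILL => SectionKind::GbZeroFill,
        S_THREAD_LOCAL_REGULAR => SectionKind::ThreadLocalRegular,
        S_THREAD_LOCAL_ZEROFILL => SectionKind::ThreadLocalZerofill,
        S_THREAD_LOCAL_VARIABLES => SectionKind::ThreadLocalVariables,
        S_THREAD_LOCAL_INIT_FUNCTION_POINTERS => SectionKind::ThreadLocalInitPointers,
        // Section types this sketch does not model stay visible as Unknown.
        _ => SectionKind::Unknown,
    }
}
```

Mapping unrecognized type bytes to `Unknown` rather than failing keeps the reader tolerant of inputs that afs-as never emits.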

### 2. Section content slicing
`InputSection` struct: segment, name, kind, addr, size, align (log2), flags, raw `data: &[u8]` borrowed from the mmap'd input, plus the raw relocation entries as `&[u8]` (decoded in Sprint 3). For `S_ZEROFILL` / `S_THREAD_LOCAL_ZEROFILL`, `data` is empty and `size` is virtual: the section occupies address space but no bytes in the file.
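
The slicing rule can be sketched as a bounds-checked helper. The names `section_data` and `SliceError` are hypothetical, and treating `S_GB_ZEROFILL` like the other two zerofill types is an assumption of this sketch.

```rust
use std::convert::TryFrom;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum SliceError {
    OutOfBounds { offset: u32, size: u64, file_len: usize },
}

// Zerofill section types carry no bytes in the file.
fn is_zerofill(flags: u32) -> bool {
    // S_ZEROFILL, S_GB_ZEROFILL, S_THREAD_LOCAL_ZEROFILL
    matches!(flags & 0xff, 0x01 | 0x0c | 0x12)
}

/// Borrow a section's payload from the mapped file without copying.
pub fn section_data<'a>(
    file: &'a [u8],
    offset: u32,
    size: u64,
    flags: u32,
) -> Result<&'a [u8], SliceError> {
    if is_zerofill(flags) {
        // Size is virtual: nothing to slice, whatever `size` says.
        return Ok(&[]);
    }
    let start = offset as usize;
    // Reject sizes that overflow usize or run past the end of the file.
    let end = usize::try_from(size).ok().and_then(|s| start.checked_add(s));
    match end {
        Some(end) if end <= file.len() => Ok(&file[start..end]),
        _ => Err(SliceError::OutOfBounds { offset, size, file_len: file.len() }),
    }
}
```

Checking the end offset with `checked_add` before indexing is what lets a malformed `offset`/`size` pair surface as a diagnostic instead of a panic.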

### 3. nlist_64 and symbol flags
`afs-ld/src/symbol.rs`:

```rust
// n_type masks
pub const N_STAB: u8 = 0xe0;
pub const N_PEXT: u8 = 0x10;
pub const N_TYPE: u8 = 0x0e; // mask
pub const N_EXT:  u8 = 0x01;

// values of n_type & N_TYPE
pub const N_UNDF: u8 = 0x0;
pub const N_ABS:  u8 = 0x2;
pub const N_SECT: u8 = 0xe;
pub const N_INDR: u8 = 0xa;

// n_desc flag bits
pub const N_NO_DEAD_STRIP:   u16 = 0x0020;
pub const N_WEAK_REF:        u16 = 0x0040;
pub const N_WEAK_DEF:        u16 = 0x0080;
pub const N_ARM_THUMB_DEF:   u16 = 0x0008;
pub const N_SYMBOL_RESOLVER: u16 = 0x0100;

pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8, pub n_sect: u8, pub n_desc: u16,
    pub n_value: u64,
}

pub struct InputSymbol<'a> {
    pub name: &'a str,
    pub kind: SymKind,                // Undef, Abs, SectLocal, SectExt, PExt, Indirect
    pub weak_ref: bool, pub weak_def: bool,
    pub no_dead_strip: bool, pub private_extern: bool,
    pub sect_idx: u8, pub value: u64,
    pub common_align_pow2: Option<u8>, // from n_desc bits 8..11 when UNDF + value != 0
}
```

Common symbols are detected the way afs-as emits them: `N_UNDF | N_EXT` with a nonzero `n_value` encoding the size and `(n_desc >> 8) & 0x0f` encoding the log2 alignment (the `GET_COMM_ALIGN` convention from `<mach-o/nlist.h>`).
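
As a rough sketch, decoding one 16-byte little-endian `nlist_64` record and applying the common-symbol rule could look like this. The `Stab` and `Unknown` variants, the function names, and reading `PExt` as "section-defined with `N_PEXT` set" are assumptions made for illustration.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_PEXT: u8 = 0x10;
pub const N_TYPE: u8 = 0x0e;
pub const N_EXT: u8 = 0x01;
pub const N_UNDF: u8 = 0x0;
pub const N_ABS: u8 = 0x2;
pub const N_SECT: u8 = 0xe;
pub const N_INDR: u8 = 0xa;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

// Stab and Unknown are additions for this sketch (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SymKind { Undef, Abs, SectLocal, SectExt, PExt, Indirect, Stab, Unknown }

/// Decode one little-endian nlist_64 record.
pub fn parse_nlist64(b: &[u8; 16]) -> RawNlist {
    RawNlist {
        strx: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
        n_type: b[4],
        n_sect: b[5],
        n_desc: u16::from_le_bytes([b[6], b[7]]),
        n_value: u64::from_le_bytes([b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]]),
    }
}

pub fn classify(raw: &RawNlist) -> SymKind {
    // Any N_STAB bit marks a debugging entry; the other bits then mean something else.
    if raw.n_type & N_STAB != 0 {
        return SymKind::Stab;
    }
    let ext = raw.n_type & N_EXT != 0;
    let pext = raw.n_type & N_PEXT != 0;
    match raw.n_type & N_TYPE {
        N_UNDF => SymKind::Undef,
        N_ABS => SymKind::Abs,
        N_INDR => SymKind::Indirect,
        N_SECT if pext => SymKind::PExt,
        N_SECT if ext => SymKind::SectExt,
        N_SECT => SymKind::SectLocal,
        _ => SymKind::Unknown,
    }
}

/// Some(log2 alignment) for a tentative (common) definition, None otherwise.
pub fn common_align_pow2(raw: &RawNlist) -> Option<u8> {
    let is_common = raw.n_type & N_STAB == 0
        && raw.n_type & N_TYPE == N_UNDF
        && raw.n_type & N_EXT != 0
        && raw.n_value != 0;
    if is_common { Some(((raw.n_desc >> 8) & 0x0f) as u8) } else { None }
}
```

For a common symbol, `n_value` holds the size, so a 64-byte tentative definition aligned to 16 bytes would carry `n_value = 64` and `n_desc = 0x0400`.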

### 4. Indirect (N_INDR) pass-through
Alias symbols: for an `N_INDR` entry, `n_value` is itself a string-table index (a second strx) that names the aliased symbol. Record that name alongside the alias. Resolution lives in Sprint 7; this sprint just surfaces the data.

### 5. String table reader
`StringTable` wraps the raw bytes of the `__LINKEDIT` string table, exposes `name_at(strx: u32) -> &str`, validates NUL termination, and handles the suffix-dedup trick afs-as uses: the entry for `"_foo\0"` can point into the middle of a longer `"_bar_foo\0"` entry, so the two names share bytes.
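
A small sketch of such a reader follows. Returning `Result` instead of a bare `&str`, the `StrError` type, and the UTF-8 check are assumptions of this sketch, not the stated signature.

```rust
// Hypothetical error type for this sketch; each variant carries the offending strx.
#[derive(Debug, PartialEq, Eq)]
pub enum StrError {
    OutOfBounds(u32),
    Unterminated(u32),
    NotUtf8(u32),
}

pub struct StringTable<'a> {
    bytes: &'a [u8],
}

impl<'a> StringTable<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        StringTable { bytes }
    }

    /// Return the NUL-terminated name starting at byte offset `strx`.
    /// Any offset is accepted, so suffix-shared names need no special casing.
    pub fn name_at(&self, strx: u32) -> Result<&'a str, StrError> {
        let bytes: &'a [u8] = self.bytes;
        let start = strx as usize;
        let tail = bytes.get(start..).ok_or(StrError::OutOfBounds(strx))?;
        // The name ends at the first NUL; a missing NUL means a truncated table.
        let len = tail
            .iter()
            .position(|&b| b == 0)
            .ok_or(StrError::Unterminated(strx))?;
        std::str::from_utf8(&tail[..len]).map_err(|_| StrError::NotUtf8(strx))
    }
}
```

With the table `b"\0_bar_foo\0"`, offset 1 yields `_bar_foo` and offset 5 yields `_foo` from the same bytes, which is the overlap described above.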

### 6. DYSYMTAB partitioning
Decode the partition `(ilocalsym, nlocalsym)`, `(iextdefsym, nextdefsym)`, `(iundefsym, nundefsym)`. Record `toc`, `modtab`, `extrefsym`, `indirectsymoff/nindirectsyms`, `extreloff`, `locreloff` offsets for later phases (most are for dylibs).
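
One way to decode the three groups is to turn each index/count pair into a checked range over the symbol table. `DysymtabCounts`, `SymbolPartition`, and `PartitionError` are hypothetical names, and this sketch checks only that each group fits inside `nsyms`, not that the groups are adjacent or ordered.

```rust
use std::ops::Range;

/// Index/count pairs copied from LC_DYSYMTAB (field names as in the load command).
pub struct DysymtabCounts {
    pub ilocalsym: u32,
    pub nlocalsym: u32,
    pub iextdefsym: u32,
    pub nextdefsym: u32,
    pub iundefsym: u32,
    pub nundefsym: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub struct SymbolPartition {
    pub locals: Range<u32>,
    pub extdefs: Range<u32>,
    pub undefs: Range<u32>,
}

// Names the group whose range does not fit in the symbol table.
#[derive(Debug, PartialEq, Eq)]
pub enum PartitionError {
    OutOfBounds(&'static str),
}

fn group(name: &'static str, index: u32, count: u32, nsyms: u32) -> Result<Range<u32>, PartitionError> {
    // checked_add guards against index + count wrapping around.
    match index.checked_add(count) {
        Some(end) if end <= nsyms => Ok(index..end),
        _ => Err(PartitionError::OutOfBounds(name)),
    }
}

/// `nsyms` is the symbol count from LC_SYMTAB.
pub fn partition(c: &DysymtabCounts, nsyms: u32) -> Result<SymbolPartition, PartitionError> {
    Ok(SymbolPartition {
        locals: group("local", c.ilocalsym, c.nlocalsym, nsyms)?,
        extdefs: group("extdef", c.iextdefsym, c.nextdefsym, nsyms)?,
        undefs: group("undef", c.iundefsym, c.nundefsym, nsyms)?,
    })
}
```

The resulting ranges can index directly into the decoded symbol vector, so later phases can walk locals, external definitions, and undefined symbols separately.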

### 7. Input file model
`afs-ld/src/input.rs`:

```rust
use std::path::PathBuf;

// 'a is the lifetime of the mmap'd input that sections and symbols borrow from.
pub struct ObjectFile<'a> {
    pub path: PathBuf,
    pub header: MachHeader64,
    pub commands: Vec<LoadCommand>,
    pub sections: Vec<InputSection<'a>>,
    pub symbols: Vec<InputSymbol<'a>>,
    pub strings: StringTable,
    pub dysymtab: DysymtabView,
}
```

## Testing Strategy
- Round-trip: parse every section/symbol/string from the afs-as corpus; re-emit; match bytes.
- Diffing against `nm -a` and `otool -r` for symbols and relocation offsets (relocation bodies come in Sprint 3).
- Edge cases: empty `__bss`, tentative common with 16-byte alignment, weak-def with `N_NO_DEAD_STRIP`, indirect symbol chains.
- Fuzz: malformed nlist entries (strx out of bounds, n_sect out of range, invalid n_type bits) produce sourced diagnostics, never panics.

## Definition of Done
- Every symbol attribute afs-as can emit is recognized and round-trips.
- Common symbols surface with correct size and alignment.
- String table reader handles suffix-dedup overlaps correctly.
- Corpus-wide symbol and section parity against `nm -a` / `otool -v`.