
Sprint 2: Sections, Symbols, String Tables

Prerequisites

Sprint 1 — header + load commands parsed.

Goals

Decode section payloads, the symbol table (nlist_64), and the string table. Expose the full section/symbol/string model that later sprints build on.

Deliverables

1. Section attributes and kinds

afs-ld/src/section.rs: SectionKind mirrors afs-as's enum but is richer on the reader side (we receive inputs with flags already set):

pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ConstData, Data, ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    CompactUnwind, EhFrame, Coalesced,
    Regular, Unknown,
}

pub fn kind_from_flags(flags: u32) -> SectionKind; // S_* attribute bits

Respect all S_ATTR_* flags and the section-type byte (flags & 0xff, the SECTION_TYPE mask).
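As a rough sketch of how kind_from_flags could dispatch on the section-type byte, the block below maps the S_* type values from <mach-o/loader.h> onto the enum. It is not the afs-ld implementation: it assumes Text is an S_REGULAR section with S_ATTR_PURE_INSTRUCTIONS set, and it never produces ConstData, Data, CompactUnwind, or EhFrame, since those presumably depend on segment/section names that a flags-only function cannot see.

```rust
// Values from <mach-o/loader.h>; only the types the enum names are listed.
const SECTION_TYPE: u32 = 0x0000_00ff;
const S_REGULAR: u32 = 0x00;
const S_ZEROFILL: u32 = 0x01;
const S_CSTRING_LITERALS: u32 = 0x02;
const S_4BYTE_LITERALS: u32 = 0x03;
const S_8BYTE_LITERALS: u32 = 0x04;
const S_COALESCED: u32 = 0x0b;
const S_GB_ZEROFILL: u32 = 0x0c;
const S_16BYTE_LITERALS: u32 = 0x0e;
const S_THREAD_LOCAL_REGULAR: u32 = 0x11;
const S_THREAD_LOCAL_ZEROFILL: u32 = 0x12;
const S_THREAD_LOCAL_VARIABLES: u32 = 0x13;
const S_THREAD_LOCAL_INIT_FUNCTION_POINTERS: u32 = 0x15;
const S_ATTR_PURE_INSTRUCTIONS: u32 = 0x8000_0000;

#[allow(dead_code)] // name-dependent kinds are not produced by this sketch
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SectionKind {
    Text, CStringLiterals, Literal4, Literal8, Literal16,
    ConstData, Data, ZeroFill, GbZeroFill,
    ThreadLocalRegular, ThreadLocalZerofill,
    ThreadLocalVariables, ThreadLocalInitPointers,
    CompactUnwind, EhFrame, Coalesced,
    Regular, Unknown,
}

/// Classify a section from its `flags` word alone (assumed mapping).
pub fn kind_from_flags(flags: u32) -> SectionKind {
    match flags & SECTION_TYPE {
        // Assumption: pure-instruction regular sections are treated as Text.
        S_REGULAR if flags & S_ATTR_PURE_INSTRUCTIONS != 0 => SectionKind::Text,
        S_REGULAR => SectionKind::Regular,
        S_ZEROFILL => SectionKind::ZeroFill,
        S_CSTRING_LITERALS => SectionKind::CStringLiterals,
        S_4BYTE_LITERALS => SectionKind::Literal4,
        S_8BYTE_LITERALS => SectionKind::Literal8,
        S_16BYTE_LITERALS => SectionKind::Literal16,
        S_COALESCED => SectionKind::Coalesced,
        S_GB_ZEROFILL => SectionKind::GbZeroFill,
        S_THREAD_LOCAL_REGULAR => SectionKind::ThreadLocalRegular,
        S_THREAD_LOCAL_ZEROFILL => SectionKind::ThreadLocalZerofill,
        S_THREAD_LOCAL_VARIABLES => SectionKind::ThreadLocalVariables,
        S_THREAD_LOCAL_INIT_FUNCTION_POINTERS => SectionKind::ThreadLocalInitPointers,
        // Types the enum does not name (stubs, pointers, ...) fall through.
        _ => SectionKind::Unknown,
    }
}
```

Section types the enum has no variant for (symbol stubs, lazy pointers, and so on) map to Unknown here; whether afs-ld should reject or pass them through is left open.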

2. Section content slicing

The InputSection struct carries segment, name, kind, addr, size, align (log2), flags, the raw data: &[u8] borrowed from the mmap'd input, and the raw relocation entries as &[u8] (decoded in Sprint 3). For zerofill sections (S_ZEROFILL, S_GB_ZEROFILL, S_THREAD_LOCAL_ZEROFILL), data is empty because the section occupies no bytes in the file; size is virtual.
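The slicing rule can be sketched as a bounds-checked borrow from the mapped file. The struct below is a cut-down, hypothetical stand-in for InputSection (only size and data), and slice_section is an illustrative helper name, not an API from the plan.

```rust
/// Cut-down stand-in for `InputSection`: just the size and borrowed payload.
#[derive(Debug)]
pub struct SectionSlice<'a> {
    pub size: u64,       // virtual size from the section header
    pub data: &'a [u8],  // file bytes; empty for zerofill sections
}

/// Borrow a section's payload from the mapped file without copying.
/// Returns `None` when `offset + size` runs past the end of the file.
pub fn slice_section<'a>(
    file: &'a [u8],
    offset: u32,
    size: u64,
    zerofill: bool,
) -> Option<SectionSlice<'a>> {
    if zerofill {
        // Zerofill sections have no file bytes; only the size is meaningful.
        return Some(SectionSlice { size, data: &[] });
    }
    if size > usize::MAX as u64 {
        return None;
    }
    let start = offset as usize;
    let end = start.checked_add(size as usize)?;
    let data = file.get(start..end)?;
    Some(SectionSlice { size, data })
}
```

Using slice::get rather than indexing keeps a truncated or hostile input from panicking, which matches the fuzzing goal stated under Testing Strategy.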

3. nlist_64 and symbol flags

afs-ld/src/symbol.rs:

pub const N_STAB: u8 = 0xe0;
pub const N_PEXT: u8 = 0x10;
pub const N_TYPE: u8 = 0x0e; // mask
pub const N_EXT:  u8 = 0x01;

pub const N_UNDF: u8 = 0x0;
pub const N_ABS:  u8 = 0x2;
pub const N_SECT: u8 = 0xe;
pub const N_INDR: u8 = 0xa;

pub const N_NO_DEAD_STRIP: u16 = 0x0020;
pub const N_WEAK_REF:      u16 = 0x0040;
pub const N_WEAK_DEF:      u16 = 0x0080;
pub const N_ARM_THUMB_DEF: u16 = 0x0008;
pub const N_SYMBOL_RESOLVER: u16 = 0x0100;

pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8, pub n_sect: u8, pub n_desc: u16,
    pub n_value: u64,
}

pub struct InputSymbol<'a> {
    pub name: &'a str,
    pub kind: SymKind,                // Undef, Abs, SectLocal, SectExt, PExt, Indirect
    pub weak_ref: bool, pub weak_def: bool,
    pub no_dead_strip: bool, pub private_extern: bool,
    pub sect_idx: u8, pub value: u64,
    pub common_align_pow2: Option<u8>, // from n_desc bits 8..15 when UNDF + value != 0
}
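For reference, an nlist_64 entry is 16 bytes: n_strx (u32), n_type (u8), n_sect (u8), n_desc (u16), n_value (u64). A minimal decoding sketch follows, assuming a little-endian input (as on arm64 and x86_64); parse_nlist and NLIST64_SIZE are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawNlist {
    pub strx: u32,
    pub n_type: u8,
    pub n_sect: u8,
    pub n_desc: u16,
    pub n_value: u64,
}

/// Size in bytes of one `nlist_64` record.
pub const NLIST64_SIZE: usize = 16;

/// Decode one little-endian `nlist_64` record from the front of `bytes`.
/// Returns `None` if fewer than 16 bytes are available.
pub fn parse_nlist(bytes: &[u8]) -> Option<RawNlist> {
    let b = bytes.get(..NLIST64_SIZE)?;
    let mut strx = [0u8; 4];
    strx.copy_from_slice(&b[0..4]);
    let mut value = [0u8; 8];
    value.copy_from_slice(&b[8..16]);
    Some(RawNlist {
        strx: u32::from_le_bytes(strx),
        n_type: b[4],
        n_sect: b[5],
        n_desc: u16::from_le_bytes([b[6], b[7]]),
        n_value: u64::from_le_bytes(value),
    })
}
```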

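Common symbols are detected the way afs-as emits them: N_UNDF | N_EXT with a nonzero n_value encoding the size and n_desc >> 8 encoding the alignment as a power of two (Mach-O's GET_COMM_ALIGN macro reads only the low four bits of that byte, (n_desc >> 8) & 0x0f).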
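The detection rule can be sketched as a small predicate over the raw nlist fields. The Common struct and common_info name are hypothetical; the 0x0f mask follows Mach-O's GET_COMM_ALIGN macro, which is an assumption about what afs-as intends rather than something the plan states.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_EXT: u8 = 0x01;
pub const N_UNDF: u8 = 0x0;

/// Size and alignment of a tentative (common) definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Common {
    pub size: u64,
    pub align_pow2: u8,
}

/// Returns `Some` when the entry is a common symbol:
/// external, type N_UNDF, and a nonzero `n_value` (the size).
pub fn common_info(n_type: u8, n_desc: u16, n_value: u64) -> Option<Common> {
    if n_type & N_STAB != 0 {
        return None; // stab entries are debugger records, not symbols
    }
    let undf_ext = (n_type & N_TYPE) == N_UNDF && (n_type & N_EXT) != 0;
    if undf_ext && n_value != 0 {
        Some(Common {
            size: n_value,
            // Same extraction as GET_COMM_ALIGN: low four bits of the high byte.
            align_pow2: ((n_desc >> 8) & 0x0f) as u8,
        })
    } else {
        None
    }
}
```

For example, a 32-byte tentative definition with 16-byte alignment would carry n_value = 32 and n_desc = 0x0400, giving align_pow2 = 4.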

4. Indirect (N_INDR) pass-through

Alias symbols: for an N_INDR entry, n_value is a string-table index (strx) naming the aliased symbol; record that name. Resolution lives in Sprint 7; this sprint just surfaces the data.
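A minimal sketch of surfacing the alias target, assuming the string table is available as a byte slice. Both function names are hypothetical, and the lookup returns None rather than a diagnostic to keep the example short.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_INDR: u8 = 0x0a;

/// Read the NUL-terminated string starting at `strx`, if it is in bounds,
/// terminated, and valid UTF-8.
fn cstr_at(strtab: &[u8], strx: u64) -> Option<&str> {
    if strx > strtab.len() as u64 {
        return None;
    }
    let tail = strtab.get(strx as usize..)?;
    let end = tail.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&tail[..end]).ok()
}

/// For an N_INDR entry, `n_value` is the strx of the aliased symbol's name.
/// Returns `None` for any other symbol type or an unreadable name.
pub fn indirect_target<'a>(n_type: u8, n_value: u64, strtab: &'a [u8]) -> Option<&'a str> {
    if n_type & N_STAB != 0 || (n_type & N_TYPE) != N_INDR {
        return None;
    }
    cstr_at(strtab, n_value)
}
```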

5. String table reader

StringTable wraps the raw bytes of the __LINKEDIT string table, exposes name_at(strx: u32) -> &str, and validates null termination. It also handles the suffix-dedup trick afs-as uses: a strx may point mid-string, so "_foo\0" can overlap with the tail of a later "_bar_foo\0".
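The reader can be sketched as a thin wrapper over a borrowed byte slice. One deliberate difference from the signature above: this version returns Option<&str> so that out-of-bounds or unterminated lookups are visible to the caller; how afs-ld actually reports those cases is not specified here.

```rust
/// Borrowed view of a Mach-O string table.
pub struct StringTable<'a> {
    bytes: &'a [u8],
}

impl<'a> StringTable<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        StringTable { bytes }
    }

    /// Name starting at byte offset `strx`, up to (not including) the next NUL.
    /// Any offset is accepted, including one in the middle of a longer string,
    /// which is what makes suffix-deduplicated tables work.
    pub fn name_at(&self, strx: u32) -> Option<&'a str> {
        let start = strx as usize;
        let tail = self.bytes.get(start..)?;
        // A missing terminator means the table is malformed.
        let end = tail.iter().position(|&b| b == 0)?;
        std::str::from_utf8(&tail[..end]).ok()
    }
}
```

With the table "\0_bar_foo\0", offset 1 yields "_bar_foo" and offset 5 yields "_foo" from the same bytes, so no special casing is needed for overlaps.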

6. DYSYMTAB partitioning

Decode the three symbol-table partitions: (ilocalsym, nlocalsym), (iextdefsym, nextdefsym), and (iundefsym, nundefsym). Record the toc, modtab, extrefsym, indirectsymoff/nindirectsyms, extreloff, and locreloff fields for later phases (most are used only by dylibs).
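One way to expose the partition is as index ranges into the symbol vector, validated against LC_SYMTAB's nsyms. This sketch covers only the six partition fields; the struct layout and method names are assumptions, not the planned DysymtabView API.

```rust
use std::ops::Range;

/// Partition fields of LC_DYSYMTAB (the other recorded offsets are omitted).
pub struct DysymtabView {
    pub ilocalsym: u32,
    pub nlocalsym: u32,
    pub iextdefsym: u32,
    pub nextdefsym: u32,
    pub iundefsym: u32,
    pub nundefsym: u32,
}

/// Turn (start, count) into an index range, rejecting overflow and
/// ranges that extend past the `nsyms` entries in the symbol table.
fn checked_range(start: u32, count: u32, nsyms: u32) -> Option<Range<usize>> {
    let end = start.checked_add(count)?;
    if end > nsyms {
        return None;
    }
    Some((start as usize)..(end as usize))
}

impl DysymtabView {
    pub fn locals(&self, nsyms: u32) -> Option<Range<usize>> {
        checked_range(self.ilocalsym, self.nlocalsym, nsyms)
    }
    pub fn ext_defs(&self, nsyms: u32) -> Option<Range<usize>> {
        checked_range(self.iextdefsym, self.nextdefsym, nsyms)
    }
    pub fn undefs(&self, nsyms: u32) -> Option<Range<usize>> {
        checked_range(self.iundefsym, self.nundefsym, nsyms)
    }
}
```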

7. Input file model

afs-ld/src/input.rs:

pub struct ObjectFile<'a> {
    pub path: PathBuf,
    pub header: MachHeader64,
    pub commands: Vec<LoadCommand>,
    pub sections: Vec<InputSection<'a>>, // data borrowed from the mmap'd input
    pub symbols: Vec<InputSymbol<'a>>,   // InputSymbol<'a> borrows its name
    pub strings: StringTable,
    pub dysymtab: DysymtabView,
}

Testing Strategy

  • Round-trip: parse every section/symbol/string from the afs-as corpus; re-emit; match bytes.
  • Diff against nm -a and otool -r for symbols and relocation offsets (relocation bodies come in Sprint 3).
  • Edge cases: empty __bss, tentative common with 16-byte alignment, weak-def with N_NO_DEAD_STRIP, indirect symbol chains.
  • Fuzz: malformed nlist entries (strx out of bounds, n_sect out of range, invalid n_type bits) produce sourced diagnostics and never panic.
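The fuzz requirement implies that every malformed field maps to an error value rather than a panic. A sketch of such a check is below; the NlistError type and validate_nlist signature are hypothetical, and "sourced" is interpreted here as carrying the symbol index so the diagnostic can point at the offending entry.

```rust
pub const N_STAB: u8 = 0xe0;
pub const N_TYPE: u8 = 0x0e;
pub const N_UNDF: u8 = 0x0;
pub const N_ABS: u8 = 0x2;
pub const N_INDR: u8 = 0xa;
pub const N_PBUD: u8 = 0xc;
pub const N_SECT: u8 = 0xe;

/// Diagnostics for malformed entries; each carries the symbol index.
#[derive(Debug, PartialEq, Eq)]
pub enum NlistError {
    StrxOutOfBounds { index: usize, strx: u32, strtab_len: usize },
    SectOutOfRange { index: usize, n_sect: u8, nsects: usize },
    InvalidType { index: usize, n_type: u8 },
}

/// Validate one entry against the string table length and section count.
pub fn validate_nlist(
    index: usize,
    strx: u32,
    n_type: u8,
    n_sect: u8,
    strtab_len: usize,
    nsects: usize,
) -> Result<(), NlistError> {
    if (strx as usize) >= strtab_len {
        return Err(NlistError::StrxOutOfBounds { index, strx, strtab_len });
    }
    if n_type & N_STAB != 0 {
        return Ok(()); // stab entries encode n_type differently; not checked here
    }
    match n_type & N_TYPE {
        N_UNDF | N_ABS | N_INDR | N_PBUD => Ok(()),
        // Section ordinals are 1-based; 0 means "no section".
        N_SECT if n_sect != 0 && (n_sect as usize) <= nsects => Ok(()),
        N_SECT => Err(NlistError::SectOutOfRange { index, n_sect, nsects }),
        _ => Err(NlistError::InvalidType { index, n_type }),
    }
}
```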

Definition of Done

  • Every symbol attribute afs-as can emit is recognized and round-trips.
  • Common symbols surface with correct size and alignment.
  • String table reader handles suffix-dedup overlaps correctly.
  • Corpus-wide symbol and section parity against nm -a / otool -v.