
Sprint 5: Dylibs (MH_DYLIB Binary)

Prerequisites

Sprints 1–3 — Mach-O reading complete.

Goals

Parse binary dylibs (MH_DYLIB). Extract exported symbols from the export trie (located via LC_DYLD_INFO/LC_DYLD_INFO_ONLY or, in binaries that use LC_DYLD_CHAINED_FIXUPS, via LC_DYLD_EXPORTS_TRIE), resolve re-exports through umbrella frameworks, and expose a linkable DylibFile surface.

Deliverables

1. DylibFile model

afs-ld/src/macho/dylib.rs:

pub struct DylibFile {
    pub path: PathBuf,
    pub install_name: String,
    pub current_version: u32,         // X.Y.Z packed
    pub compat_version: u32,
    pub is_umbrella: bool,
    pub load_kind: DylibLoadKind,     // Normal, Weak, Reexport, Upward
    pub ordinal: u16,                 // two-level namespace ordinal
    pub reexports: Vec<PathBuf>,      // LC_REEXPORT_DYLIB paths
    pub exports: ExportTrie,          // resolved during loading
}

pub enum DylibLoadKind { Normal, Weak, Reexport, Upward }
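
The "X.Y.Z packed" comment refers to the Mach-O convention of storing a version in one u32 as 16 bits of major, 8 bits of minor, and 8 bits of patch. A minimal sketch of the conversion follows; the helper names are hypothetical and not part of the planned DylibFile API.

```rust
/// Split a packed Mach-O version into (major, minor, patch).
/// Layout: bits 31..16 = X, bits 15..8 = Y, bits 7..0 = Z.
pub fn unpack_version(packed: u32) -> (u16, u8, u8) {
    ((packed >> 16) as u16, (packed >> 8) as u8, packed as u8)
}

/// Inverse of `unpack_version`.
pub fn pack_version(major: u16, minor: u8, patch: u8) -> u32 {
    (u32::from(major) << 16) | (u32::from(minor) << 8) | u32::from(patch)
}

/// Render a packed version in the dotted "X.Y.Z" form.
pub fn format_version(packed: u32) -> String {
    let (major, minor, patch) = unpack_version(packed);
    format!("{}.{}.{}", major, minor, patch)
}
```

Keeping the field packed in DylibFile and converting only for display avoids a lossy round trip when the value is written back into a load command.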

2. Load command decoding

  • LC_ID_DYLIB (for the dylib itself): install_name, timestamp, current_version, compat_version.
  • LC_LOAD_DYLIB: normal dependency.
  • LC_LOAD_WEAK_DYLIB: weak dep (the dylib may be missing at runtime, in which case its imports resolve to null).
  • LC_REEXPORT_DYLIB: dependency whose exports we rebroadcast (umbrella-framework case).
  • LC_LOAD_UPWARD_DYLIB: cyclic dependency escape hatch.
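
All five commands above share the dylib_command layout: cmd, cmdsize, name offset, timestamp, current_version, compat_version, followed by the NUL-terminated name. The sketch below decodes one such command and maps it to a load kind. It assumes a little-endian image and a buffer that starts at the command; DylibCommand, parse_dylib_command, and load_kind are hypothetical names, not the planned afs-ld API.

```rust
const LC_REQ_DYLD: u32 = 0x8000_0000;
const LC_LOAD_DYLIB: u32 = 0xc;
const LC_ID_DYLIB: u32 = 0xd;
const LC_LOAD_WEAK_DYLIB: u32 = 0x18 | LC_REQ_DYLD;
const LC_REEXPORT_DYLIB: u32 = 0x1f | LC_REQ_DYLD;
const LC_LOAD_UPWARD_DYLIB: u32 = 0x23 | LC_REQ_DYLD;

#[derive(Debug, PartialEq)]
pub enum DylibLoadKind { Normal, Weak, Reexport, Upward }

#[derive(Debug, PartialEq)]
pub struct DylibCommand {
    pub cmd: u32,
    pub name: String,
    pub timestamp: u32,
    pub current_version: u32,
    pub compat_version: u32,
}

/// Bounds-checked little-endian u32 read.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Decode one dylib_command; `buf` must begin at the command's first byte.
pub fn parse_dylib_command(buf: &[u8]) -> Option<DylibCommand> {
    let cmd = read_u32(buf, 0)?;
    let cmdsize = read_u32(buf, 4)? as usize;
    let body = buf.get(..cmdsize)?; // never read past this command
    let name_off = read_u32(body, 8)? as usize;
    if name_off < 24 {
        return None; // the name cannot overlap the 24-byte fixed header
    }
    let tail = body.get(name_off..)?;
    // The name is NUL-terminated and padded; tolerate a missing terminator.
    let end = tail.iter().position(|&b| b == 0).unwrap_or(tail.len());
    let name = std::str::from_utf8(&tail[..end]).ok()?.to_owned();
    Some(DylibCommand {
        cmd,
        name,
        timestamp: read_u32(body, 12)?,
        current_version: read_u32(body, 16)?,
        compat_version: read_u32(body, 20)?,
    })
}

/// Map a dependency command to its load kind.
pub fn load_kind(cmd: u32) -> Option<DylibLoadKind> {
    match cmd {
        LC_LOAD_DYLIB => Some(DylibLoadKind::Normal),
        LC_LOAD_WEAK_DYLIB => Some(DylibLoadKind::Weak),
        LC_REEXPORT_DYLIB => Some(DylibLoadKind::Reexport),
        LC_LOAD_UPWARD_DYLIB => Some(DylibLoadKind::Upward),
        LC_ID_DYLIB => None, // names the dylib itself, not a dependency
        _ => None,
    }
}
```

Returning Option keeps the sketch short; a real decoder would presumably report which field was out of bounds as a diagnostic.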

3. Export trie decoder

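The export trie lives in __LINKEDIT. It is located by LC_DYLD_INFO / LC_DYLD_INFO_ONLY (export_off/export_size, classic) or by LC_DYLD_EXPORTS_TRIE (dataoff/datasize, the command that accompanies LC_DYLD_CHAINED_FIXUPS in modern binaries). Trie format:

  • Each node: ULEB128 terminal-size, optional terminal payload (flags ULEB128, then either an address ULEB128, an address plus a resolver offset for stub-and-resolver entries, or an ordinal ULEB128 plus a NUL-terminated imported name for re-exports), then a one-byte child count, then (edge_string, child_offset_ULEB) pairs. Edge strings are NUL-terminated; child offsets are relative to the start of the trie.
  • Terminal flags: EXPORT_SYMBOL_FLAGS_KIND_REGULAR/_THREAD_LOCAL/_ABSOLUTE, EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION, EXPORT_SYMBOL_FLAGS_REEXPORT, EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER.

pub struct ExportTrie { /* walk-only view */ }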
impl ExportTrie {
    pub fn lookup(&self, name: &str) -> Option<ExportEntry>;
    pub fn iter(&self) -> impl Iterator<Item = (String, ExportEntry)>;
}

pub struct ExportEntry {
    pub flags: u32,
    pub address: u64,
    pub reexport: Option<(u16 /*ordinal*/, String /*imported_name*/)>,
    pub resolver: Option<u64>,
}
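
The flags field packs a two-bit kind together with independent modifier bits, so ExportEntry consumers have to mask before comparing. A small sketch of that decoding, using the flag values from the Mach-O loader header; ExportKind and the helper functions are hypothetical names for illustration.

```rust
pub const EXPORT_SYMBOL_FLAGS_KIND_MASK: u32 = 0x03;
pub const EXPORT_SYMBOL_FLAGS_KIND_REGULAR: u32 = 0x00;
pub const EXPORT_SYMBOL_FLAGS_KIND_THREAD_LOCAL: u32 = 0x01;
pub const EXPORT_SYMBOL_FLAGS_KIND_ABSOLUTE: u32 = 0x02;
pub const EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION: u32 = 0x04;
pub const EXPORT_SYMBOL_FLAGS_REEXPORT: u32 = 0x08;
pub const EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER: u32 = 0x10;

#[derive(Debug, PartialEq)]
pub enum ExportKind { Regular, ThreadLocal, Absolute }

/// Extract the symbol kind from the low two bits of `flags`.
pub fn export_kind(flags: u32) -> Option<ExportKind> {
    match flags & EXPORT_SYMBOL_FLAGS_KIND_MASK {
        EXPORT_SYMBOL_FLAGS_KIND_REGULAR => Some(ExportKind::Regular),
        EXPORT_SYMBOL_FLAGS_KIND_THREAD_LOCAL => Some(ExportKind::ThreadLocal),
        EXPORT_SYMBOL_FLAGS_KIND_ABSOLUTE => Some(ExportKind::Absolute),
        _ => None, // 0x03 is not a defined kind
    }
}

/// True when the export may be overridden by a non-weak definition.
pub fn is_weak_definition(flags: u32) -> bool {
    flags & EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION != 0
}

/// True when the payload is (ordinal, imported name) rather than an address.
pub fn is_reexport(flags: u32) -> bool {
    flags & EXPORT_SYMBOL_FLAGS_REEXPORT != 0
}

/// True when a resolver offset follows the stub address in the payload.
pub fn has_resolver(flags: u32) -> bool {
    flags & EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER != 0
}
```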

Walking is recursive; a depth cap and a visited-offset set guard against malformed tries (runaway nesting and child offsets that point back at an ancestor).
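
As a rough illustration of the guarded walk, the sketch below reads ULEB128 values with bounds and overflow checks, rejects out-of-range child offsets, and refuses to visit a node offset twice. MAX_DEPTH, TrieError, and walk are assumptions made for this example; each result carries the symbol name, its flags, and the first payload value (the address, or the ordinal for a re-export).

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
pub enum TrieError { Truncated, UlebOverflow, BadOffset(usize), Cycle(usize), TooDeep, BadUtf8 }

const MAX_DEPTH: usize = 128; // assumed cap, not taken from the plan

/// Read one ULEB128 value at `*pos`, advancing `*pos` past it.
fn read_uleb(data: &[u8], pos: &mut usize) -> Result<u64, TrieError> {
    let (mut value, mut shift) = (0u64, 0u32);
    loop {
        let byte = *data.get(*pos).ok_or(TrieError::Truncated)?;
        *pos += 1;
        // Reject encodings that do not fit in 64 bits.
        if shift >= 64 || (shift == 63 && (byte & 0x7e) != 0) {
            return Err(TrieError::UlebOverflow);
        }
        value |= u64::from(byte & 0x7f) << shift;
        if (byte & 0x80) == 0 {
            return Ok(value);
        }
        shift += 7;
    }
}

/// Read a NUL-terminated edge string at `*pos`, advancing past the NUL.
fn read_cstr<'a>(data: &'a [u8], pos: &mut usize) -> Result<&'a str, TrieError> {
    let rest = data.get(*pos..).ok_or(TrieError::Truncated)?;
    let len = rest.iter().position(|&b| b == 0).ok_or(TrieError::Truncated)?;
    *pos += len + 1;
    std::str::from_utf8(&rest[..len]).map_err(|_| TrieError::BadUtf8)
}

fn visit(
    data: &[u8], offset: usize, prefix: String, depth: usize,
    visited: &mut HashSet<usize>, out: &mut Vec<(String, u64, u64)>,
) -> Result<(), TrieError> {
    if depth > MAX_DEPTH { return Err(TrieError::TooDeep); }
    if offset >= data.len() { return Err(TrieError::BadOffset(offset)); }
    if !visited.insert(offset) { return Err(TrieError::Cycle(offset)); }
    let mut pos = offset;
    let terminal_size = read_uleb(data, &mut pos)? as usize;
    let children_at = pos.checked_add(terminal_size).ok_or(TrieError::Truncated)?;
    if terminal_size != 0 {
        let flags = read_uleb(data, &mut pos)?;
        let value = read_uleb(data, &mut pos)?; // address, or re-export ordinal
        out.push((prefix.clone(), flags, value));
    }
    pos = children_at; // skip any payload bytes not decoded here
    let child_count = *data.get(pos).ok_or(TrieError::Truncated)?;
    pos += 1;
    for _ in 0..child_count {
        let edge = read_cstr(data, &mut pos)?;
        let child = read_uleb(data, &mut pos)? as usize;
        visit(data, child, format!("{}{}", prefix, edge), depth + 1, visited, out)?;
    }
    Ok(())
}

/// Collect (name, flags, first payload value) for every terminal node.
pub fn walk(data: &[u8]) -> Result<Vec<(String, u64, u64)>, TrieError> {
    let (mut out, mut visited) = (Vec::new(), HashSet::new());
    visit(data, 0, String::new(), 0, &mut visited, &mut out)?;
    Ok(out)
}
```

Treating any revisited offset as an error assumes the trie is a strict tree; if a producer ever shared subtrees between parents, this check would need to track only the current ancestor path instead.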

4. Two-level namespace ordinals

Each dylib loaded by path gets an ordinal (1..=N), assigned in load-command order. Zero and the negative values are reserved: BIND_SPECIAL_DYLIB_SELF=0, BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE=-1, BIND_SPECIAL_DYLIB_FLAT_LOOKUP=-2, BIND_SPECIAL_DYLIB_WEAK_LOOKUP=-3. When an imported symbol is bound in Sprint 15, the binding records the ordinal of the dylib that provides it.
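
The ordinal scheme can be sketched as a pair of lookups over the load-command order: a name maps to its 1-based position, and an ordinal maps back to either a special target or a dylib. BindTarget, ordinal_of, and resolve_ordinal are hypothetical names for this example, and install names stand in for full DylibFile values.

```rust
pub const BIND_SPECIAL_DYLIB_SELF: i32 = 0;
pub const BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE: i32 = -1;
pub const BIND_SPECIAL_DYLIB_FLAT_LOOKUP: i32 = -2;
pub const BIND_SPECIAL_DYLIB_WEAK_LOOKUP: i32 = -3;

#[derive(Debug, PartialEq)]
pub enum BindTarget<'a> {
    SelfImage,
    MainExecutable,
    FlatLookup,
    WeakLookup,
    Dylib(&'a str),
}

/// 1-based ordinal of `name` within the load-command order, if present.
pub fn ordinal_of(name: &str, load_order: &[&str]) -> Option<u16> {
    load_order
        .iter()
        .position(|&n| n == name)
        .filter(|&i| i < usize::from(u16::MAX)) // keep i + 1 representable
        .map(|i| (i + 1) as u16)
}

/// Interpret a bind ordinal: special values first, then 1..=N as dylibs.
pub fn resolve_ordinal<'a>(ordinal: i32, load_order: &[&'a str]) -> Option<BindTarget<'a>> {
    match ordinal {
        BIND_SPECIAL_DYLIB_SELF => Some(BindTarget::SelfImage),
        BIND_SPECIAL_DYLIB_MAIN_EXECUTABLE => Some(BindTarget::MainExecutable),
        BIND_SPECIAL_DYLIB_FLAT_LOOKUP => Some(BindTarget::FlatLookup),
        BIND_SPECIAL_DYLIB_WEAK_LOOKUP => Some(BindTarget::WeakLookup),
        n if n > 0 => load_order.get(n as usize - 1).copied().map(BindTarget::Dylib),
        _ => None, // any other negative value is undefined
    }
}
```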

5. Re-export resolution

Loading a dylib recursively loads its LC_REEXPORT_DYLIB chain. Names looked up in the umbrella and not found there are delegated down the chain. This supports CoreFoundation/Foundation-style umbrella frameworks; it is not strictly required for armfortas today but lands now to avoid a retrofit.
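
A toy model of the delegation, assuming each loaded dylib is reduced to a set of exported names plus the install names it re-exports. The Dylib struct and resolve_export here are stand-ins for illustration, not the planned DylibFile surface; the seen set keeps a circular re-export chain from looping forever.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for a loaded dylib.
pub struct Dylib {
    pub exports: HashSet<String>,
    pub reexports: Vec<String>, // install names from LC_REEXPORT_DYLIB
}

/// Return the install name of the dylib that actually defines `symbol`,
/// searching `start` first and then its re-exports depth-first, in order.
pub fn resolve_export<'a>(
    libs: &'a HashMap<String, Dylib>,
    start: &'a str,
    symbol: &str,
) -> Option<&'a str> {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut stack: Vec<&'a str> = vec![start];
    while let Some(name) = stack.pop() {
        if !seen.insert(name) {
            continue; // already searched; breaks re-export cycles
        }
        if let Some(lib) = libs.get(name) {
            if lib.exports.contains(symbol) {
                return Some(name);
            }
            // Push in reverse so the first-listed re-export is popped first.
            stack.extend(lib.reexports.iter().rev().map(String::as_str));
        }
    }
    None
}
```

Returning the defining dylib rather than a bare hit matters for ordinals: the importing image still binds against the umbrella it linked, while the address comes from the library found down the chain.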

6. SDK path resolution

Given -syslibroot <SDK> and -l<name>, the linker must locate ${SDK}/usr/lib/lib<name>.{dylib,tbd}. This sprint establishes the search order; the rest lands in Sprint 19's CLI work.
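
One way to make the search order explicit is to generate the candidate paths first and probe the filesystem second. In the sketch below, preferring .tbd over .dylib within a directory is an assumption, as are the function names and the idea of passing search directories as plain strings; the plan itself does not fix these details.

```rust
use std::path::{Path, PathBuf};

/// Build the ordered list of paths to probe for `-l<name>`.
/// Each search directory is re-rooted under `syslibroot` when one is given.
pub fn candidate_paths(syslibroot: Option<&Path>, search_dirs: &[&str], name: &str) -> Vec<PathBuf> {
    let mut out = Vec::new();
    for &dir in search_dirs {
        let base = match syslibroot {
            // Strip the leading '/' so join() appends instead of replacing.
            Some(root) => root.join(dir.trim_start_matches('/')),
            None => PathBuf::from(dir),
        };
        // Assumed preference: text stub first, then the binary dylib.
        for ext in vec!["tbd", "dylib"] {
            out.push(base.join(format!("lib{}.{}", name, ext)));
        }
    }
    out
}

/// Return the first candidate that exists as a regular file.
pub fn find_library(candidates: &[PathBuf]) -> Option<&PathBuf> {
    candidates.iter().find(|p| p.is_file())
}
```

Separating candidate generation from probing keeps the order unit-testable without an SDK on disk.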

Testing Strategy

  • Fixtures: a tiny hand-built .dylib produced with the system toolchain (one exported symbol, one re-export). It must parse, and its exports must match nm -g.
  • Differential: this sprint uses real binary .dylibs from /usr/lib/ (where present on older macOS) or synthetic ones. Loading CoreFoundation.tbd is deferred to Sprint 6.
  • Malformed trie: cycle, out-of-bounds child offset, ULEB128 overrun — diagnostics, no panics.

Definition of Done

  • Export trie walker handles real .dylib files correctly.
  • DylibFile constructed with correct install_name, versions, ordinal.
  • Re-exports chained through umbrella fixtures.
  • dyld_info -exports <dylib> output matches our export dumper.