Sprint 15: Classic LC_DYLD_INFO Opcodes
Prerequisites
Sprints 12, 14 — GOT/stubs/lazy-pointers in place, symbol table shaped.
Goals
Generate the four ULEB128 opcode streams and the export trie that dyld reads via LC_DYLD_INFO_ONLY. This is the classic format (macOS 11–13 default) and the -no_fixup_chains path on newer macOS. Chained fixups land in Sprint 15.5.
Deliverables
1. The five streams
LC_DYLD_INFO_ONLY load command points at five blobs in __LINKEDIT:
- rebase_off / rebase_size: rebase opcodes — fix up absolute pointers for ASLR slide.
- bind_off / bind_size: bind opcodes — non-lazy imports from dylibs.
- weak_bind_off / weak_bind_size: weak-bind opcodes — C++-style weak symbol coalescing at runtime.
- lazy_bind_off / lazy_bind_size: lazy-bind opcodes — one block per stub_helper entry.
- export_off / export_size: export trie — what this image exports to other images.
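As a rough illustration of the load command that carries these five (offset, size) pairs, the sketch below serializes a 48-byte `dyld_info_command`. The `DyldInfoCommand` type and its field names are hypothetical and not part of afs-ld; the command value and field order follow `<mach-o/loader.h>`.

```rust
/// LC_DYLD_INFO (0x22) with the LC_REQ_DYLD bit (0x8000_0000) set.
pub const LC_DYLD_INFO_ONLY: u32 = 0x8000_0022;

/// Mirrors `struct dyld_info_command`: each pair is (file offset, size in bytes).
/// Hypothetical type name, used here for illustration only.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct DyldInfoCommand {
    pub rebase: (u32, u32),
    pub bind: (u32, u32),
    pub weak_bind: (u32, u32),
    pub lazy_bind: (u32, u32),
    pub export: (u32, u32),
}

impl DyldInfoCommand {
    /// cmd + cmdsize + five (off, size) pairs = 12 u32 fields.
    pub const CMDSIZE: u32 = 48;

    /// Serialize as little-endian, the byte order of arm64 and x86_64 Mach-O.
    pub fn to_bytes(&self) -> Vec<u8> {
        let words = [
            LC_DYLD_INFO_ONLY,
            Self::CMDSIZE,
            self.rebase.0,
            self.rebase.1,
            self.bind.0,
            self.bind.1,
            self.weak_bind.0,
            self.weak_bind.1,
            self.lazy_bind.0,
            self.lazy_bind.1,
            self.export.0,
            self.export.1,
        ];
        let mut out = Vec::with_capacity(Self::CMDSIZE as usize);
        for w in words {
            out.extend_from_slice(&w.to_le_bytes());
        }
        out
    }
}
```

How the five blobs are ordered and padded inside __LINKEDIT is a layout decision this sketch leaves open.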
2. Opcode encoder
afs-ld/src/synth/dyld_info.rs:
pub struct OpcodeStream { buf: Vec<u8> }

impl OpcodeStream {
    pub fn uleb(&mut self, v: u64);
    pub fn sleb(&mut self, v: i64);
    pub fn string(&mut self, s: &str); // null-terminated
    pub fn byte(&mut self, op_and_imm: u8);
    pub fn done(&mut self); // terminating REBASE_OPCODE_DONE / BIND_OPCODE_DONE
}
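Opcode byte = (opcode_nibble << 4) | imm_nibble. The REBASE_OPCODE_* and BIND_OPCODE_* constants in <mach-o/loader.h> are already shifted into the high nibble (REBASE_OPCODE_SET_TYPE_IMM is 0x10, for example), so with those constants the byte is simply opcode | imm.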
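One possible way to fill in the method bodies is sketched below. The `bytes` accessor is a hypothetical addition for inspection and testing; the LEB128 loops are the standard encodings, and `done` writes 0x00 because both REBASE_OPCODE_DONE and BIND_OPCODE_DONE have that value.

```rust
#[derive(Default)]
pub struct OpcodeStream {
    buf: Vec<u8>,
}

impl OpcodeStream {
    /// Unsigned LEB128: 7 payload bits per byte, high bit set on all but the last.
    pub fn uleb(&mut self, mut v: u64) {
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7;
            if v == 0 {
                self.buf.push(byte);
                break;
            }
            self.buf.push(byte | 0x80);
        }
    }

    /// Signed LEB128: stop once the remaining value is pure sign extension
    /// and the sign bit (0x40) of the last byte agrees with it.
    pub fn sleb(&mut self, mut v: i64) {
        loop {
            let byte = (v & 0x7f) as u8;
            v >>= 7; // arithmetic shift keeps the sign
            let sign_clear = byte & 0x40 == 0;
            if (v == 0 && sign_clear) || (v == -1 && !sign_clear) {
                self.buf.push(byte);
                break;
            }
            self.buf.push(byte | 0x80);
        }
    }

    /// Append the string followed by a NUL terminator.
    pub fn string(&mut self, s: &str) {
        self.buf.extend_from_slice(s.as_bytes());
        self.buf.push(0);
    }

    /// Append one opcode byte (opcode in the high nibble, immediate in the low).
    pub fn byte(&mut self, op_and_imm: u8) {
        self.buf.push(op_and_imm);
    }

    /// REBASE_OPCODE_DONE and BIND_OPCODE_DONE are both 0x00.
    pub fn done(&mut self) {
        self.buf.push(0x00);
    }

    /// Hypothetical accessor, not in the planned API above.
    pub fn bytes(&self) -> &[u8] {
        &self.buf
    }
}
```

For example, `uleb(128)` appends the two bytes 0x80 0x01, and `byte(0x10 | 1)` appends 0x11, which is REBASE_OPCODE_SET_TYPE_IMM with REBASE_TYPE_POINTER as its immediate.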
3. Rebase stream
For every absolute pointer in output __DATA / __DATA_CONST (an Unsigned reloc or a GOT entry resolved to a local address), emit rebase opcodes:
REBASE_OPCODE_SET_TYPE_IMM(REBASE_TYPE_POINTER)
REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(seg_idx, offset_within_seg)
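REBASE_OPCODE_DO_REBASE_ULEB_TIMES(count) or REBASE_OPCODE_DO_REBASE_IMM_TIMES(count)
Batching: consecutive rebases collapse into a single _ULEB_TIMES (or _IMM_TIMES when the count fits in the 4-bit immediate); strided rebases use _ULEB_TIMES_SKIPPING_ULEB. The differential harness compares bytes, so afs-ld has to reproduce ld's batching choices, not just produce an equivalent encoding.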
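The simplest form of this batching can be sketched as follows, assuming 8-byte pointers, a single segment, and sorted, deduplicated offsets. `encode_rebases` and `push_uleb` are hypothetical names. The sketch re-sets the offset at the start of every run and never uses the strided or ADD_ADDR opcodes, so it shows the idea but is not expected to match ld byte for byte.

```rust
// Opcode values from <mach-o/loader.h>.
const REBASE_TYPE_POINTER: u8 = 1;
const REBASE_OPCODE_DONE: u8 = 0x00;
const REBASE_OPCODE_SET_TYPE_IMM: u8 = 0x10;
const REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB: u8 = 0x20;
const REBASE_OPCODE_DO_REBASE_IMM_TIMES: u8 = 0x50;
const REBASE_OPCODE_DO_REBASE_ULEB_TIMES: u8 = 0x60;

fn push_uleb(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// `offsets` must be sorted, deduplicated, and all inside segment `seg_idx`.
pub fn encode_rebases(seg_idx: u8, offsets: &[u64]) -> Vec<u8> {
    let mut buf = vec![REBASE_OPCODE_SET_TYPE_IMM | REBASE_TYPE_POINTER];
    let mut i = 0;
    while i < offsets.len() {
        // Grow the run while the next pointer sits exactly 8 bytes further on.
        let start = offsets[i];
        let mut count = 1u64;
        while i + (count as usize) < offsets.len()
            && offsets[i + count as usize] == start + 8 * count
        {
            count += 1;
        }
        buf.push(REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | (seg_idx & 0x0f));
        push_uleb(&mut buf, start);
        if count <= 15 {
            // The count fits in the 4-bit immediate.
            buf.push(REBASE_OPCODE_DO_REBASE_IMM_TIMES | count as u8);
        } else {
            buf.push(REBASE_OPCODE_DO_REBASE_ULEB_TIMES);
            push_uleb(&mut buf, count);
        }
        i += count as usize;
    }
    buf.push(REBASE_OPCODE_DONE);
    buf
}
```

With segment 2 and offsets 0x10, 0x18, 0x20, 0x40, the first three pointers become one DO_REBASE_IMM_TIMES(3) and the last becomes its own run of one.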
4. Non-lazy bind stream
For every GOT entry pointing at a dylib import:
BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(ordinal) or _ULEB(ordinal)
BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(flags) + <name>\0
BIND_OPCODE_SET_TYPE_IMM(BIND_TYPE_POINTER)
BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(seg_idx, offset)
BIND_OPCODE_DO_BIND
Flags: BIND_SYMBOL_FLAGS_WEAK_IMPORT, BIND_SYMBOL_FLAGS_NON_WEAK_DEFINITION.
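A naive encoder for this sequence might look like the sketch below. The `Bind` struct and `encode_binds` are hypothetical names; the opcode values come from `<mach-o/loader.h>`. It emits every opcode for every bind and handles only positive dylib ordinals. Whether ld skips opcodes whose state is unchanged from the previous bind is not covered here, and byte-level parity would need to account for that.

```rust
// Opcode and flag values from <mach-o/loader.h>.
const BIND_TYPE_POINTER: u8 = 1;
const BIND_OPCODE_DONE: u8 = 0x00;
const BIND_OPCODE_SET_DYLIB_ORDINAL_IMM: u8 = 0x10;
const BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB: u8 = 0x20;
const BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM: u8 = 0x40;
const BIND_OPCODE_SET_TYPE_IMM: u8 = 0x50;
const BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB: u8 = 0x70;
const BIND_OPCODE_DO_BIND: u8 = 0x90;
pub const BIND_SYMBOL_FLAGS_WEAK_IMPORT: u8 = 0x1;
pub const BIND_SYMBOL_FLAGS_NON_WEAK_DEFINITION: u8 = 0x8;

/// Hypothetical record for one GOT slot bound to a dylib import.
pub struct Bind<'a> {
    pub ordinal: u64, // 1-based index of the providing dylib
    pub name: &'a str,
    pub flags: u8,
    pub seg_idx: u8,
    pub offset: u64, // offset of the slot within its segment
}

fn push_uleb(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

pub fn encode_binds(binds: &[Bind<'_>]) -> Vec<u8> {
    let mut buf = Vec::new();
    for b in binds {
        if b.ordinal <= 15 {
            buf.push(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | b.ordinal as u8);
        } else {
            buf.push(BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB);
            push_uleb(&mut buf, b.ordinal);
        }
        buf.push(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | (b.flags & 0x0f));
        buf.extend_from_slice(b.name.as_bytes());
        buf.push(0); // NUL terminator after the symbol name
        buf.push(BIND_OPCODE_SET_TYPE_IMM | BIND_TYPE_POINTER);
        buf.push(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | (b.seg_idx & 0x0f));
        push_uleb(&mut buf, b.offset);
        buf.push(BIND_OPCODE_DO_BIND);
    }
    buf.push(BIND_OPCODE_DONE);
    buf
}
```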
5. Weak bind stream
This stream covers symbols that participate in weak coalescing across the program (weak definitions that other images can override). For armfortas it is empty today; fortsh may or may not need it. Emit a terminator-only stream by default.
6. Lazy bind stream
One block per stub_helper entry (one dylib-imported callable per stub). Each block:
BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(seg_idx_of_la_symbol_ptr, offset_of_this_slot)
BIND_OPCODE_SET_DYLIB_ORDINAL_IMM/ULEB(ordinal)
BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(flags) + <name>\0
BIND_OPCODE_DO_BIND
BIND_OPCODE_DONE
The stub_helper entry pushes the byte offset of its block within the lazy-bind stream; dyld_stub_binder reads from that offset, interprets the block, and patches the lazy pointer.
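The block layout and the per-block offsets can be sketched together. `LazyBind` and `encode_lazy_binds` are hypothetical names, and only positive dylib ordinals are handled. The second return value is the list of byte offsets that the matching stub_helper entries would need to embed.

```rust
// Opcode values from <mach-o/loader.h>.
const BIND_OPCODE_DONE: u8 = 0x00;
const BIND_OPCODE_SET_DYLIB_ORDINAL_IMM: u8 = 0x10;
const BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB: u8 = 0x20;
const BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM: u8 = 0x40;
const BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB: u8 = 0x70;
const BIND_OPCODE_DO_BIND: u8 = 0x90;

/// Hypothetical record for one lazy pointer slot.
pub struct LazyBind<'a> {
    pub seg_idx: u8,      // segment holding __la_symbol_ptr
    pub slot_offset: u64, // offset of this slot within that segment
    pub ordinal: u64,
    pub name: &'a str,
    pub flags: u8,
}

fn push_uleb(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Returns the stream plus the starting byte offset of each block.
pub fn encode_lazy_binds(entries: &[LazyBind<'_>]) -> (Vec<u8>, Vec<u32>) {
    let mut buf = Vec::new();
    let mut block_offsets = Vec::with_capacity(entries.len());
    for e in entries {
        block_offsets.push(buf.len() as u32);
        buf.push(BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB | (e.seg_idx & 0x0f));
        push_uleb(&mut buf, e.slot_offset);
        if e.ordinal <= 15 {
            buf.push(BIND_OPCODE_SET_DYLIB_ORDINAL_IMM | e.ordinal as u8);
        } else {
            buf.push(BIND_OPCODE_SET_DYLIB_ORDINAL_ULEB);
            push_uleb(&mut buf, e.ordinal);
        }
        buf.push(BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM | (e.flags & 0x0f));
        buf.extend_from_slice(e.name.as_bytes());
        buf.push(0);
        buf.push(BIND_OPCODE_DO_BIND);
        // Every block ends with DONE so the binder stops after one symbol.
        buf.push(BIND_OPCODE_DONE);
    }
    (buf, block_offsets)
}
```

Because each block is self-terminating, the offsets depend only on the sizes of the preceding blocks, which in turn depend on symbol name lengths and ULEB widths.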
7. Export trie
Rooted at __LINKEDIT[export_off]. Built from the output's external Defined symbols (including re-exports from dylibs we re-export). Tree construction:
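- Collect (name, ExportEntry) pairs.
- Build a prefix trie.
- Emit depth-first: each node = ULEB terminal-size, optional terminal payload (flags + address ULEB), one-byte child-count, then (null-terminated edge_string, ULEB child_offset) pairs. Child offsets are measured from the start of the trie.
- Child offsets are fixed up in a second pass once sizes are known. Because the offsets are themselves ULEB-encoded, a changed offset can change a node's size, so the pass repeats until no offset moves.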
Terminal payload formats:
- Regular: flags ULEB | image_offset ULEB (the symbol's offset from the image's mach_header, not a file offset).
- Re-export: flags ULEB | dylib_ordinal ULEB | imported_name\0.
- Stub-and-resolver: flags ULEB | stub_addr ULEB | resolver_addr ULEB.
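The construction and the offset fix-up can be sketched end to end for regular exports only. Everything named here (`Node`, `insert`, `emit_trie`) is hypothetical, and the node ordering is a plain depth-first preorder over sorted names; ld's exact ordering is not specified in this document and may differ, so byte parity is not implied.

```rust
#[derive(Default)]
struct Node {
    terminal: Option<(u64, u64)>,    // (flags, offset from mach_header)
    children: Vec<(Vec<u8>, usize)>, // (edge label, child index in the arena)
    offset: u64,                     // byte offset of this node in the trie
}

fn uleb_len(mut v: u64) -> u64 {
    let mut n = 1;
    while v >= 0x80 { v >>= 7; n += 1; }
    n
}

fn push_uleb(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 { buf.push((v as u8) | 0x80); v >>= 7; }
    buf.push(v as u8);
}

/// Insert into a compressed trie, splitting an edge on a partial prefix match.
fn insert(nodes: &mut Vec<Node>, name: &[u8], payload: (u64, u64)) {
    let (mut cur, mut rest) = (0usize, name);
    'walk: while !rest.is_empty() {
        for ci in 0..nodes[cur].children.len() {
            let (edge, child) = nodes[cur].children[ci].clone();
            let common = edge.iter().zip(rest).take_while(|(a, b)| a == b).count();
            if common == 0 { continue; }
            if common < edge.len() {
                // cur --edge[..common]--> mid --edge[common..]--> child
                let mid = nodes.len();
                let tail = vec![(edge[common..].to_vec(), child)];
                nodes.push(Node { children: tail, ..Default::default() });
                nodes[cur].children[ci] = (edge[..common].to_vec(), mid);
                cur = mid;
            } else {
                cur = child;
            }
            rest = &rest[common..];
            continue 'walk;
        }
        // No child shares a prefix: hang the remainder off a new leaf.
        let leaf = nodes.len();
        nodes.push(Node::default());
        nodes[cur].children.push((rest.to_vec(), leaf));
        cur = leaf;
        break;
    }
    nodes[cur].terminal = Some(payload);
}

fn node_size(nodes: &[Node], i: usize) -> u64 {
    let term = nodes[i].terminal.map_or(0, |(f, a)| uleb_len(f) + uleb_len(a));
    let edges: u64 = nodes[i].children.iter()
        .map(|(e, c)| e.len() as u64 + 1 + uleb_len(nodes[*c].offset))
        .sum();
    uleb_len(term) + term + 1 + edges // size prefix, payload, child count, edges
}

/// `exports` holds (name, flags, offset from mach_header) for regular exports.
pub fn emit_trie(exports: &[(&str, u64, u64)]) -> Vec<u8> {
    let mut sorted = exports.to_vec();
    sorted.sort(); // deterministic child order
    let mut nodes = vec![Node::default()];
    for &(name, flags, off) in &sorted {
        insert(&mut nodes, name.as_bytes(), (flags, off));
    }
    // Depth-first preorder starting at the root (index 0).
    let (mut order, mut stack) = (Vec::new(), vec![0usize]);
    while let Some(i) = stack.pop() {
        order.push(i);
        stack.extend(nodes[i].children.iter().rev().map(|(_, c)| *c));
    }
    // Re-assign offsets until stable: ULEB child offsets feed back into sizes.
    loop {
        let (mut changed, mut at) = (false, 0u64);
        for &i in &order {
            if nodes[i].offset != at { nodes[i].offset = at; changed = true; }
            at += node_size(&nodes, i);
        }
        if !changed { break; }
    }
    let mut out = Vec::new();
    for &i in &order {
        match nodes[i].terminal {
            Some((f, a)) => {
                push_uleb(&mut out, uleb_len(f) + uleb_len(a));
                push_uleb(&mut out, f);
                push_uleb(&mut out, a);
            }
            None => out.push(0),
        }
        out.push(nodes[i].children.len() as u8);
        for (edge, child) in &nodes[i].children {
            out.extend_from_slice(edge);
            out.push(0);
            push_uleb(&mut out, nodes[*child].offset);
        }
    }
    out
}
```

A single export `_main` at image offset 0x1000 comes out as a 9-byte root node followed by a 5-byte leaf. Re-export and stub-and-resolver payloads would need their own terminal encodings.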
8. Stream-size determinism
Every stream must be deterministic across invocations given identical inputs. Sort keys everywhere, no hashmap iteration order.
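One way to keep hash-map iteration order out of the output is to sort into a fixed order immediately before emission. The sketch below assumes a hypothetical map from symbol name to (segment index, offset); the real afs-ld data structures may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical shape: symbol name -> (segment index, offset within segment).
/// Returns entries in a total order that does not depend on hashing.
pub fn emission_order(slots: &HashMap<String, (u8, u64)>) -> Vec<(&str, u8, u64)> {
    let mut ordered: Vec<(&str, u8, u64)> = slots
        .iter()
        .map(|(name, &(seg, off))| (name.as_str(), seg, off))
        .collect();
    // Address order first, with the name as a tie-break so the order is total.
    ordered.sort_by_key(|&(name, seg, off)| (seg, off, name));
    ordered
}
```

Storing the entries in a `BTreeMap` keyed by (segment, offset) from the start would give the same guarantee without a separate sort step.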
Testing Strategy
- Differential: for every staging fixture, afs-ld and ld produce byte-identical opcode streams after normalizing any tolerated differences.
- Unit tests for ULEB128 encoding at boundary values (0, 127, 128, 16383, 16384, big).
- Export-trie walker (Sprint 5's DylibFile::exports reader) round-trips our emitted tries: emit a trie, parse it back, every name resolves.
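For the round-trip check, a minimal standalone walker is sketched below. It is not the Sprint 5 `DylibFile::exports` reader, whose API is not shown in this document; `lookup` and `read_uleb` are hypothetical, and only regular exports (flags followed by one offset) are decoded.

```rust
/// Decode one ULEB128 value at `*pos`, advancing `*pos` past it.
fn read_uleb(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let (mut v, mut shift) = (0u64, 0u32);
    loop {
        let b = *buf.get(*pos)?;
        *pos += 1;
        if shift >= 64 {
            return None; // too many continuation bytes
        }
        v |= u64::from(b & 0x7f) << shift;
        if b & 0x80 == 0 {
            return Some(v);
        }
        shift += 7;
    }
}

/// Look `name` up in an export trie; returns (flags, image offset) if exported.
pub fn lookup(trie: &[u8], name: &str) -> Option<(u64, u64)> {
    let mut node = 0usize;
    let mut rest = name.as_bytes();
    loop {
        let mut pos = node;
        let term_size = read_uleb(trie, &mut pos)? as usize;
        if rest.is_empty() {
            if term_size == 0 {
                return None; // interior node with no export of its own
            }
            let flags = read_uleb(trie, &mut pos)?;
            let offset = read_uleb(trie, &mut pos)?;
            return Some((flags, offset));
        }
        pos = pos.checked_add(term_size)?; // skip the terminal payload
        let child_count = *trie.get(pos)?;
        pos += 1;
        let mut next = None;
        for _ in 0..child_count {
            let len = trie.get(pos..)?.iter().position(|&b| b == 0)?;
            let edge = &trie[pos..pos + len];
            pos += len + 1; // skip the label and its NUL
            let child = read_uleb(trie, &mut pos)? as usize;
            // Ignore empty labels so `rest` shrinks on every step.
            if next.is_none() && !edge.is_empty() && rest.starts_with(edge) {
                next = Some((child, edge.len()));
            }
        }
        let (child, used) = next?;
        node = child;
        rest = &rest[used..];
    }
}
```

A round-trip test would emit a trie, then assert that `lookup` returns the original flags and offset for every exported name and `None` for names that were never inserted.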
Definition of Done
- All five streams emitted correctly.
- Export trie round-trips through our own reader.
- Differential byte-level parity with ld on 10+ staging fixtures.
- Opcode emission is deterministic.