Sprint 16: LC_FUNCTION_STARTS & LC_DATA_IN_CODE
Prerequisites
Sprint 14 — __LINKEDIT layout sequencing; Sprint 11 — atoms placed.
Goals
Emit the two small __LINKEDIT blobs used by debuggers, disassemblers, and the dynamic loader: LC_FUNCTION_STARTS (delta-encoded entry points) and LC_DATA_IN_CODE (markers for data embedded in __text).
Deliverables
1. LC_FUNCTION_STARTS
Format: a single stream of ULEB128 deltas. The first ULEB is the offset from the Mach-O header to the first function entry; each subsequent ULEB is the delta from the previous entry. A terminating 0 ends the stream, and the blob is 8-byte aligned.
Source: every atom in __TEXT,__text plus .alt_entry chain members. Exclude atoms from __stubs and __stub_helper — ld doesn't list those.
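The encoding described above can be sketched as follows. This is a minimal illustration, not the afs-ld implementation: the function names `uleb128` and `encode_function_starts` are hypothetical, and the sketch assumes that "8-byte aligned" means the blob is zero-padded to a multiple of 8 bytes. Duplicate offsets are collapsed because a zero delta would be indistinguishable from the terminator.

```python
def uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 bits per byte, low bits first)."""
    if value < 0:
        raise ValueError("ULEB128 encodes non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_function_starts(entry_offsets, align: int = 8) -> bytes:
    """Build a function-starts blob from entry offsets measured from the Mach-O header.

    Assumption: padding to `align` bytes uses zero bytes.
    """
    blob = bytearray()
    prev = 0
    for off in sorted(set(entry_offsets)):  # ascending, duplicates collapsed
        delta = off - prev
        if delta == 0:
            # Only possible for an entry at offset 0, which would read as the terminator.
            raise ValueError("zero delta cannot be encoded")
        blob += uleb128(delta)
        prev = off
    blob.append(0)  # terminator
    blob.extend(b"\x00" * (-len(blob) % align))  # pad to alignment
    return bytes(blob)


if __name__ == "__main__":
    print(encode_function_starts([0x4000, 0x4010, 0x4100]).hex())
    # prints 80800110f0010000
```

The three entries encode as `80 80 01` (0x4000), `10` (delta 0x10), and `f0 01` (delta 0xF0), followed by the terminator and one byte of padding.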
2. LC_DATA_IN_CODE
Format: a packed array of:
#include <stdint.h>

struct data_in_code_entry {
    uint32_t offset;  // from Mach-O header
    uint16_t length;  // bytes
    uint16_t kind;    // DICE_KIND_DATA=1, _JUMP_TABLE8=2, _JUMP_TABLE16=3,
                      // _JUMP_TABLE32=4, _ABS_JUMP_TABLE32=5
};
Source: per-input LC_DATA_IN_CODE blocks. Remap each entry's offset from its input-section base to the final output location, expressed as an offset from the output Mach-O header as in the struct above. Entries are sorted by offset.
afs-as doesn't emit jump tables today, but we preserve whatever the input has so future C/Objective-C objects with jump tables survive linking.
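A minimal sketch of the remap-and-pack step, under stated assumptions: the names `DataInCodeEntry`, `remap_entry`, and `pack_entries` are hypothetical, the output is little-endian (as on arm64 and x86_64), and the whole input section is treated as placed contiguously at one output offset. An atom-based linker may need to remap per atom instead.

```python
import struct
from typing import NamedTuple

DICE_KIND_DATA = 1
DICE_KIND_JUMP_TABLE8 = 2
DICE_KIND_JUMP_TABLE16 = 3
DICE_KIND_JUMP_TABLE32 = 4
DICE_KIND_ABS_JUMP_TABLE32 = 5


class DataInCodeEntry(NamedTuple):
    offset: int  # uint32
    length: int  # uint16, bytes
    kind: int    # uint16, one of the DICE_KIND_* values


def remap_entry(entry: DataInCodeEntry,
                input_section_base: int,
                output_section_offset: int) -> DataInCodeEntry:
    """Rebase an entry from its input section to the output image.

    Assumption: output_section_offset is where the input section's contents
    landed, measured from the output Mach-O header.
    """
    new_offset = entry.offset - input_section_base + output_section_offset
    return entry._replace(offset=new_offset)


def pack_entries(entries) -> bytes:
    """Sort by offset and pack as little-endian {uint32, uint16, uint16} records."""
    ordered = sorted(entries, key=lambda e: e.offset)  # stable sort keeps input order on ties
    return b"".join(struct.pack("<IHH", e.offset, e.length, e.kind) for e in ordered)


if __name__ == "__main__":
    e = remap_entry(DataInCodeEntry(0x40, 16, DICE_KIND_JUMP_TABLE32), 0x20, 0x4000)
    print(hex(e.offset), pack_entries([e]).hex())
    # prints 0x4020 2040000010000400
```

Each packed record is 8 bytes, so the blob size is always a multiple of 8.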
3. Sorting determinism
Function starts: strictly ascending by VM address. Data-in-code: strictly ascending by output offset. Ties resolved by input command-line order.
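One way to make the tie-break explicit is to sort on a compound key. The sketch below is illustrative only: `Placed`, `input_index`, and `deterministic_order` are hypothetical names, with `input_index` standing in for the position of the contributing input on the command line.

```python
from typing import NamedTuple


class Placed(NamedTuple):
    output_offset: int  # VM address or output offset, depending on the blob
    input_index: int    # position of the contributing input on the command line
    label: str          # illustrative payload


def deterministic_order(items):
    """Sort ascending by output offset, breaking ties by command-line order."""
    return sorted(items, key=lambda p: (p.output_offset, p.input_index))


if __name__ == "__main__":
    items = [Placed(0x20, 1, "b"), Placed(0x10, 2, "c"), Placed(0x20, 0, "a")]
    print([p.label for p in deterministic_order(items)])
    # prints ['c', 'a', 'b']
```

Sorting on the tuple keeps the result independent of the order in which entries were collected, which is the property the differential tests rely on.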
4. Integration with __LINKEDIT layout (Sprint 14)
Both blobs get file offsets assigned after chained fixups / classic dyld-info but before the symbol table. Each load command (a linkedit_data_command) points at its blob via dataoff / datasize.
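The offset assignment can be pictured as a cursor walking through __LINKEDIT. The sketch below rests on assumptions the text does not state: that function starts precede data-in-code, and that each blob begins on an 8-byte boundary. `align_up` and `place_blobs` are hypothetical helpers, not part of the Sprint 14 design.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def place_blobs(cursor: int, blobs, alignment: int = 8):
    """Assign (dataoff, datasize) to each (name, size) pair in order.

    Returns the placements and the next free file offset, which is where
    the following __LINKEDIT item (here, the symbol table) would start.
    """
    placements = {}
    for name, size in blobs:
        cursor = align_up(cursor, alignment)
        placements[name] = (cursor, size)
        cursor += size
    return placements, cursor


if __name__ == "__main__":
    # 0x8003 stands in for the end of the chained fixups / dyld-info data.
    placed, next_free = place_blobs(0x8003, [("function_starts", 8), ("data_in_code", 16)])
    print(placed, hex(next_free))
```

In this example the function-starts blob lands at 0x8008 and the data-in-code blob at 0x8010, leaving 0x8020 as the next free offset.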
Testing Strategy
- Differential: function starts list byte-identical between afs-ld and ld on every staging fixture.
- Data-in-code: fixture with a jump table input; entries survive linking with correct remapped offsets.
- Empty output: fixtures with no functions produce a zero-byte LC_FUNCTION_STARTS (open question: ld may still emit a terminator; check against ld), and LC_DATA_IN_CODE is absent when no input had data-in-code.
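When a byte-for-byte comparison fails, decoding both blobs back into entry offsets makes the diff easier to read. The following decoder is a sketch for test tooling only, with `decode_function_starts` as a hypothetical name; it stops at the first zero delta and ignores any padding after it.

```python
def decode_function_starts(blob: bytes):
    """Decode a function-starts blob into offsets from the Mach-O header."""
    offsets = []
    addr = 0
    pos = 0
    while pos < len(blob):
        delta = 0
        shift = 0
        while True:
            if pos >= len(blob):
                raise ValueError("truncated ULEB128")
            byte = blob[pos]
            pos += 1
            delta |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:  # high bit clear: last byte of this value
                break
        if delta == 0:  # terminator; the rest is padding
            break
        addr += delta
        offsets.append(addr)
    return offsets


if __name__ == "__main__":
    print([hex(o) for o in decode_function_starts(bytes.fromhex("80800110f0010000"))])
    # prints ['0x4000', '0x4010', '0x4100']
```

An empty blob and a blob holding only a terminator both decode to an empty list, so the decoder cannot settle the empty-output question above; that case still needs a raw byte comparison.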
Definition of Done
- LC_FUNCTION_STARTS parity with ld on every staging fixture.
- LC_DATA_IN_CODE entries remapped correctly across linking.
- Both blobs placed in the right __LINKEDIT slot per Sprint 14.