# Sprint 3: Relocations (Read-Side)

## Prerequisites
Sprints 1–2: header/load commands and section/symbol parsing in place.

## Goals
Decode every ARM64 relocation type afs-as emits. Normalize paired relocations (ADDEND + primary, SUBTRACTOR + UNSIGNED) into a linker-friendly form. End state: the linker's reloc model captures every arithmetic and semantic constraint needed by Sprint 11.

## Deliverables

### 1. Relocation constants and raw form
`afs-ld/src/macho/constants.rs` additions:

```rust
pub const ARM64_RELOC_UNSIGNED: u8 = 0;
pub const ARM64_RELOC_SUBTRACTOR: u8 = 1;
pub const ARM64_RELOC_BRANCH26: u8 = 2;
pub const ARM64_RELOC_PAGE21: u8 = 3;
pub const ARM64_RELOC_PAGEOFF12: u8 = 4;
pub const ARM64_RELOC_GOT_LOAD_PAGE21: u8 = 5;
pub const ARM64_RELOC_GOT_LOAD_PAGEOFF12: u8 = 6;
pub const ARM64_RELOC_POINTER_TO_GOT: u8 = 7;
pub const ARM64_RELOC_TLVP_LOAD_PAGE21: u8 = 8;
pub const ARM64_RELOC_TLVP_LOAD_PAGEOFF12: u8 = 9;
pub const ARM64_RELOC_ADDEND: u8 = 10;
```

Raw `relocation_info`: 8 bytes, little-endian. `r_address: i32`, then `r_info: u32` packed from the least-significant bit upward as `[r_symbolnum:24][r_pcrel:1][r_length:2][r_extern:1][r_type:4]`.
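
The bit layout above can be unpacked with plain shifts and masks. A minimal sketch, assuming a little-endian input and reading the bracketed layout from bit 0 upward; `RawReloc` and `decode_raw` are hypothetical names, not part of the planned API:

```rust
/// Raw fields of one 8-byte `relocation_info` entry (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RawReloc {
    pub r_address: i32,
    pub r_symbolnum: u32, // 24 bits
    pub r_pcrel: bool,    // 1 bit
    pub r_length: u8,     // 2 bits: log2 of the fixup width in bytes
    pub r_extern: bool,   // 1 bit
    pub r_type: u8,       // 4 bits
}

/// Decode one little-endian entry, unpacking `r_info` from bit 0 upward.
pub fn decode_raw(bytes: [u8; 8]) -> RawReloc {
    let r_address = i32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let info = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    RawReloc {
        r_address,
        r_symbolnum: info & 0x00FF_FFFF,
        r_pcrel: ((info >> 24) & 1) != 0,
        r_length: ((info >> 25) & 0b11) as u8,
        r_extern: ((info >> 27) & 1) != 0,
        r_type: ((info >> 28) & 0xF) as u8,
    }
}
```

For example, the bytes `10 00 00 00 05 00 00 2D` decode to a PC-relative, external, length-2 entry of type 2 (BRANCH26) against symbol 5 at offset 0x10.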

### 2. Parsed relocation form
`afs-ld/src/reloc/mod.rs`:

```rust
pub struct Reloc {
    pub offset: u32,          // byte offset into input section
    pub kind: RelocKind,
    pub length: RelocLength,  // log2 of the fixup width: Byte=0, Half=1, Word=2, Quad=3
    pub pcrel: bool,
    pub referent: Referent,
    pub addend: i64,          // folded from ARM64_RELOC_ADDEND prefix or inline
}

pub enum RelocKind {
    Unsigned, Branch26,
    Page21, PageOff12,
    GotLoadPage21, GotLoadPageOff12, PointerToGot,
    TlvpLoadPage21, TlvpLoadPageOff12,
    // Fused SUBTRACTOR + UNSIGNED pair: minuend (from the following UNSIGNED
    // entry) in `referent`; the subtrahend comes from the SUBTRACTOR entry.
    Subtractor,
}

pub enum Referent {
    Symbol(SymRef),    // r_extern = 1
    Section(SectRef),  // r_extern = 0
}
```
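
`RelocLength` is referenced above but not spelled out. One possible shape, assuming it mirrors the 2-bit `r_length` field directly; the `from_raw` and `bytes` helpers are hypothetical:

```rust
/// Fixup width, stored as log2 of the byte count (mirrors `r_length`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RelocLength {
    Byte = 0,
    Half = 1,
    Word = 2,
    Quad = 3,
}

impl RelocLength {
    /// Convert the raw 2-bit field; values above 3 cannot come from a
    /// well-formed entry, so they map to `None`.
    pub fn from_raw(r_length: u8) -> Option<Self> {
        match r_length {
            0 => Some(RelocLength::Byte),
            1 => Some(RelocLength::Half),
            2 => Some(RelocLength::Word),
            3 => Some(RelocLength::Quad),
            _ => None,
        }
    }

    /// Width of the fixup in bytes: 1, 2, 4, or 8.
    pub fn bytes(self) -> u32 {
        1u32 << (self as u32)
    }
}
```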

### 3. Paired reloc fusion
Two pairings afs-as emits:

1. **ARM64_RELOC_ADDEND**: a prefix reloc whose `r_symbolnum` field is actually a 24-bit signed addend. The next reloc in the list is the primary (UNSIGNED, PAGE21, PAGEOFF12, BRANCH26, or GOT/TLVP variant). Parser fuses: sign-extended `addend_reloc.symnum → primary.addend`, primary kept.

2. **ARM64_RELOC_SUBTRACTOR + ARM64_RELOC_UNSIGNED**: difference expression. Emitted as a pair where SUBTRACTOR comes first and names the subtrahend symbol, and the UNSIGNED that follows names the minuend. Parser fuses into a single `RelocKind::Subtractor` record, carrying both minuend and subtrahend, on the minuend-carrying entry.

After fusion, no raw ADDEND entry and no unpaired SUBTRACTOR entry should leak out of the reader.
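
The two pairings can be fused in a single forward pass. A minimal sketch over simplified stand-in types (`Raw`, `Fused`, `FuseError`, and `fuse` are all hypothetical and much smaller than the real `Reloc` model); it only shows the pairing and the 24-bit sign extension:

```rust
const ARM64_RELOC_UNSIGNED: u8 = 0;
const ARM64_RELOC_SUBTRACTOR: u8 = 1;
const ARM64_RELOC_ADDEND: u8 = 10;

/// Simplified raw entry: only the fields the fusion pass looks at.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Raw {
    pub r_type: u8,
    pub r_symbolnum: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Fused {
    /// A primary reloc, with any ADDEND prefix folded into `addend`.
    Plain { r_type: u8, symnum: u32, addend: i64 },
    /// A fused SUBTRACTOR + UNSIGNED pair.
    Difference { minuend: u32, subtrahend: u32 },
}

#[derive(Debug, PartialEq, Eq)]
pub enum FuseError {
    UnpairedAddend { index: usize },
    UnpairedSubtractor { index: usize },
}

/// Sign-extend the 24-bit value stored in an ADDEND entry's `r_symbolnum`.
pub fn sign_extend_24(raw: u32) -> i64 {
    (((raw << 8) as i32) >> 8) as i64
}

pub fn fuse(raws: &[Raw]) -> Result<Vec<Fused>, FuseError> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < raws.len() {
        let r = raws[i];
        match r.r_type {
            ARM64_RELOC_ADDEND => match raws.get(i + 1) {
                // The primary must be a real reloc, not another prefix.
                Some(p)
                    if p.r_type != ARM64_RELOC_ADDEND
                        && p.r_type != ARM64_RELOC_SUBTRACTOR =>
                {
                    out.push(Fused::Plain {
                        r_type: p.r_type,
                        symnum: p.r_symbolnum,
                        addend: sign_extend_24(r.r_symbolnum),
                    });
                    i += 2;
                }
                _ => return Err(FuseError::UnpairedAddend { index: i }),
            },
            ARM64_RELOC_SUBTRACTOR => match raws.get(i + 1) {
                // SUBTRACTOR names the subtrahend; the UNSIGNED names the minuend.
                Some(p) if p.r_type == ARM64_RELOC_UNSIGNED => {
                    out.push(Fused::Difference {
                        minuend: p.r_symbolnum,
                        subtrahend: r.r_symbolnum,
                    });
                    i += 2;
                }
                _ => return Err(FuseError::UnpairedSubtractor { index: i }),
            },
            _ => {
                out.push(Fused::Plain { r_type: r.r_type, symnum: r.r_symbolnum, addend: 0 });
                i += 1;
            }
        }
    }
    Ok(out)
}
```

Returning an error at the first unpaired prefix keeps the "nothing leaks" guarantee checkable at one place; a real reader would attach the input path and offset to the error.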

### 4. Integrity checks
- `r_address + length_bytes` (where `length_bytes = 1 << r_length`) within section bounds.
- `r_extern = 1` → `r_symbolnum < nsyms`.
- `r_extern = 0` → `r_symbolnum` is a 1-based section index in range.
- `r_pcrel` matches the reloc kind. PC-relative: BRANCH26, the PAGE21 variants (PAGE21, GOT_LOAD_PAGE21, TLVP_LOAD_PAGE21), and POINTER_TO_GOT. Not PC-relative: UNSIGNED, SUBTRACTOR, and the PAGEOFF12 variants (PAGEOFF12, GOT_LOAD_PAGEOFF12, TLVP_LOAD_PAGEOFF12).
- `r_length` matches kind: every ARM64 reloc kind has length = 2, except UNSIGNED, which may be 2 or 3, and SUBTRACTOR, which matches its paired UNSIGNED.
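
The per-kind checks reduce to small table lookups. A sketch under the assumption that branch and page-style relocs (and POINTER_TO_GOT) are PC-relative while page-offset relocs are not, which is the convention Apple's and LLVM's tools follow; `Kind` and the three helper names are hypothetical:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Unsigned, Subtractor, Branch26,
    Page21, PageOff12,
    GotLoadPage21, GotLoadPageOff12, PointerToGot,
    TlvpLoadPage21, TlvpLoadPageOff12,
}

/// Expected value of `r_pcrel` for each kind.
pub fn expected_pcrel(kind: Kind) -> bool {
    matches!(
        kind,
        Kind::Branch26
            | Kind::Page21
            | Kind::GotLoadPage21
            | Kind::TlvpLoadPage21
            | Kind::PointerToGot
    )
}

/// `r_length` must be 2, except UNSIGNED/SUBTRACTOR which may be 2 or 3.
pub fn length_ok(kind: Kind, r_length: u8) -> bool {
    match kind {
        Kind::Unsigned | Kind::Subtractor => r_length == 2 || r_length == 3,
        _ => r_length == 2,
    }
}

/// The fixup `[r_address, r_address + (1 << r_length))` must lie inside the section.
pub fn in_bounds(r_address: i32, r_length: u8, section_size: u64) -> bool {
    if r_address < 0 || r_length > 3 {
        return false;
    }
    let end = r_address as u64 + (1u64 << r_length);
    end <= section_size
}
```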

### 5. Round-trip serializer (for golden tests)
`afs-ld/src/reloc/mod.rs::write_relocs(sect: &InputSection, relocs: &[Reloc]) -> Vec<u8>` reassembles the parsed relocs into Mach-O wire form, re-emitting the ADDEND prefix when necessary. Used to prove the reader lost nothing.
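
The write side is the inverse of the raw decode. A sketch of the packing step only, not of `write_relocs` itself; `Entry`, `encode_raw`, and `write_with_addend` are hypothetical names. Two assumptions are baked in: the ADDEND prefix is emitted only for a non-zero addend, and it is written with `r_pcrel = 0`, `r_length = 2`, `r_extern = 0` (modeled on what LLVM emits, not confirmed for afs-as):

```rust
const ARM64_RELOC_ADDEND: u8 = 10;

/// One wire-level entry, before packing (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
pub struct Entry {
    pub r_address: i32,
    pub symnum: u32,
    pub pcrel: bool,
    pub length: u8,
    pub ext: bool,
    pub r_type: u8,
}

/// Pack one entry into its 8-byte little-endian wire form.
pub fn encode_raw(e: &Entry) -> [u8; 8] {
    let info = (e.symnum & 0x00FF_FFFF)
        | ((e.pcrel as u32) << 24)
        | (((e.length & 0b11) as u32) << 25)
        | ((e.ext as u32) << 27)
        | (((e.r_type & 0xF) as u32) << 28);
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&e.r_address.to_le_bytes());
    out[4..].copy_from_slice(&info.to_le_bytes());
    out
}

/// Append `primary`, preceded by an ADDEND prefix when `addend` is non-zero.
pub fn write_with_addend(out: &mut Vec<u8>, primary: &Entry, addend: i64) {
    if addend != 0 {
        let prefix = Entry {
            r_address: primary.r_address,
            symnum: (addend as u32) & 0x00FF_FFFF, // truncate to 24-bit two's complement
            pcrel: false,
            length: 2,
            ext: false,
            r_type: ARM64_RELOC_ADDEND,
        };
        out.extend_from_slice(&encode_raw(&prefix));
    }
    out.extend_from_slice(&encode_raw(primary));
}
```

If afs-as ever emits an explicit zero-valued ADDEND prefix, the non-zero test here would break byte equality, so the reader would need to record whether a prefix was present.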

## Testing Strategy
- Round-trip every reloc in the afs-as corpus. Byte equality after ADDEND/SUBTRACTOR fusion + re-emission.
- Synthetic fixtures for each reloc kind (smallest possible `.s` input through afs-as):
  - `bl _extern` → BRANCH26 external.
  - `adrp x0, _g@PAGE; add x0, x0, _g@PAGEOFF` → PAGE21 + PAGEOFF12.
  - `adrp x0, _g@GOTPAGE; ldr x0, [x0, _g@GOTPAGEOFF]` → GOT_LOAD_PAGE21 + GOT_LOAD_PAGEOFF12.
  - `.quad _g + 0x1000` → ADDEND + UNSIGNED pair.
  - `.quad _a - _b` → SUBTRACTOR + UNSIGNED pair.
  - `adrp x0, _tlv@TLVPPAGE; ldr x0, [x0, _tlv@TLVPPAGEOFF]` → TLVP_LOAD_* pair.
- Malformed-input: reloc pointing past section end, unpaired SUBTRACTOR, unpaired ADDEND. Each produces a specific diagnostic citing input path and offset.
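
One way to make the malformed-input diagnostics specific and testable is a small error enum whose `Display` output always carries the input path and offset. A sketch only; `RelocDiag`, its variants, and the message wording are assumptions, not the project's actual diagnostic format:

```rust
use std::fmt;

/// Hypothetical diagnostic type for the three malformed-input cases.
#[derive(Debug, PartialEq, Eq)]
pub enum RelocDiag {
    PastSectionEnd { path: String, offset: u32, section_size: u64 },
    UnpairedSubtractor { path: String, offset: u32 },
    UnpairedAddend { path: String, offset: u32 },
}

impl fmt::Display for RelocDiag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RelocDiag::PastSectionEnd { path, offset, section_size } => write!(
                f,
                "{}: reloc at offset {:#x} extends past section end ({:#x} bytes)",
                path, offset, section_size
            ),
            RelocDiag::UnpairedSubtractor { path, offset } => write!(
                f,
                "{}: unpaired ARM64_RELOC_SUBTRACTOR at offset {:#x}",
                path, offset
            ),
            RelocDiag::UnpairedAddend { path, offset } => write!(
                f,
                "{}: unpaired ARM64_RELOC_ADDEND at offset {:#x}",
                path, offset
            ),
        }
    }
}
```

A malformed fixture test can then compare against the variant rather than the rendered string, which keeps message wording free to change.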

## Definition of Done
- Every ARM64 reloc afs-as emits is represented in `RelocKind` post-fusion.
- Paired relocs never leak as separate entries into downstream code.
- Corpus-wide round-trip byte equality.
- Integrity checks trigger diagnostics on malformed fixtures.