Sprint 25: LOH Relaxation
Prerequisites
Sprints 1 and 11: LOHs preserved, reloc application in place.
Goals
Apply the Linker Optimization Hints (LOHs) that afs-as emits. An LOH marks a safe peephole opportunity: for example, replacing ADRP+ADD with a single ADR when the target is within ±1 MB, or replacing an unnecessary LDR with a nop. Until this sprint, LOHs are preserved as-is and no relaxation happens.
Deliverables
1. LOH kinds afs-as emits
From the existing .loh directives:
- AdrpAdd: ADRP + ADD (compute address). If the target is within ±1 MB, replace with ADR + nop.
- AdrpLdr: ADRP + LDR (load through page/pageoff). If the target is within ±1 MB and 4-byte aligned, the pair can become nop + LDR (literal form).
- AdrpLdrGot: ADRP + LDR from GOT. If the GOT entry refers to a local symbol with a known address, and that address is in ADR range, the GOT load can be skipped entirely (ADR + nop).
- AdrpLdrGotLdr: ADRP + LDR (GOT) + LDR (final). Similar combination: if the GOT load can be skipped, fold into a direct LDR.
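As an illustration of the AdrpAdd case, here is a minimal Python sketch of the rewrite. The function names `encode_adr` and `relax_adrp_add` are hypothetical and not part of afs-ld. The sketch assumes the ADR is written into the ADRP's slot with the ADD's destination register, and that the ADD's slot becomes the nop.

```python
NOP = 0xD503201F  # AArch64 NOP encoding


def encode_adr(rd: int, delta: int) -> int:
    """Encode `ADR Xd, pc + delta`; delta is a signed 21-bit byte offset."""
    if not -(1 << 20) <= delta < (1 << 20):
        raise ValueError("delta outside ADR range")
    imm = delta & 0x1FFFFF   # two's complement, truncated to 21 bits
    immlo = imm & 0x3        # goes to bits 30:29
    immhi = imm >> 2         # goes to bits 23:5
    return (immlo << 29) | (0b10000 << 24) | (immhi << 5) | rd


def relax_adrp_add(adrp_addr: int, target: int, add_rd: int):
    """Return the (adr, nop) words, or None when the target is out of range."""
    delta = target - adrp_addr
    if not -(1 << 20) <= delta < (1 << 20):
        return None          # leave the ADRP + ADD pair untouched
    return encode_adr(add_rd, delta), NOP
```

The delta is measured from the ADRP's address because that is where the ADR will execute after the rewrite.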
2. LOH data format
LC_LINKER_OPTIMIZATION_HINT points to a ULEB128 stream:
```
uleb128 kind
uleb128 argcount
uleb128 arg1   // file offset
uleb128 arg2
...
```
Kind constants: LOH_ARM64_ADRP_ADRP=1, LOH_ARM64_ADRP_LDR=2, LOH_ARM64_ADRP_ADD_LDR=3, LOH_ARM64_ADRP_LDR_GOT_LDR=4, LOH_ARM64_ADRP_ADD_STR=5, LOH_ARM64_ADRP_LDR_GOT_STR=6, LOH_ARM64_ADRP_ADD=7, LOH_ARM64_ADRP_LDR_GOT=8.
afs-as emits kinds 3 (ADRP_ADD_LDR), 7 (ADRP_ADD), and 8 (ADRP_LDR_GOT), plus 4 (ADRP_LDR_GOT_LDR) for load-from-pointer-in-GOT.
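A reader for this stream could look roughly like the sketch below. The names `read_uleb128` and `parse_loh` are hypothetical, and treating a kind of 0 as trailing alignment padding is an assumption about how the blob ends, not something stated above.

```python
from typing import Iterator, List, Tuple


def read_uleb128(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode one ULEB128 value at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def parse_loh(buf: bytes) -> Iterator[Tuple[int, List[int]]]:
    """Yield (kind, args) for each hint in the blob."""
    pos = 0
    while pos < len(buf):
        kind, pos = read_uleb128(buf, pos)
        if kind == 0:        # assumption: zero bytes are padding, stop here
            break
        argc, pos = read_uleb128(buf, pos)
        args = []
        for _ in range(argc):
            arg, pos = read_uleb128(buf, pos)
            args.append(arg)
        yield kind, args
```

For example, the bytes `07 02 10 14` decode to one kind-7 hint with arguments 16 and 20.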
3. Relaxation pass
Runs after reloc application (Sprint 11) and before LOH re-serialization. For each LOH:
- Parse the referenced instructions.
- Compute if the symbolic target fits the tighter encoding.
- If yes: rewrite the instruction bytes; mark the LOH as "applied" so it can be either dropped or left in the output (ld's convention varies; we match).
- If no: leave untouched.
Safety: every relaxation is reversible (the original instructions still achieve the goal), and no relaxation narrows a correctly-wider encoding to an incorrect one. Extensive testing required.
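The first two steps (parse the instructions, compute the target) might look like this for an ADRP + ADD pair. This is a sketch under assumptions: relocations have already been applied so the immediates are final, the ADD is the 64-bit unshifted immediate form, and the helper names are hypothetical.

```python
import struct


def sign_extend(value: int, bits: int) -> int:
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_adrp(word: int):
    """Return (rd, page_delta_in_bytes), or None if word is not an ADRP."""
    if (word & 0x9F000000) != 0x90000000:
        return None
    immlo = (word >> 29) & 0x3
    immhi = (word >> 5) & 0x7FFFF
    return word & 0x1F, sign_extend((immhi << 2) | immlo, 21) << 12


def decode_add_imm(word: int):
    """Return (rd, rn, imm12) for `ADD Xd, Xn, #imm12` (no shift), else None."""
    if (word & 0xFFC00000) != 0x91000000:
        return None
    return word & 0x1F, (word >> 5) & 0x1F, (word >> 10) & 0xFFF


def adrp_add_target(text: bytes, base_addr: int, off_adrp: int, off_add: int):
    """Recover the address an ADRP + ADD pair computes, or None if malformed."""
    adrp = decode_adrp(struct.unpack_from("<I", text, off_adrp)[0])
    add = decode_add_imm(struct.unpack_from("<I", text, off_add)[0])
    if adrp is None or add is None or add[1] != adrp[0]:
        return None          # not the expected pair: leave the LOH unapplied
    pc = base_addr + off_adrp
    return (pc & ~0xFFF) + adrp[1] + add[2]
```

Returning `None` whenever the bytes do not match the expected pattern keeps the pass on the "leave untouched" path rather than guessing.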
4. Safe conservatism
An LOH is applied only when the target fits strictly within the narrower range; for ADR that is a signed 21-bit byte offset, so -2^20 <= delta <= 2^20 - 1. Off-by-one guard: recompute both the original and the relaxed form, and assert that the relaxed form computes the same address.
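One way to express the range check and the guard is sketched below. The names `fits_adr`, `adr_result`, and `verify_relaxation` are hypothetical; the guard decodes the relaxed ADR independently of whatever code produced it and compares the result with the original target.

```python
ADR_MIN = -(1 << 20)      # most negative signed 21-bit byte offset
ADR_MAX = (1 << 20) - 1   # most positive; exactly +1 MiB does not fit


def fits_adr(delta: int) -> bool:
    return ADR_MIN <= delta <= ADR_MAX


def adr_result(pc: int, word: int) -> int:
    """Address an ADR word computes when it executes at pc."""
    immlo = (word >> 29) & 0x3
    immhi = (word >> 5) & 0x7FFFF
    imm = (immhi << 2) | immlo
    if imm & (1 << 20):   # sign bit of the 21-bit immediate
        imm -= 1 << 21
    return pc + imm


def verify_relaxation(pc: int, original_target: int, adr_word: int) -> bool:
    """Return False if out of range; raise if the relaxed form diverges."""
    if not fits_adr(original_target - pc):
        return False
    if adr_result(pc, adr_word) != original_target:
        raise AssertionError("relaxed ADR does not reach the original target")
    return True
```

The asymmetric bounds are the point of the guard: a target exactly 1 MiB below the instruction fits, while one exactly 1 MiB above does not.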
5. Cross-LOH interaction
A single instruction can participate in multiple LOHs (one as a member of an ADRP+ADD, another as a member of an ADRP+ADD+LDR). Apply LOHs in a deterministic order — longest first — and skip any LOH whose instructions have already been rewritten.
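The ordering rule could be sketched as follows. Two assumptions: "longest" is read as "most instruction arguments", and ties are broken by input order. `apply_in_order` and the `try_apply` callback are hypothetical names; the callback returns True only when it actually rewrote the instructions.

```python
from typing import Callable, List, Sequence, Tuple

Loh = Tuple[int, List[int]]  # (kind, instruction addresses)


def apply_in_order(lohs: Sequence[Loh],
                   try_apply: Callable[[int, List[int]], bool]) -> List[int]:
    """Apply LOHs longest first; return indices of the ones that were applied."""
    # Sort by descending argument count, then by original position.
    order = sorted(range(len(lohs)), key=lambda i: (-len(lohs[i][1]), i))
    rewritten = set()   # addresses of instructions already modified
    applied = []
    for i in order:
        kind, args = lohs[i]
        if not rewritten.isdisjoint(args):
            continue    # an earlier LOH already rewrote one of these
        if try_apply(kind, args):
            rewritten.update(args)
            applied.append(i)
    return applied
```

An LOH that was tried but did not fit leaves its instructions unmarked, so a shorter LOH over the same instructions still gets its chance.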
6. Output LOH preservation
ld emits LC_LINKER_OPTIMIZATION_HINT in the output even for executables (for the benefit of post-processing tools). We match: emit a new LOH blob with the final state (applied LOHs marked or omitted).
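Re-serialization is the inverse of parsing the ULEB128 stream. The sketch below writes whatever list of hints it is given, so the marked-or-omitted policy is decided by the caller. `write_uleb128` and `serialize_loh` are hypothetical names, and the zero padding to an 8-byte boundary is an assumption.

```python
from typing import Iterable, List, Tuple


def write_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def serialize_loh(lohs: Iterable[Tuple[int, List[int]]], align: int = 8) -> bytes:
    """Build the LOH blob: kind, argcount, then each argument."""
    blob = bytearray()
    for kind, args in lohs:
        blob += write_uleb128(kind)
        blob += write_uleb128(len(args))
        for arg in args:
            blob += write_uleb128(arg)
    blob += b"\0" * (-len(blob) % align)   # assumption: pad with zeros
    return bytes(blob)
```

Feeding the output back through a parser and comparing against the input list gives a cheap round-trip check for this step.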
7. -no_loh flag
For debugging: -no_loh skips relaxation. Helpful when comparing output against a known-bad state.
Testing Strategy
- Synthetic fixture with a function whose __data target is just inside the ±1 MB range → AdrpAdd LOH applies; disassembly shows ADR + nop.
- Fixture with a target 10 MB away → LOH does not apply; ADRP + ADD preserved.
- Differential: afs-ld output byte-matches ld output for both fixtures.
- Runtime test: the relaxed code still dereferences the right address.
- Random-fuzz: 100 fixtures with various target distances; every relaxation verified against recomputed ground truth.
Definition of Done
- LOH relaxation applied correctly on fixtures that fit.
- LOH skipped correctly on fixtures that don't.
- Byte-parity with ld on a representative corpus.
- -no_loh flag produces a cleanly un-relaxed output.