
Sprint 25: LOH Relaxation

Prerequisites

Sprints 1 and 11: LOHs are preserved and reloc application is in place.

Goals

Apply the Linker Optimization Hints (LOHs) that afs-as emits. LOHs describe safe peephole opportunities: for example, replacing ADRP+ADD with a single ADR when the target is within ±1 MB of the instruction, or replacing an unnecessary LDR with a nop. Until this sprint, LOHs are preserved as-is and no relaxation happens.

Deliverables

1. LOH kinds afs-as emits

From the existing .loh directives:

  • AdrpAdd: ADRP + ADD (compute address). If target is in ±1 MB, replace with ADR + nop.
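  • AdrpLdr: ADRP + LDR (load through page/pageoff). If the target is within ±1 MB of the LDR and 4-byte aligned, the pair can become nop + LDR (literal): the literal form of LDR addresses the target PC-relatively, so the ADRP is no longer needed.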
  • AdrpLdrGot: ADRP + LDR from GOT. If the GOT entry's content is a local symbol with known address, can skip the GOT load entirely (ADR + nop).
  • AdrpLdrGotLdr: ADRP + LDR (GOT) + LDR (final). Similar combo: if GOT can be skipped, fold into a direct LDR.
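
To make the AdrpLdr case concrete, here is a minimal Python sketch of the range check and the LDR (literal) encoding for a 64-bit load. The function names are hypothetical and not part of afs-ld. The bit layout follows the A64 LDR (literal) encoding, where a signed 19-bit word offset sits in bits 23:5, and the offset is measured from the address of the LDR itself, not from the ADRP.

```python
NOP = 0xD503201F  # A64 NOP encoding, used to replace the ADRP


def fits_ldr_literal(delta: int) -> bool:
    """True if a byte offset from the LDR is encodable in LDR (literal)."""
    return delta % 4 == 0 and -(1 << 20) <= delta < (1 << 20)


def encode_ldr_literal_x(rt: int, delta: int) -> int:
    """Encode `ldr x<rt>, <pc + delta>` (64-bit literal load)."""
    if not 0 <= rt <= 31:
        raise ValueError("register number out of range")
    if not fits_ldr_literal(delta):
        raise ValueError("target not reachable by LDR (literal)")
    imm19 = (delta >> 2) & 0x7FFFF  # signed word offset in two's complement
    return 0x58000000 | (imm19 << 5) | rt
```

For example, `encode_ldr_literal_x(0, 8)` yields `0x58000040`, the encoding of `ldr x0, #8`.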

2. LOH data format

LC_LINKER_OPTIMIZATION_HINT points to a ULEB128 stream:

uleb128 kind
uleb128 argcount
uleb128 arg1  // file offset
uleb128 arg2
...

Kind constants: LOH_ARM64_ADRP_ADRP=1, LOH_ARM64_ADRP_LDR=2, LOH_ARM64_ADRP_ADD_LDR=3, LOH_ARM64_ADRP_LDR_GOT_LDR=4, LOH_ARM64_ADRP_ADD_STR=5, LOH_ARM64_ADRP_LDR_GOT_STR=6, LOH_ARM64_ADRP_ADD=7, LOH_ARM64_ADRP_LDR_GOT=8.

afs-as emits kinds 3 (ADRP_ADD_LDR), 7 (ADRP_ADD), and 8 (ADRP_LDR_GOT), plus 4 (ADRP_LDR_GOT_LDR) for load-from-pointer-in-GOT.
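
The stream layout above can be read with a short ULEB128 decoder. This is a sketch under two assumptions that the text does not state: a kind of 0 is treated as trailing padding that ends the stream, and truncated input is reported with `ValueError`. The names `read_uleb128` and `parse_loh` are hypothetical.

```python
def read_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one ULEB128 value at `pos`; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def parse_loh(data: bytes) -> list[tuple[int, tuple[int, ...]]]:
    """Parse an LOH blob into (kind, args) records."""
    hints = []
    pos = 0
    while pos < len(data):
        kind, pos = read_uleb128(data, pos)
        if kind == 0:  # assumption: zero kind is alignment padding
            break
        argcount, pos = read_uleb128(data, pos)
        args = []
        for _ in range(argcount):
            arg, pos = read_uleb128(data, pos)
            args.append(arg)
        hints.append((kind, tuple(args)))
    return hints
```

Each record comes back as a `(kind, args)` pair, so a kind 7 hint over two instructions looks like `(7, (0x10, 0x14))`.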

3. Relaxation pass

Runs after reloc application (Sprint 11) and before LOH re-serialization. For each LOH:

  1. Parse the referenced instructions.
  2. Check whether the symbolic target fits the tighter encoding.
  3. If yes: rewrite the instruction bytes; mark the LOH as "applied" so it can be either dropped or left in the output (ld's convention varies; we match).
  4. If no: leave untouched.
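
The four steps can be sketched for the AdrpAdd kind as follows. This is an illustration, not afs-ld's implementation: the function name, the `base` parameter (the address of the first byte of `text`), and passing the resolved `target` address in directly are all assumptions made to keep the example self-contained.

```python
import struct

NOP = 0xD503201F
ADR_MIN, ADR_MAX = -(1 << 20), (1 << 20) - 1  # signed 21-bit byte offset


def relax_adrp_add(text: bytearray, base: int, adrp_off: int,
                   add_off: int, target: int) -> bool:
    """Try to rewrite ADRP+ADD as ADR+NOP in place; return True if applied."""
    (adrp,) = struct.unpack_from("<I", text, adrp_off)
    (add,) = struct.unpack_from("<I", text, add_off)

    # Step 1: confirm the hint points at ADRP and a 64-bit ADD (immediate,
    # unshifted), and that the ADD consumes the register the ADRP defines.
    if (adrp & 0x9F000000) != 0x90000000:
        return False
    if (add & 0xFFC00000) != 0x91000000:
        return False
    if (adrp & 0x1F) != ((add >> 5) & 0x1F):
        return False

    # Step 2: the ADR replaces the ADRP, so measure from the ADRP's address.
    delta = target - (base + adrp_off)
    if not ADR_MIN <= delta <= ADR_MAX:
        return False  # Step 4: leave the pair untouched

    # Step 3: ADR writes the ADD's destination register; the ADD becomes NOP.
    rd = add & 0x1F
    imm = delta & 0x1FFFFF
    adr = 0x10000000 | ((imm & 0x3) << 29) | ((imm >> 2) << 5) | rd
    struct.pack_into("<I", text, adrp_off, adr)
    struct.pack_into("<I", text, add_off, NOP)
    return True
```

The boolean result is what a caller would use to mark the LOH as applied.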

Safety: every relaxation is optional (the original instructions already achieve the goal), and no relaxation may narrow a correct wider encoding into an incorrect one. Extensive testing required.

4. Safe conservatism

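An LOH is only applied when the target fits strictly within the narrower range. For ADR that range is a signed 21-bit byte offset, -1,048,576 to +1,048,575 from the instruction, so a target exactly 1 MB ahead does not fit. Off-by-one guard: recompute the address from both the original and the relaxed forms, and assert that the relaxed form computes the same address.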
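
One way to express the off-by-one guard is to decode the address that each form computes and compare them. The helpers below are a hypothetical sketch for the ADRP+ADD to ADR case. They assume the instructions have already had their relocations applied and that the ADD is the unshifted immediate form.

```python
def _imm21(insn: int) -> int:
    """Signed 21-bit immediate shared by ADR and ADRP (immhi:immlo)."""
    value = (((insn >> 5) & 0x7FFFF) << 2) | ((insn >> 29) & 0x3)
    return value - (1 << 21) if value & (1 << 20) else value


def adrp_add_address(pc: int, adrp: int, add: int) -> int:
    """Address computed by ADRP at `pc` followed by ADD #imm12."""
    page = (pc & ~0xFFF) + (_imm21(adrp) << 12)
    return page + ((add >> 10) & 0xFFF)


def adr_address(pc: int, adr: int) -> int:
    """Address computed by ADR at `pc`."""
    return pc + _imm21(adr)


def check_relaxation(pc: int, adrp: int, add: int, adr: int) -> int:
    """Raise if the relaxed form does not reproduce the original address."""
    original = adrp_add_address(pc, adrp, add)
    relaxed = adr_address(pc, adr)
    if original != relaxed:
        raise AssertionError(
            f"relaxation changed address: {original:#x} -> {relaxed:#x}")
    return relaxed
```

Running the check before committing the rewritten bytes means an encoding mistake surfaces as a link-time failure rather than as a wrong pointer at run time.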

5. Cross-LOH interaction

A single instruction can participate in multiple LOHs (one as a member of an ADRP+ADD, another as a member of an ADRP+ADD+LDR). Apply LOHs in a deterministic order — longest first — and skip any LOH whose instructions have already been rewritten.
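
The ordering rule can be sketched as a sort followed by a single pass that tracks rewritten instruction addresses. The names here are hypothetical, and two details are assumptions: ties between LOHs of equal length are broken by argument addresses and then kind, and every instruction of an applied LOH counts as rewritten.

```python
from typing import Callable

Loh = tuple[int, tuple[int, ...]]  # (kind, instruction addresses)


def apply_in_order(lohs: list[Loh],
                   try_apply: Callable[[int, tuple[int, ...]], bool]) -> list[Loh]:
    """Apply LOHs longest first; skip any that touch rewritten instructions."""
    rewritten: set[int] = set()
    applied: list[Loh] = []
    # Sort key: more instructions first, then addresses, then kind, so the
    # order does not depend on how the hints were listed in the input.
    for kind, args in sorted(lohs, key=lambda h: (-len(h[1]), h[1], h[0])):
        if any(addr in rewritten for addr in args):
            continue
        if try_apply(kind, args):
            rewritten.update(args)
            applied.append((kind, args))
    return applied
```

With an ADRP+ADD hint and an ADRP+ADD+LDR hint sharing the same first two instructions, the three-instruction hint is tried first, and the shorter one only gets a chance if the longer one was not applied.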

6. Output LOH preservation

ld emits LC_LINKER_OPTIMIZATION_HINT in the output even for executables (for the benefit of post-processing tools). We match: emit a new LOH blob with the final state (applied LOHs marked or omitted).
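
Re-serialization is the inverse of parsing the ULEB128 stream. The sketch below is hypothetical: it models "omitted" by skipping records whose index is in `drop`, and it offers optional zero padding to an alignment boundary. Whether ld drops applied hints, and how it pads the blob, would need to be confirmed against ld's actual output.

```python
def write_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as ULEB128."""
    if value < 0:
        raise ValueError("ULEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def serialize_loh(hints, drop=frozenset(), align=1) -> bytes:
    """Emit (kind, args) records, skipping indices in `drop`."""
    blob = bytearray()
    for index, (kind, args) in enumerate(hints):
        if index in drop:
            continue
        blob += write_uleb128(kind)
        blob += write_uleb128(len(args))
        for arg in args:
            blob += write_uleb128(arg)
    while len(blob) % align:  # assumption: pad with zero bytes
        blob.append(0)
    return bytes(blob)
```

For example, `serialize_loh([(7, (0x10, 0x14))], align=8)` produces the four record bytes followed by four zero bytes.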

7. -no_loh flag

For debugging: -no_loh skips relaxation. Helpful when comparing output against a known-bad state.

Testing Strategy

  • Synthetic fixture with a function whose __data target is just under 1 MB away (at most 1 MB minus 1 byte ahead of the instruction) → AdrpAdd LOH applies; disassembly shows ADR + nop.
  • Fixture with a target 10 MB away → LOH does not apply; ADRP + ADD preserved.
  • Differential: afs-ld output byte-matches ld output for both fixtures.
  • Runtime test: the relaxed code still dereferences the right address.
  • Random-fuzz: 100 fixtures with various target distances; every relaxation verified against recomputed ground truth.

Definition of Done

  • LOH relaxation applied correctly on fixtures that fit.
  • LOH skipped correctly on fixtures that don't.
  • Byte-parity with ld on a representative corpus.
  • -no_loh flag produces a cleanly un-relaxed output.