markdown · 4459 bytes Raw Blame History

Sprint 27: Differential Harness vs Apple ld

Prerequisites

All prior sprints, especially 18, 18.5, 21 — end-to-end milestones.

Goals

Industrial-strength parity harness. Automated byte-level comparison of afs-ld output against ld across a curated corpus. Explicit tolerance lists, regression-gated CI. This is the Sprint 20 default-swap gate: afs-ld becomes the armfortas default only after this sprint's corpus is green.

Deliverables

1. Corpus

tests/parity_corpus/ contains 50+ link scenarios, each a small test directory with:

  • inputs/ (the .o, .a, .tbd files).
  • args.txt (the afs-ld / ld command-line).
  • notes.md (what this exercises).

Scenarios cover:

  • Hello-world variants (classic vs chained, with/without -dead_strip, with/without -icf).
  • Every relocation type in isolation.
  • GOT and stub exercises.
  • TLV exercises.
  • Weak-def coalescing.
  • Common-symbol promotion.
  • Multi-archive resolution with order dependence.
  • Dylib-with-reexport chain.
  • LSystem links with real system SDK.
  • libarmfortas_rt.a + a 3-function Fortran program.

2. Diff dimensions

For each scenario, compare:

  • Load commands: count, order, contents (with tolerated-diff for UUID/timestamp).
  • Segment sizes and file offsets.
  • Section bytes (byte-level equality after reloc application).
  • Symbol table: same nlist entries in the same partition order.
  • String table: same content (byte-level is ideal, length within 5% is tolerated for suffix-dedup variation).
  • LC_DYLD_INFO_ONLY opcode streams (classic) or LC_DYLD_CHAINED_FIXUPS chains (chained).
  • Export trie walk equivalence (may differ in byte layout but must export the same names with the same flags and addresses).
  • __unwind_info byte-level.
  • Code signature: ignored in diff (ld signs with sha256 hashes over its output's bytes; we sign over ours; different bytes, different hashes — expected).

3. Tolerated-diff rules

Current Sprint 27 allowlist is intentionally small and explicit:

  • UUID load-command bytes.
  • Dylib timestamp fields.
  • Code-signature load-command/blob bytes.
  • Case-specific section-byte ranges declared in notes.md.
  • String-table length drift within 5% for suffix-dedup variance.

Each tolerance has a precise predicate — no loose "any byte in __LINKEDIT". Unknown diffs fail.

4. Harness structure

afs-ld/tests/parity_matrix.rs walks tests/parity_corpus/ and runs each scenario:

#[test]
fn parity_corpus() {
    for case in load_corpus("tests/parity_corpus/") {
        let ours = link_with_afs_ld(&case).unwrap();
        let theirs = link_with_system_ld(&case).unwrap();
        let diffs = diff_macho(&ours, &theirs);
        let critical: Vec<_> = diffs.into_iter().filter(|d| !is_tolerated(d)).collect();
        assert!(critical.is_empty(),
            "{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical);
    }
}

5. CI gating

GitHub Actions job runs on every PR:

  • cargo test --test parity_matrix green.
  • Artifact uploaded: per-scenario HTML diff viewer for debugging.
  • A failing scenario blocks merge.

6. Per-scenario allowed-diff annotation

Some scenarios might have legitimate small differences we don't want to suppress globally. Each scenario's notes.md can declare case-specific tolerances:

tolerated:
  - region: __LINKEDIT  bytes 0x1000-0x1010  reason: "ld emits padding here"

Use sparingly; each tolerance must be justified and date-stamped.

7. Runtime parity

Beyond byte-level: each scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.

8. Parity budget

The goal is zero critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; if a diff can't be resolved within the sprint, it must be filed as a bug blocking default-swap in Sprint 20.

Testing Strategy

  • cargo test --test parity_matrix green.
  • Intentional-regression: mutate one byte in afs-ld's writer, confirm the harness catches it.
  • Scale test: full corpus runs in <2 minutes on a reasonable machine (gates Sprint 28 perf work).

Definition of Done

  • 50+ corpus scenarios all pass with zero critical diffs.
  • CI-enforced.
  • Every tolerated-diff category has a justification and a test that proves it triggers.
  • Intentional-regression canary detects any change outside the allowlist.
  • Sprint 20's default-swap is unblocked.
View source
1 # Sprint 27: Differential Harness vs Apple ld
2
3 ## Prerequisites
4 All prior sprints, especially 18, 18.5, 21 — end-to-end milestones.
5
6 ## Goals
7 Industrial-strength parity harness. Automated byte-level comparison of afs-ld output against `ld` across a curated corpus. Explicit tolerance lists, regression-gated CI. This is the Sprint 20 default-swap gate: afs-ld becomes the armfortas default only after this sprint's corpus is green.
8
9 ## Deliverables
10
11 ### 1. Corpus
12
13 `tests/parity_corpus/` contains 50+ link scenarios, each a small test directory with:
14 - `inputs/` (the `.o`, `.a`, `.tbd` files).
15 - `args.txt` (the afs-ld / `ld` command-line).
16 - `notes.md` (what this exercises).
17
18 Scenarios cover:
19 - Hello-world variants (classic vs chained, with/without `-dead_strip`, with/without `-icf`).
20 - Every relocation type in isolation.
21 - GOT and stub exercises.
22 - TLV exercises.
23 - Weak-def coalescing.
24 - Common-symbol promotion.
25 - Multi-archive resolution with order dependence.
26 - Dylib-with-reexport chain.
27 - LSystem links with real system SDK.
28 - `libarmfortas_rt.a` + a 3-function Fortran program.
29
30 ### 2. Diff dimensions
31
32 For each scenario, compare:
33 - Load commands: count, order, contents (with tolerated-diff for UUID/timestamp).
34 - Segment sizes and file offsets.
35 - Section bytes (byte-level equality after reloc application).
36 - Symbol table: same nlist entries in the same partition order.
37 - String table: same content (byte-level is ideal, length within 5% is tolerated for suffix-dedup variation).
38 - `LC_DYLD_INFO_ONLY` opcode streams (classic) or `LC_DYLD_CHAINED_FIXUPS` chains (chained).
39 - Export trie walk equivalence (may differ in byte layout but must export the same names with the same flags and addresses).
40 - `__unwind_info` byte-level.
41 - Code signature: ignored in diff (ld signs with sha256 hashes over its output's bytes; we sign over ours; different bytes, different hashes — expected).
42
43 ### 3. Tolerated-diff rules
44
45 Current Sprint 27 allowlist is intentionally small and explicit:
46 - UUID load-command bytes.
47 - Dylib timestamp fields.
48 - Code-signature load-command/blob bytes.
49 - Case-specific section-byte ranges declared in `notes.md`.
50 - String-table length drift within 5% for suffix-dedup variance.
51
52 Each tolerance has a precise predicate — no loose "any byte in __LINKEDIT". Unknown diffs fail.
53
54 ### 4. Harness structure
55
56 `afs-ld/tests/parity_matrix.rs` walks `tests/parity_corpus/` and runs each scenario:
57
58 ```rust
59 #[test]
60 fn parity_corpus() {
61 for case in load_corpus("tests/parity_corpus/") {
62 let ours = link_with_afs_ld(&case).unwrap();
63 let theirs = link_with_system_ld(&case).unwrap();
64 let diffs = diff_macho(&ours, &theirs);
65 let critical: Vec<_> = diffs.into_iter().filter(|d| !is_tolerated(d)).collect();
66 assert!(critical.is_empty(),
67 "{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical);
68 }
69 }
70 ```
71
72 ### 5. CI gating
73
74 GitHub Actions job runs on every PR:
75 - `cargo test --test parity_matrix` green.
76 - Artifact uploaded: per-scenario HTML diff viewer for debugging.
77 - A failing scenario blocks merge.
78
79 ### 6. Per-scenario allowed-diff annotation
80
81 Some scenarios might have legitimate small differences we don't want to suppress globally. Each scenario's `notes.md` can declare case-specific tolerances:
82
83 ```yaml
84 tolerated:
85 - region: __LINKEDIT bytes 0x1000-0x1010 reason: "ld emits padding here"
86 ```
87
88 Use sparingly; each tolerance must be justified and date-stamped.
89
90 ### 7. Runtime parity
91
92 Beyond byte-level: each scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.
93
94 ### 8. Parity budget
95
96 The goal is **zero** critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; if a diff can't be resolved within the sprint, it must be filed as a bug blocking default-swap in Sprint 20.
97
98 ## Testing Strategy
99
100 - `cargo test --test parity_matrix` green.
101 - Intentional-regression: mutate one byte in afs-ld's writer, confirm the harness catches it.
102 - Scale test: full corpus runs in <2 minutes on a reasonable machine (gates Sprint 28 perf work).
103
104 ## Definition of Done
105
106 - 50+ corpus scenarios all pass with zero critical diffs.
107 - CI-enforced.
108 - Every tolerated-diff category has a justification and a test that proves it triggers.
109 - Intentional-regression canary detects any change outside the allowlist.
110 - Sprint 20's default-swap is unblocked.