afs-ld Public

Watch 0 Fork 0 Star 0

markdown · 4372 bytes Raw Blame History

Sprint 27: Differential Harness vs Apple ld

Prerequisites

All prior sprints, especially 18, 18.5, 21 — end-to-end milestones.

Goals

Industrial-strength parity harness. Automated byte-level comparison of afs-ld output against ld across a curated corpus. Explicit tolerance lists, regression-gated CI. This is the Sprint 20 default-swap gate: afs-ld becomes the armfortas default only after this sprint's corpus is green.

Deliverables

1. Corpus

tests/parity_corpus/ contains 50+ link scenarios, each a small test directory with:

inputs/ (the .o, .a, .tbd files).
args.txt (the afs-ld / ld command-line).
notes.md (what this exercises).

Scenarios cover:

Hello-world variants (classic vs chained, with/without -dead_strip, with/without -icf).
Every relocation type in isolation.
GOT and stub exercises.
TLV exercises.
Weak-def coalescing.
Common-symbol promotion.
Multi-archive resolution with order dependence.
Dylib-with-reexport chain.
LSystem links with real system SDK.
libarmfortas_rt.a + a 3-function Fortran program.

2. Diff dimensions

For each scenario, compare:

Load commands: count, order, contents (with tolerated-diff for UUID/timestamp).
Segment sizes and file offsets.
Section bytes (byte-level equality after reloc application).
Symbol table: same nlist entries in the same partition order.
String table: same content (byte-level is ideal, length within 5% is tolerated for suffix-dedup variation).
LC_DYLD_INFO_ONLY opcode streams (classic) or LC_DYLD_CHAINED_FIXUPS chains (chained).
Export trie walk equivalence (may differ in byte layout but must export the same names with the same flags and addresses).
__unwind_info byte-level.
Code signature: ignored in diff (ld signs with sha256 hashes over its output's bytes; we sign over ours; different bytes, different hashes — expected).

3. Tolerated-diff rules

pub enum ToleratedDiff {
    UuidBytes,
    Timestamp,
    PathHashInString(&'static str),      // e.g. temp path in stabs
    StringTableSuffixDedupVariance,
    CodeSignatureHashes,
}

Each tolerance has a precise predicate — no loose "any byte in __LINKEDIT". Unknown diffs fail.

4. Harness structure

afs-ld/tests/parity_matrix.rs walks tests/parity_corpus/ and runs each scenario:

#[test]
fn parity_corpus() {
    for case in load_corpus("tests/parity_corpus/") {
        let ours = link_with_afs_ld(&case).unwrap();
        let theirs = link_with_system_ld(&case).unwrap();
        let diffs = diff_macho(&ours, &theirs);
        let critical: Vec<_> = diffs.into_iter().filter(|d| !is_tolerated(d)).collect();
        assert!(critical.is_empty(),
            "{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical);
    }
}

5. CI gating

GitHub Actions job runs on every PR:

cargo test --test parity_matrix green.
Artifact uploaded: per-scenario HTML diff viewer for debugging.
A failing scenario blocks merge.

6. Per-scenario allowed-diff annotation

Some scenarios might have legitimate small differences we don't want to suppress globally. Each scenario's notes.md can declare case-specific tolerances:

tolerated:
  - region: __LINKEDIT  bytes 0x1000-0x1010  reason: "ld emits padding here"

Use sparingly; each tolerance must be justified and date-stamped.

7. Runtime parity

Beyond byte-level: each scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.

8. Parity budget

The goal is zero critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; if a diff can't be resolved within the sprint, it must be filed as a bug blocking default-swap in Sprint 20.

Testing Strategy

cargo test --test parity_matrix green.
Intentional-regression: mutate one byte in afs-ld's writer, confirm the harness catches it.
Scale test: full corpus runs in <2 minutes on a reasonable machine (gates Sprint 28 perf work).

Definition of Done

50+ corpus scenarios all pass with zero critical diffs.
CI-enforced.
Every tolerated-diff category has a justification and a test that proves it triggers.
Intentional-regression canary detects any change outside the allowlist.
Sprint 20's default-swap is unblocked.

View source

  
        1
        # Sprint 27: Differential Harness vs Apple ld
      
        2
        
        3
        ## Prerequisites
      
        4
        All prior sprints, especially 18, 18.5, 21 — end-to-end milestones.
      
        5
        
        6
        ## Goals
      
        7
        Industrial-strength parity harness. Automated byte-level comparison of afs-ld output against `ld` across a curated corpus. Explicit tolerance lists, regression-gated CI. This is the Sprint 20 default-swap gate: afs-ld becomes the armfortas default only after this sprint's corpus is green.
      
        8
        
        9
        ## Deliverables
      
        10
        
        11
        ### 1. Corpus
      
        12
        
        13
        `tests/parity_corpus/` contains 50+ link scenarios, each a small test directory with:
      
        14
        - `inputs/` (the `.o`, `.a`, `.tbd` files).
      
        15
        - `args.txt` (the afs-ld / `ld` command-line).
      
        16
        - `notes.md` (what this exercises).
      
        17
        
        18
        Scenarios cover:
      
        19
        - Hello-world variants (classic vs chained, with/without `-dead_strip`, with/without `-icf`).
      
        20
        - Every relocation type in isolation.
      
        21
        - GOT and stub exercises.
      
        22
        - TLV exercises.
      
        23
        - Weak-def coalescing.
      
        24
        - Common-symbol promotion.
      
        25
        - Multi-archive resolution with order dependence.
      
        26
        - Dylib-with-reexport chain.
      
        27
        - LSystem links with real system SDK.
      
        28
        - `libarmfortas_rt.a` + a 3-function Fortran program.
      
        29
        
        30
        ### 2. Diff dimensions
      
        31
        
        32
        For each scenario, compare:
      
        33
        - Load commands: count, order, contents (with tolerated-diff for UUID/timestamp).
      
        34
        - Segment sizes and file offsets.
      
        35
        - Section bytes (byte-level equality after reloc application).
      
        36
        - Symbol table: same nlist entries in the same partition order.
      
        37
        - String table: same content (byte-level is ideal, length within 5% is tolerated for suffix-dedup variation).
      
        38
        - `LC_DYLD_INFO_ONLY` opcode streams (classic) or `LC_DYLD_CHAINED_FIXUPS` chains (chained).
      
        39
        - Export trie walk equivalence (may differ in byte layout but must export the same names with the same flags and addresses).
      
        40
        - `__unwind_info` byte-level.
      
        41
        - Code signature: ignored in diff (ld signs with sha256 hashes over its output's bytes; we sign over ours; different bytes, different hashes — expected).
      
        42
        
        43
        ### 3. Tolerated-diff rules
      
        44
        
        45
        ```rust
      
        46
        pub enum ToleratedDiff {
      
        47
            UuidBytes,
      
        48
            Timestamp,
      
        49
            PathHashInString(&'static str),      // e.g. temp path in stabs
      
        50
            StringTableSuffixDedupVariance,
      
        51
            CodeSignatureHashes,
      
        52
        }
      
        53
        ```
      
        54
        
        55
        Each tolerance has a precise predicate — no loose "any byte in __LINKEDIT". Unknown diffs fail.
      
        56
        
        57
        ### 4. Harness structure
      
        58
        
        59
        `afs-ld/tests/parity_matrix.rs` walks `tests/parity_corpus/` and runs each scenario:
      
        60
        
        61
        ```rust
      
        62
        #[test]
      
        63
        fn parity_corpus() {
      
        64
            for case in load_corpus("tests/parity_corpus/") {
      
        65
                let ours = link_with_afs_ld(&case).unwrap();
      
        66
                let theirs = link_with_system_ld(&case).unwrap();
      
        67
                let diffs = diff_macho(&ours, &theirs);
      
        68
                let critical: Vec<_> = diffs.into_iter().filter(|d| !is_tolerated(d)).collect();
      
        69
                assert!(critical.is_empty(),
      
        70
                    "{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical);
      
        71
            }
      
        72
        }
      
        73
        ```
      
        74
        
        75
        ### 5. CI gating
      
        76
        
        77
        GitHub Actions job runs on every PR:
      
        78
        - `cargo test --test parity_matrix` green.
      
        79
        - Artifact uploaded: per-scenario HTML diff viewer for debugging.
      
        80
        - A failing scenario blocks merge.
      
        81
        
        82
        ### 6. Per-scenario allowed-diff annotation
      
        83
        
        84
        Some scenarios might have legitimate small differences we don't want to suppress globally. Each scenario's `notes.md` can declare case-specific tolerances:
      
        85
        
        86
        ```yaml
      
        87
        tolerated:
      
        88
          - region: __LINKEDIT  bytes 0x1000-0x1010  reason: "ld emits padding here"
      
        89
        ```
      
        90
        
        91
        Use sparingly; each tolerance must be justified and date-stamped.
      
        92
        
        93
        ### 7. Runtime parity
      
        94
        
        95
        Beyond byte-level: each scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.
      
        96
        
        97
        ### 8. Parity budget
      
        98
        
        99
        The goal is **zero** critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; if a diff can't be resolved within the sprint, it must be filed as a bug blocking default-swap in Sprint 20.
      
        100
        
        101
        ## Testing Strategy
      
        102
        
        103
        - `cargo test --test parity_matrix` green.
      
        104
        - Intentional-regression: mutate one byte in afs-ld's writer, confirm the harness catches it.
      
        105
        - Scale test: full corpus runs in <2 minutes on a reasonable machine (gates Sprint 28 perf work).
      
        106
        
        107
        ## Definition of Done
      
        108
        
        109
        - 50+ corpus scenarios all pass with zero critical diffs.
      
        110
        - CI-enforced.
      
        111
        - Every tolerated-diff category has a justification and a test that proves it triggers.
      
        112
        - Intentional-regression canary detects any change outside the allowlist.
      
        113
        - Sprint 20's default-swap is unblocked.

1	# Sprint 27: Differential Harness vs Apple ld
2
3	## Prerequisites
4	All prior sprints, especially 18, 18.5, 21 — end-to-end milestones.
5
6	## Goals
7	Industrial-strength parity harness. Automated byte-level comparison of afs-ld output against `ld` across a curated corpus. Explicit tolerance lists, regression-gated CI. This is the Sprint 20 default-swap gate: afs-ld becomes the armfortas default only after this sprint's corpus is green.
8
9	## Deliverables
10
11	### 1. Corpus
12
13	`tests/parity_corpus/` contains 50+ link scenarios, each a small test directory with:
14	- `inputs/` (the `.o`, `.a`, `.tbd` files).
15	- `args.txt` (the afs-ld / `ld` command-line).
16	- `notes.md` (what this exercises).
17
18	Scenarios cover:
19	- Hello-world variants (classic vs chained, with/without `-dead_strip`, with/without `-icf`).
20	- Every relocation type in isolation.
21	- GOT and stub exercises.
22	- TLV exercises.
23	- Weak-def coalescing.
24	- Common-symbol promotion.
25	- Multi-archive resolution with order dependence.
26	- Dylib-with-reexport chain.
27	- LSystem links with real system SDK.
28	- `libarmfortas_rt.a` + a 3-function Fortran program.
29
30	### 2. Diff dimensions
31
32	For each scenario, compare:
33	- Load commands: count, order, contents (with tolerated-diff for UUID/timestamp).
34	- Segment sizes and file offsets.
35	- Section bytes (byte-level equality after reloc application).
36	- Symbol table: same nlist entries in the same partition order.
37	- String table: same content (byte-level is ideal, length within 5% is tolerated for suffix-dedup variation).
38	- `LC_DYLD_INFO_ONLY` opcode streams (classic) or `LC_DYLD_CHAINED_FIXUPS` chains (chained).
39	- Export trie walk equivalence (may differ in byte layout but must export the same names with the same flags and addresses).
40	- `__unwind_info` byte-level.
41	- Code signature: ignored in diff (ld signs with sha256 hashes over its output's bytes; we sign over ours; different bytes, different hashes — expected).
42
43	### 3. Tolerated-diff rules
44
45	```rust
46	pub enum ToleratedDiff {
47	UuidBytes,
48	Timestamp,
49	PathHashInString(&'static str), // e.g. temp path in stabs
50	StringTableSuffixDedupVariance,
51	CodeSignatureHashes,
52	}
53	```
54
55	Each tolerance has a precise predicate — no loose "any byte in __LINKEDIT". Unknown diffs fail.
56
57	### 4. Harness structure
58
59	`afs-ld/tests/parity_matrix.rs` walks `tests/parity_corpus/` and runs each scenario:
60
61	```rust
62	#[test]
63	fn parity_corpus() {
64	for case in load_corpus("tests/parity_corpus/") {
65	let ours = link_with_afs_ld(&case).unwrap();
66	let theirs = link_with_system_ld(&case).unwrap();
67	let diffs = diff_macho(&ours, &theirs);
68	let critical: Vec<_> = diffs.into_iter().filter(\|d\| !is_tolerated(d)).collect();
69	assert!(critical.is_empty(),
70	"{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical);
71	}
72	}
73	```
74
75	### 5. CI gating
76
77	GitHub Actions job runs on every PR:
78	- `cargo test --test parity_matrix` green.
79	- Artifact uploaded: per-scenario HTML diff viewer for debugging.
80	- A failing scenario blocks merge.
81
82	### 6. Per-scenario allowed-diff annotation
83
84	Some scenarios might have legitimate small differences we don't want to suppress globally. Each scenario's `notes.md` can declare case-specific tolerances:
85
86	```yaml
87	tolerated:
88	- region: __LINKEDIT bytes 0x1000-0x1010 reason: "ld emits padding here"
89	```
90
91	Use sparingly; each tolerance must be justified and date-stamped.
92
93	### 7. Runtime parity
94
95	Beyond byte-level: each scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.
96
97	### 8. Parity budget
98
99	The goal is zero critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; if a diff can't be resolved within the sprint, it must be filed as a bug blocking default-swap in Sprint 20.
100
101	## Testing Strategy
102
103	- `cargo test --test parity_matrix` green.
104	- Intentional-regression: mutate one byte in afs-ld's writer, confirm the harness catches it.
105	- Scale test: full corpus runs in <2 minutes on a reasonable machine (gates Sprint 28 perf work).
106
107	## Definition of Done
108
109	- 50+ corpus scenarios all pass with zero critical diffs.
110	- CI-enforced.
111	- Every tolerated-diff category has a justification and a test that proves it triggers.
112	- Intentional-regression canary detects any change outside the allowlist.
113	- Sprint 20's default-swap is unblocked.