# Sprint 27: Differential Harness vs Apple ld

## Prerequisites

All prior sprints, especially 18, 18.5, and 21 (the end-to-end milestones).

## Goals

An industrial-strength parity harness: automated byte-level comparison of afs-ld output against Apple `ld` across a curated corpus, with explicit tolerance lists and regression-gated CI.

This sprint gates the Sprint 20 default swap: afs-ld becomes the armfortas default only after this sprint's corpus is green.

## Deliverables

### 1. Corpus

`tests/parity_corpus/` contains 50+ link scenarios. Each scenario is a small test directory with:

- `inputs/`: the `.o`, `.a`, and `.tbd` files.
- `args.txt`: the afs-ld / `ld` command line.
- `notes.md`: what the scenario exercises.

Scenarios cover:

- Hello-world variants (classic vs chained fixups, with and without `-dead_strip`, with and without `-icf`).
- Every relocation type in isolation.
- GOT and stub exercises.
- TLV exercises.
- Weak-definition coalescing.
- Common-symbol promotion.
- Multi-archive resolution with order dependence.
- A dylib re-export chain.
- `-lSystem` links against the real system SDK.
- `libarmfortas_rt.a` plus a 3-function Fortran program.

### 2. Diff dimensions

For each scenario, compare:

- Load commands: count, order, and contents (with tolerated diffs for the UUID and dylib timestamps).
- Segment sizes and file offsets.
- Section bytes: byte-level equality after relocation application.
- Symbol table: the same nlist entries in the same partition order (locals, external definitions, undefined symbols).
- String table: the same content. Byte-level equality is ideal; a length within 5% of `ld`'s is tolerated to absorb suffix-dedup variation.
- `LC_DYLD_INFO_ONLY` opcode streams (classic) or `LC_DYLD_CHAINED_FIXUPS` chains (chained).
- Export-trie walk equivalence: the byte layout may differ, but the trie must export the same names with the same flags and addresses.
- `__unwind_info`: byte-level.
- Code signature: ignored. `ld` computes SHA-256 hashes over its own output and we hash ours, so any difference elsewhere in the file (for example the UUID) yields different hashes. This is expected.

### 3. Tolerated-diff rules

The Sprint 27 allowlist is intentionally small and explicit:

- UUID load-command bytes.
- Dylib timestamp fields.
- Code-signature load-command and blob bytes.
- Case-specific section-byte ranges declared in `notes.md`.
- String-table length drift within 5%, for suffix-dedup variance.

Each tolerance has a precise predicate; there is no loose "any byte in `__LINKEDIT`" rule. Any diff not matched by a predicate fails.

### 4. Harness structure

`afs-ld/tests/parity_matrix.rs` walks `tests/parity_corpus/` and runs each scenario:

```rust
#[test]
fn parity_corpus() {
    let mut failures = Vec::new();
    for case in load_corpus("tests/parity_corpus/") {
        // Link the same inputs and arguments with both linkers.
        let ours = link_with_afs_ld(&case)
            .unwrap_or_else(|e| panic!("{}: afs-ld failed: {e:?}", case.name));
        let theirs = link_with_system_ld(&case)
            .unwrap_or_else(|e| panic!("{}: ld failed: {e:?}", case.name));
        // Keep only diffs that no global or per-case (notes.md) tolerance accepts.
        let critical: Vec<_> = diff_macho(&ours, &theirs)
            .into_iter()
            .filter(|d| !is_tolerated(d, &case))
            .collect();
        if !critical.is_empty() {
            failures.push(format!("{}: {} critical diff(s):\n{:#?}", case.name, critical.len(), critical));
        }
    }
    // Report every failing scenario at once, not just the first.
    assert!(failures.is_empty(), "{}", failures.join("\n\n"));
}
```

### 5. CI gating

A GitHub Actions job runs on every PR:

- `cargo test --test parity_matrix` must be green.
- A per-scenario HTML diff viewer is uploaded as an artifact for debugging.
- A failing scenario blocks merge.

### 6. Per-scenario allowed-diff annotation

Some scenarios may have legitimate small differences that should not be suppressed globally. Each scenario's `notes.md` can declare case-specific tolerances:

```yaml
tolerated:
  - region: __LINKEDIT bytes 0x1000-0x1010
    reason: "ld emits padding here"
```

Use these sparingly; each tolerance must be justified and date-stamped.

### 7. Runtime parity

Beyond byte-level comparison, every scenario that produces a runnable executable is also executed; stdout, stderr, and exit code must match between the two linked binaries.

### 8. Parity budget

The goal is **zero** critical diffs across the corpus. Sprint 27 is not done until the harness is fully green; any diff that cannot be resolved within the sprint must be filed as a bug blocking the Sprint 20 default swap.

## Testing Strategy

- `cargo test --test parity_matrix` green.
- Intentional regression: mutate one byte in afs-ld's writer and confirm the harness catches it.
- Scale test: the full corpus runs in under 2 minutes on a reasonable machine (this gates the Sprint 28 performance work).

## Definition of Done

- 50+ corpus scenarios, all passing with zero critical diffs.
- CI-enforced.
- Every tolerated-diff category has a justification and a test proving that it triggers.
- The intentional-regression canary detects any change outside the allowlist.
- Sprint 20's default swap is unblocked.
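A loader for the corpus layout in deliverable 1 could look like the sketch below. Beyond what the plan specifies, it assumes that `args.txt` holds whitespace-separated arguments with no shell quoting and that non-directory entries under the corpus root (such as a README) are skipped. `Case` and `load_corpus` are hypothetical names echoing the harness in deliverable 4; unlike that call site, this version returns an `io::Result` that the caller must unwrap first.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One parity scenario (hypothetical shape).
#[derive(Debug)]
struct Case {
    /// Directory name, used in failure messages.
    name: String,
    /// The scenario's `inputs/` directory (.o, .a, .tbd files).
    inputs: PathBuf,
    /// Arguments from `args.txt`, split on whitespace (assumption: no shell quoting).
    args: Vec<String>,
}

/// Loads every scenario directory under `root`, sorted by name so that
/// reports are stable across runs. A scenario without `args.txt` is an error.
fn load_corpus(root: impl AsRef<Path>) -> io::Result<Vec<Case>> {
    let mut cases = Vec::new();
    for entry in fs::read_dir(root)? {
        let dir = entry?.path();
        if !dir.is_dir() {
            continue; // e.g. a README at the corpus root
        }
        let args: Vec<String> = fs::read_to_string(dir.join("args.txt"))?
            .split_whitespace()
            .map(str::to_owned)
            .collect();
        let name = dir
            .file_name()
            .map(|n| n.to_string_lossy().into_owned())
            .unwrap_or_default();
        cases.push(Case { name, inputs: dir.join("inputs"), args });
    }
    cases.sort_by(|a, b| a.name.cmp(&b.name));
    Ok(cases)
}
```

Sorting by name means a scenario's position in the failure report does not depend on filesystem iteration order, which keeps CI logs diffable between runs.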
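The allowlist in deliverables 3 and 6 can be expressed as a single `match` over a diff type, so that each tolerance is one small, testable predicate. This is a sketch under assumptions: the `Diff` enum, `CaseTolerance`, and `parse_region` are hypothetical; `diff_macho` is assumed to classify code-signature blob bytes itself; and `is_tolerated` here takes the scenario's parsed tolerances rather than the whole case. The `cmd` constants and the dylib timestamp offset (bytes 12..16 of a `dylib_command`) follow `<mach-o/loader.h>`. Treating the end of a `notes.md` byte range as exclusive is an assumption, because the plan does not say.

```rust
use std::ops::Range;

// Load-command `cmd` values from <mach-o/loader.h>.
const LC_REQ_DYLD: u32 = 0x8000_0000;
const LC_LOAD_DYLIB: u32 = 0xc;
const LC_ID_DYLIB: u32 = 0xd;
const LC_UUID: u32 = 0x1b;
const LC_CODE_SIGNATURE: u32 = 0x1d;
const LC_LOAD_WEAK_DYLIB: u32 = 0x18 | LC_REQ_DYLD;
const LC_REEXPORT_DYLIB: u32 = 0x1f | LC_REQ_DYLD;

/// Hypothetical shape of one mismatch reported by `diff_macho`.
#[derive(Debug)]
enum Diff {
    /// Bytes at `offsets` (relative to the start of the command) differ.
    LoadCommand { cmd: u32, offsets: Range<u32> },
    /// Bytes differ inside the code-signature blob in __LINKEDIT.
    CodeSignatureBlob,
    /// Bytes differ in a named section or segment, at file offsets `range`.
    RegionBytes { region: String, range: Range<u64> },
    /// The string tables have different lengths.
    StringTableLen { ours: u64, theirs: u64 },
    /// Anything else (symbol order, fixups, trie contents, ...).
    Other(String),
}

/// Hypothetical per-case tolerance parsed from a scenario's notes.md.
struct CaseTolerance {
    region: String,
    range: Range<u64>,
}

/// Parses a `region:` value such as "__LINKEDIT bytes 0x1000-0x1010".
/// Assumption: the end offset is exclusive.
fn parse_region(s: &str) -> Option<CaseTolerance> {
    let mut parts = s.split_whitespace();
    let region = parts.next()?.to_string();
    if parts.next()? != "bytes" {
        return None;
    }
    let (lo, hi) = parts.next()?.split_once('-')?;
    let hex = |h: &str| -> Option<u64> { u64::from_str_radix(h.strip_prefix("0x")?, 16).ok() };
    let (start, end) = (hex(lo)?, hex(hi)?);
    (start < end && parts.next().is_none()).then(|| CaseTolerance { region, range: start..end })
}

fn is_dylib_cmd(cmd: u32) -> bool {
    matches!(cmd, LC_LOAD_DYLIB | LC_ID_DYLIB | LC_LOAD_WEAK_DYLIB | LC_REEXPORT_DYLIB)
}

/// True if `inner` lies entirely within `outer`.
fn within<T: PartialOrd>(inner: &Range<T>, outer: &Range<T>) -> bool {
    inner.start >= outer.start && inner.end <= outer.end
}

fn is_tolerated(diff: &Diff, case: &[CaseTolerance]) -> bool {
    match diff {
        // Any byte of LC_UUID or LC_CODE_SIGNATURE may differ.
        Diff::LoadCommand { cmd, .. } if *cmd == LC_UUID || *cmd == LC_CODE_SIGNATURE => true,
        // In dylib commands, only the 4-byte timestamp at offsets 12..16 may differ.
        Diff::LoadCommand { cmd, offsets } => is_dylib_cmd(*cmd) && within(offsets, &(12..16)),
        // Each linker signs its own bytes, so the blob always differs.
        Diff::CodeSignatureBlob => true,
        // Region diffs need a matching notes.md tolerance that fully encloses them.
        Diff::RegionBytes { region, range } => case
            .iter()
            .any(|t| t.region == *region && within(range, &t.range)),
        // |ours - theirs| <= 5% of ld's length, i.e. 20 * |diff| <= theirs.
        Diff::StringTableLen { ours, theirs } => ours.abs_diff(*theirs) * 20 <= *theirs,
        // Unknown diffs always fail.
        Diff::Other(_) => false,
    }
}
```

Because each arm is a plain predicate, the Definition of Done item "a test that proves it triggers" reduces to one positive and one negative assertion per arm.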
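Runtime parity (deliverable 7) needs little beyond `std::process::Command`. In this sketch, `RunOutcome`, `run`, and `runtime_mismatches` are hypothetical names. Each binary is run with no arguments and a null stdin; a process killed by a signal is recorded only as a missing exit code, so a fuller version would also compare the terminating signal on Unix.

```rust
use std::io;
use std::path::Path;
use std::process::{Command, Stdio};

/// Observable behaviour of one run of a linked executable (hypothetical type).
#[derive(Debug, PartialEq)]
struct RunOutcome {
    stdout: Vec<u8>,
    stderr: Vec<u8>,
    /// `None` when the process was terminated by a signal.
    code: Option<i32>,
}

/// Runs `exe` with no arguments and a null stdin, capturing stdout and stderr.
fn run(exe: &Path) -> io::Result<RunOutcome> {
    let out = Command::new(exe).stdin(Stdio::null()).output()?;
    Ok(RunOutcome {
        stdout: out.stdout,
        stderr: out.stderr,
        code: out.status.code(),
    })
}

/// Describes every observable difference between the afs-ld and ld runs.
fn runtime_mismatches(ours: &RunOutcome, theirs: &RunOutcome) -> Vec<String> {
    let mut diffs = Vec::new();
    // Compare both streams byte-for-byte; render them lossily as text for the report.
    let streams = [
        ("stdout", &ours.stdout, &theirs.stdout),
        ("stderr", &ours.stderr, &theirs.stderr),
    ];
    for &(name, a, b) in streams.iter() {
        if a != b {
            diffs.push(format!(
                "{} differs: ours {:?}, theirs {:?}",
                name,
                String::from_utf8_lossy(a),
                String::from_utf8_lossy(b)
            ));
        }
    }
    if ours.code != theirs.code {
        diffs.push(format!("exit code differs: ours {:?}, theirs {:?}", ours.code, theirs.code));
    }
    diffs
}
```

Returning every mismatch, rather than stopping at the first, lets the per-scenario HTML report show a stdout and an exit-code divergence together.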