# Sprint 29.11: Full Sprint 29 Audit
## Why This Sprint Exists
Sprint 29 no longer needs another cleanup/build-out sub-sprint. It needs an exhaustive audit.
29.10 closed the hidden implementation gaps that were still preventing an honest "mostly done" story. What remains now is proving that the optimizer surface we claim to have is actually correct:
- across IR, assembly, objects, linked binaries, and runtime behavior
- across `-O0/-O1/-O2/-O3/-Os/-Ofast`
- across real-world-style Fortran programs, not just isolated micro-reproducers
This sprint is the formal closeout audit for all of Sprint 29, including 29.5-29.10.
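## Audit Scope
Sources of truth:
- [Sprint 29](sprint29.md)
- [Sprint 29.5](sprint29_5.md)
- [Sprint 29.6](sprint29_6.md)
- [Sprint 29.7](sprint29_7.md)
- [Sprint 29.8](sprint29_8.md)
- [Sprint 29.9](sprint29_9.md)
- [Sprint 29.10](sprint29_10.md)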
Each promised item should end the audit in one of three states:
- proven working with living tests
- explicitly deferred with a written dependency
- captured by a living XFAIL or audit regression
## Audit Rules
- Prefer real Fortran programs in `test_programs/` over only unit-level IR toys.
- Every new audit program should try to prove more than stdout:
  - runtime result correctness
  - IR/asm/object presence where relevant
  - object or linked-binary determinism where relevant
  - cross-opt equality unless the test is intentionally opt-sensitive
- Use `.refs` as the sanity-check source for hotspot shapes and realistic coding patterns, especially stdlib/fpm-style loops, source scanners, and numerical kernels.
- When the audit finds a real bug, land the smallest honest fix plus a regression.
- When the audit finds a real bug that is not fixed immediately, record it in `noted_items.md` and capture it as a living `XFAIL` where possible.
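As a rough illustration of the cross-opt equality rule, the sketch below compares the stdout captured at each optimization level against the `-O0` baseline. The function name, the dictionary shape, and the sample outputs are hypothetical; the actual harness may organize this differently.

```python
from typing import Dict, List

# Optimization levels named in this document.
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Os", "-Ofast"]


def cross_opt_mismatches(outputs: Dict[str, str], baseline: str = "-O0") -> List[str]:
    """Return the optimization levels whose output differs from the baseline.

    `outputs` maps an optimization flag to the stdout captured from running
    the program built at that level (an assumed shape, for illustration).
    """
    expected = outputs[baseline]
    return [level for level in OPT_LEVELS
            if level in outputs and outputs[level] != expected]


# Hypothetical captured outputs: -O3 diverges, as a miscompile would.
sample = {level: "sum = 42\n" for level in OPT_LEVELS}
sample["-O3"] = "sum = 41\n"
print(cross_opt_mismatches(sample))  # ['-O3']
```

An intentionally opt-sensitive test would simply skip this comparison rather than loosen it.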
## Initial Tranche
Kickoff items:
- source-level audit for helper-before-program entry lowering / dead-function root handling
- source-level audit for module procedure host-association over module globals
- real-world stdlib-style kernel:
  - tridiagonal sparse matvec (`realworld_tridiag_spmv.f90`)
- real-world BLAS-style kernel:
  - axpy + reduction (`realworld_axpy_reduce.f90`)
- real-world fpm-style application logic:
  - source suffix classification (`realworld_suffix_scan.f90`)
The passing real-world programs should prove:
- runtime correctness
- phase triangulation (`ir|asm|obj|repro`)
- cross-opt equality
- deterministic object snapshots
- deterministic linked binaries without Mach-O UUID noise
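The determinism items above amount to building twice and comparing artifacts byte for byte. A minimal sketch follows, assuming each build is available as a mapping from artifact name to raw bytes; the function names and the sample bytes are hypothetical and not taken from the real test harness.

```python
import hashlib
from typing import Dict, List


def digest(data: bytes) -> str:
    """SHA-256 hex digest of one artifact."""
    return hashlib.sha256(data).hexdigest()


def nondeterministic_artifacts(first: Dict[str, bytes],
                               second: Dict[str, bytes]) -> List[str]:
    """Return artifact names that differ between two builds or exist in only one."""
    names = sorted(set(first) | set(second))
    return [name for name in names
            if name not in first
            or name not in second
            or digest(first[name]) != digest(second[name])]


# Hypothetical artifacts from two builds of the same program.
build_a = {"ir": b"define i32 @main", "asm": b"_main:\n  ret\n", "obj": b"\x01\x02\x03"}
build_b = dict(build_a, obj=b"\x01\x02\x04")  # one byte differs in the object
print(nondeterministic_artifacts(build_a, build_a))  # []
print(nondeterministic_artifacts(build_a, build_b))  # ['obj']
```

Comparing digests rather than raw bytes keeps any failure report short while still detecting a single-byte difference.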
Current audit findings:
- fixed: helper-before-program lowering / dead-function rooting could drop the true `__prog_*` entry or make `_main` call the wrong helper first
- fixed: named-parameter local fixed-array extents were folded as `(1, 1)` in lowering, which made real-world kernels like `realworld_axpy_reduce.f90` and `realworld_tridiag_spmv.f90` trip bogus bounds checks
- fixed: ordinary load-bearing loops were entering an unsafe full-unroll path, which miscompiled `realworld_axpy_reduce.f90` at `-O2/-O3/-Ofast`; the unroller is now hardened to keep that shape out while preserving the proven `DO CONCURRENT` full-unroll path
- fixed: BCE only recognized the canonical bare loop IV, so real-world counted loops with safe `iv +/- const` array accesses kept redundant bounds checks at `-O2+`; the audit kernels `realworld_sasum_cleanup.f90` and `realworld_three_point_apply.f90` now prove the offset-IV case and keep SROA honest at the same time
- fixed: cross-block LSF treated every call as a universal memory clobber, so branch-join reuse through a noalias helper side path stayed as a reload at `-O2+`; `realworld_noalias_reuse.f90` now proves the noalias-call case and also keeps the local same-block reuse path honest
- fixed: contained procedures only partially inherited host-associated `parameter` constants during lowering, so dummy array extents and loop bounds like `x(n)`, `y(n)`, and `do i = 1, n` could degrade inside real-world helper kernels; `realworld_seed_overwrite.f90` now proves that host-param-backed dummy extents and loop bounds stay intact
- fixed: backend `ICmp` lowering could emit mixed-width GP compares like `cmp w26, x23` when the IR compared a 32-bit induction value against a 64-bit bound; `realworld_ipo_chain.f90` now keeps the compare-width harmonization honest through a real helper-chain compile at `-O2+`
- fixed: module procedures were still missing host association over their own module globals, so small cases like `call bump()` could silently leave a shared module variable unchanged. `module_global_host_assoc.f90` is now a passing audit program with cross-opt equality plus asm/object/run reproducibility, and `tests/module_host_audit.rs` proves the raw IR resolves the shared module global inside the procedure body
- fixed: extended `OPEN` lowering built the runtime control block by storing typed fields through a byte-pointer GEP, which first tripped IR verification for `position='append'` and then, after the verifier fix, still wrote fields at scaled-by-element-size offsets. `io_append_log.f90` is now a passing file oracle with append rerun coverage plus asm/object reproducibility and cross-opt equality
- fixed: descriptor-backed array query intrinsics (`SIZE`, `LBOUND`, `UBOUND`) were lowered as raw `i64` runtime results even though Fortran default integer queries should materialize as default-kind scalars, and scalar/component assignment lowering skipped mixed-width coercion at ordinary store sites; `realworld_shape_guard.f90` now proves the default-kind runtime-shape path through real allocatable metadata, loop bounds, and deterministic objects
- fixed: backend `MovReg` emission did not handle `x -> w` truncation views, so real-world default-kind array-query assignments could produce invalid asm like `mov w21, x20`; the new runtime-shape audit keeps that truncation surface honest
- fixed: fpm-style suffix classification in `realworld_suffix_scan.f90` first needed fixed-length character arrays to lower as real element-addressable arrays instead of scalar strings, then needed scalar character dummy intrinsics like `INDEX(name, suffix)` to carry a real runtime length story through `afs_c_strlen`, and finally exposed two optimizer-side bugs: mixed-width GEP offsets were being compared without pointee-size scaling in alias analysis, and SROA was scalarizing aggregates even when a GEP address escaped through the synthesized descriptor. The program is now a passing audit probe with cross-opt equality plus IR/asm/object/run reproducibility
- fixed: dummy-array `SIZE(...)` queries in `realworld_assumed_shape_size.f90` were not actually using assumed-shape descriptors because bare `(:)` dummy declarations lowered through the `Deferred` array-spec path, and then the synthesized descriptor lost `rank` and `flags` at `-O2+` because mixed-width GEP offsets in alias analysis made descriptor field stores look like they overlapped. Assumed-shape dummies now classify as descriptor-backed, query results are kept in default-kind scalars, and the real-world canary now passes across optimization levels with deterministic artifacts
- proven: LICM hoists invariant scalar dummy loads out of a real-world affine update loop in `realworld_affine_shift.f90` once BCE clears the loop body
- proven: GVN reduces duplicated branch-join PURE helper calls in `realworld_join_bias_sum.f90` at `-O2+` instead of recomputing the same affine helper result through the join
- proven: DSE removes the dead seed store in `realworld_seed_overwrite.f90` across the intervening noalias helper call while preserving the real fill
- proven: SROA scalarizes the fixed tap buffer in `realworld_binomial_blend.f90` and BCE clears the corresponding safe stencil bounds checks at `-O2+`, giving us another living real-world audit kernel for the small-aggregate path
- proven: loop-legality audit kernels `realworld_inplace_prefix.f90` and `realworld_inplace_symmix.f90` stay runtime-correct, cross-opt-equal, and deterministic across IR/object/binary surfaces
- proven: the 29.9 single-file story now has real-world audit coverage for ELEMENTAL lowering plus DO CONCURRENT bulk redirection (`realworld_elemental_stage.f90`), intramodule IPO helper trimming (`realworld_ipo_chain.f90`), small-loop DO CONCURRENT exploitation (`realworld_doconc_square.f90`), and explicit-DO vectorization onto the bulk runtime kernels (`realworld_vector_stage.f90`)
- separately deferred parser gap: typed character array constructors using an explicit type-spec inside `[]`
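Two of the findings above trace back to alias analysis comparing mixed-width GEP offsets without pointee-size scaling. The sketch below is a simplified model of that failure, not the compiler's actual alias analysis: the `Access` type and both overlap functions are hypothetical, and each access is reduced to an element index, a pointee size, and a width in bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    index: int      # GEP index, counted in elements
    elem_size: int  # pointee size in bytes
    width: int      # number of bytes read or written

    def byte_range(self) -> range:
        """Byte interval touched once the index is scaled by the pointee size."""
        start = self.index * self.elem_size
        return range(start, start + self.width)


def overlaps_scaled(a: Access, b: Access) -> bool:
    """Interval overlap on true byte offsets."""
    ra, rb = a.byte_range(), b.byte_range()
    return ra.start < rb.stop and rb.start < ra.stop


def overlaps_unscaled(a: Access, b: Access) -> bool:
    """The buggy shape: raw element indices treated as if they were byte offsets."""
    return a.index < b.index + b.width and b.index < a.index + a.width


# Same raw index, different pointee sizes: bytes 16..19 versus bytes 32..39.
field_i32 = Access(index=4, elem_size=4, width=4)
field_i64 = Access(index=4, elem_size=8, width=8)
print(overlaps_unscaled(field_i32, field_i64), overlaps_scaled(field_i32, field_i64))  # True False

# Different raw indices that land on the same bytes: 12..15 versus 8..15.
byte_view = Access(index=12, elem_size=1, width=4)
word_view = Access(index=1, elem_size=8, width=8)
print(overlaps_unscaled(byte_view, word_view), overlaps_scaled(byte_view, word_view))  # False True
```

The first pair shows disjoint stores being reported as overlapping, which is the direction the descriptor finding describes; the second pair shows that the same unscaled comparison can also miss a real overlap.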
Current audit corpus snapshot:
- `183` top-level `test_programs/*.f90` runtime corpus programs
- `172` programs with `CHECK`
- `30` programs with `IR_CHECK`
- `6` programs with `IR_NOT`
- `0` living `XFAIL`s
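A snapshot like this could be recomputed rather than maintained by hand. The sketch below counts programs per directive from in-memory sources. The directive syntax is an assumption: this document names `CHECK`, `IR_CHECK`, `IR_NOT`, and `XFAIL` but does not show how they are written, so the `! NAME:` comment shape and the `snapshot` helper are purely illustrative.

```python
import re
from collections import Counter
from typing import Dict

DIRECTIVES = {"CHECK", "IR_CHECK", "IR_NOT", "XFAIL"}
# Assumed directive shape: a Fortran comment line such as `! IR_CHECK: ...`.
DIRECTIVE_RE = re.compile(r"^\s*!\s*([A-Z_]+)\s*:", re.MULTILINE)


def snapshot(programs: Dict[str, str]) -> Counter:
    """Count how many programs carry each directive at least once."""
    counts = Counter({"programs": len(programs)})
    for source in programs.values():
        # A set ensures a program with several IR_CHECK lines counts once.
        seen = set(DIRECTIVE_RE.findall(source)) & DIRECTIVES
        counts.update(seen)
    return counts


# Hypothetical corpus of three tiny programs.
corpus = {
    "a.f90": "! CHECK: 3\nprint *, 3\nend\n",
    "b.f90": "! CHECK: 1\n! IR_CHECK: add\n! IR_CHECK: mul\nprint *, 1\nend\n",
    "c.f90": "print *, 0\nend\n",
}
counts = snapshot(corpus)
print(counts["programs"], counts["CHECK"], counts["IR_CHECK"], counts["XFAIL"])  # 3 2 1 0
```

Matching the whole directive name keeps `IR_CHECK` lines from being counted as `CHECK`.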
## Brutal Audit Priorities
### 1. 29.8 Optimizer Proof
Grow adversarial real-world coverage for:
- GVN
- SROA
- BCE
- local and cross-block LSF
- LICM
- loop legality transforms
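For the BCE item, the offset-IV case (`iv +/- const`) reduces to a range argument: if the loop variable stays within `[lo, hi]`, then `a(i + c)` stays within `[lo + c, hi + c]`, and the check is redundant when that interval lies inside the array bounds. The sketch below uses constant bounds and a unit stride for simplicity; the function name is hypothetical and the real pass presumably reasons over IR values rather than Python integers.

```python
def bounds_check_redundant(lo: int, hi: int, offset: int,
                           lbound: int, ubound: int) -> bool:
    """True if a(i + offset) is in bounds for every i in a unit-stride loop lo..hi."""
    if lo > hi:
        return True  # zero-trip loop: the access never executes
    return lbound <= lo + offset and hi + offset <= ubound


# Three-point stencil over a(1:8): do i = 2, 7 touching a(i-1), a(i), a(i+1).
n = 8
print([bounds_check_redundant(2, n - 1, c, 1, n) for c in (-1, 0, 1)])  # [True, True, True]

# Widening the loop to do i = 1, 8 makes a(i+1) reach a(9), so the check must stay.
print(bounds_check_redundant(1, n, 1, 1, n))  # False
```

Adversarial coverage for this transform would include loops like the second one, where the check has to survive optimization.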
### 2. 29.9 Claims Audit
Prove the current single-file story honestly:
- PURE/ELEMENTAL exploitation
- DO CONCURRENT exploitation
- intramodule IPO
- vectorization/runtime-kernel redirection
Also keep proving what is still absent:
- cross-module IPO
- whole-program analysis
- general native vectorizer
### 3. Binary Correctness & Determinism
For representative real-world programs:
- object snapshots deterministic at optimized levels
- linked binaries byte-identical when rebuilt at the same output path
- no `LC_UUID`
- runtime behavior equal across optimization levels unless explicitly exempted
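The "no `LC_UUID`" item can be checked directly against the Mach-O load-command table. The sketch below walks the load commands of a little-endian 64-bit image and reports whether any of them is `LC_UUID`. The header layout and constants follow the public Mach-O format; the helper names are hypothetical, and `fake_image` only builds a header plus load commands for demonstration, not a runnable binary.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # 64-bit Mach-O magic, read as little-endian
LC_UUID = 0x1B            # load command that carries the 16-byte UUID
LC_SEGMENT_64 = 0x19
HEADER_SIZE = 32          # size of the 64-bit Mach-O header (eight 32-bit fields)


def has_lc_uuid(image: bytes) -> bool:
    """Walk the load commands and report whether LC_UUID is present."""
    magic, _cpu, _sub, _ftype, ncmds, _size, _flags, _rsvd = struct.unpack_from("<8I", image, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O image")
    offset = HEADER_SIZE
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<2I", image, offset)
        if cmd == LC_UUID:
            return True
        if cmdsize < 8:
            raise ValueError("malformed load command")
        offset += cmdsize
    return False


def fake_image(commands) -> bytes:
    """Build a header and load commands from (cmd, payload) pairs, for demonstration."""
    body = b"".join(struct.pack("<2I", cmd, 8 + len(payload)) + payload
                    for cmd, payload in commands)
    header = struct.pack("<8I", MH_MAGIC_64, 0, 0, 0, len(commands), len(body), 0, 0)
    return header + body


clean = fake_image([(LC_SEGMENT_64, b"\x00" * 64)])
noisy = fake_image([(LC_SEGMENT_64, b"\x00" * 64), (LC_UUID, b"\x11" * 16)])
print(has_lc_uuid(clean), has_lc_uuid(noisy))  # False True
```

A check of this kind complements the byte-identical rebuild comparison: it states why two links would differ instead of only that they do.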
## Success Condition
Sprint 29 closes when:
- the promised optimizer/runtime surface has living proof
- the remaining holes are few, explicit, and written down
- the test suite gives us confidence in IR, binary, determinism, and integration rather than only stdout