armfortas Public

Sprint 29.11: Full Sprint 29 Audit

Why This Sprint Exists

Sprint 29 no longer needs another cleanup/build-out sub-sprint. It needs an exhaustive audit.

29.10 closed the hidden implementation gaps that were still preventing an honest "mostly done" story. What remains now is proving that the optimizer surface we claim to have is actually correct:

across IR, assembly, objects, linked binaries, and runtime behavior
across -O0/-O1/-O2/-O3/-Os/-Ofast
across real-world-style Fortran programs, not just isolated micro-reproducers

This sprint is the formal closeout audit for all of Sprint 29, including 29.5-29.10.

Audit Scope

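Sources of truth:

Sprint 29 (sprint29.md)
Sprint 29.5 (sprint29_5.md)
Sprint 29.6 (sprint29_6.md)
Sprint 29.7 (sprint29_7.md)
Sprint 29.8 (sprint29_8.md)
Sprint 29.9 (sprint29_9.md)
Sprint 29.10 (sprint29_10.md)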

Each promised item should end the audit in one of three states:

proven working with living tests
explicitly deferred with a written dependency
captured by a living XFAIL or audit regression

Audit Rules

Prefer real Fortran programs in test_programs/ over only unit-level IR toys.
Every new audit program should try to prove more than stdout:
- runtime result correctness
- IR/asm/object presence where relevant
- object or linked-binary determinism where relevant
- cross-opt equality unless the test is intentionally opt-sensitive
Use .refs as the sanity-check source for hotspot shapes and realistic coding patterns, especially stdlib/fpm-style loops, source scanners, and numerical kernels.
When the audit finds a real bug, land the smallest honest fix plus a regression.
When the audit finds a real bug that is not fixed immediately, record it in noted_items.md and capture it as a living XFAIL where possible.
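The cross-opt equality rule above can be pictured with a small harness helper. This is a minimal sketch, not the project's actual test driver: the function name `cross_opt_mismatches` is hypothetical, and treating the first run as the baseline is an assumption made for illustration (a real harness might compare against a golden file instead).

```rust
/// Returns the optimization levels whose captured output differs from the
/// first entry's output. The first entry is treated as the baseline, which
/// is an assumption of this sketch.
fn cross_opt_mismatches<'a>(runs: &[(&'a str, String)]) -> Vec<&'a str> {
    let mut mismatched = Vec::new();
    if let Some((_, baseline)) = runs.first() {
        for (level, output) in runs {
            // Any divergence from the baseline is reported by level name.
            if output != baseline {
                mismatched.push(*level);
            }
        }
    }
    mismatched
}
```

A caller would capture stdout once per level (`-O0` through `-Ofast`), pass the pairs in, and fail the audit program if the returned list is non-empty and the test is not marked opt-sensitive.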

Initial Tranche

Kickoff items:

source-level audit for helper-before-program entry lowering / dead-function root handling
source-level audit for module procedure host-association over module globals
real-world stdlib-style kernel:
- tridiagonal sparse matvec (realworld_tridiag_spmv.f90)
real-world BLAS-style kernel:
- axpy + reduction (realworld_axpy_reduce.f90)
real-world fpm-style application logic:
- source suffix classification (realworld_suffix_scan.f90)

The passing real-world programs should prove:

runtime correctness
phase triangulation (ir|asm|obj|repro)
cross-opt equality
deterministic object snapshots
deterministic linked binaries without Mach-O UUID noise
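Determinism of object snapshots and linked binaries comes down to comparing two builds byte for byte. The sketch below shows one way to report where two artifacts first diverge. The helper names are hypothetical and nothing here reflects how the project's harness is actually written.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns `None` when the two artifacts are byte-identical. Otherwise
/// returns the offset of the first differing byte, or the shorter length
/// when one artifact is a strict prefix of the other.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    let common = a.len().min(b.len());
    for i in 0..common {
        if a[i] != b[i] {
            return Some(i);
        }
    }
    if a.len() != b.len() {
        Some(common)
    } else {
        None
    }
}

/// Reads two build outputs from disk and compares them.
fn compare_artifacts(first: &Path, second: &Path) -> io::Result<Option<usize>> {
    let a = fs::read(first)?;
    let b = fs::read(second)?;
    Ok(first_difference(&a, &b))
}
```

Reporting the first differing offset rather than a plain boolean makes it easier to tell whether a mismatch sits in a header field, a timestamp, or generated code.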

Current audit findings:

fixed: helper-before-program lowering / dead-function rooting could drop the true __prog_* entry or make _main call the wrong helper first
fixed: named-parameter local fixed-array extents were folded as (1, 1) in lowering, which made real-world kernels like realworld_axpy_reduce.f90 and realworld_tridiag_spmv.f90 trip bogus bounds checks
fixed: ordinary load-bearing loops were entering an unsafe full-unroll path, which miscompiled realworld_axpy_reduce.f90 at -O2/-O3/-Ofast; the unroller is now hardened to keep that shape out while preserving the proven DO CONCURRENT full-unroll path
fixed: BCE only recognized the canonical bare loop IV, so real-world counted loops with safe iv +/- const array accesses kept redundant bounds checks at -O2+; the audit kernels realworld_sasum_cleanup.f90 and realworld_three_point_apply.f90 now prove the offset-IV case and keep SROA honest at the same time
fixed: cross-block LSF treated every call as a universal memory clobber, so branch-join reuse through a noalias helper side path stayed as a reload at -O2+; realworld_noalias_reuse.f90 now proves the noalias-call case and also keeps the local same-block reuse path honest
fixed: contained procedures only partially inherited host-associated parameter constants during lowering, so dummy array extents and loop bounds like x(n), y(n), and do i = 1, n could degrade inside real-world helper kernels; realworld_seed_overwrite.f90 now proves that host-param-backed dummy extents and loop bounds stay intact
fixed: backend ICmp lowering could emit mixed-width GP compares like cmp w26, x23 when the IR compared a 32-bit induction value against a 64-bit bound; realworld_ipo_chain.f90 now keeps the compare-width harmonization honest through a real helper-chain compile at -O2+
fixed: module procedures were still missing host association over their own module globals, so small cases like call bump() could silently leave a shared module variable unchanged. module_global_host_assoc.f90 is now a passing audit program with cross-opt equality plus asm/object/run reproducibility, and tests/module_host_audit.rs proves the raw IR resolves the shared module global inside the procedure body
fixed: extended OPEN lowering built the runtime control block by storing typed fields through a byte-pointer GEP, which first tripped IR verification for position='append' and then, after the verifier fix, still wrote fields at scaled-by-element-size offsets. io_append_log.f90 is now a passing file oracle with append rerun coverage plus asm/object reproducibility and cross-opt equality
fixed: descriptor-backed array query intrinsics (SIZE, LBOUND, UBOUND) were lowered as raw i64 runtime results even though Fortran default integer queries should materialize as default-kind scalars, and scalar/component assignment lowering skipped mixed-width coercion at ordinary store sites; realworld_shape_guard.f90 now proves the default-kind runtime-shape path through real allocatable metadata, loop bounds, and deterministic objects
fixed: backend MovReg emission did not handle x -> w truncation views, so real-world default-kind array-query assignments could produce invalid asm like mov w21, x20; the new runtime-shape audit keeps that truncation surface honest
fixed: fpm-style suffix classification in realworld_suffix_scan.f90 first needed fixed-length character arrays to lower as real element-addressable arrays instead of scalar strings, then needed scalar character dummy intrinsics like INDEX(name, suffix) to carry a real runtime length story through afs_c_strlen, and finally exposed two optimizer-side bugs: mixed-width GEP offsets were being compared without pointee-size scaling in alias analysis, and SROA was scalarizing aggregates even when a GEP address escaped through the synthesized descriptor. The program is now a passing audit probe with cross-opt equality plus IR/asm/object/run reproducibility
fixed: dummy-array SIZE(...) queries in realworld_assumed_shape_size.f90 were not actually using assumed-shape descriptors because bare (:) dummy declarations lowered through the Deferred array-spec path, and then the synthesized descriptor lost rank and flags at -O2+ because mixed-width GEP offsets in alias analysis made descriptor field stores look like they overlapped. Assumed-shape dummies now classify as descriptor-backed, query results are kept in default-kind scalars, and the real-world canary now passes across optimization levels with deterministic artifacts
proven: LICM hoists invariant scalar dummy loads out of a real-world affine update loop in realworld_affine_shift.f90 once BCE clears the loop body
proven: GVN reduces duplicated branch-join PURE helper calls in realworld_join_bias_sum.f90 at -O2+ instead of recomputing the same affine helper result through the join
proven: DSE removes the dead seed store in realworld_seed_overwrite.f90 across the intervening noalias helper call while preserving the real fill
proven: SROA scalarizes the fixed tap buffer in realworld_binomial_blend.f90 and BCE clears the corresponding safe stencil bounds checks at -O2+, giving us another living real-world audit kernel for the small-aggregate path
proven: loop-legality audit kernels realworld_inplace_prefix.f90 and realworld_inplace_symmix.f90 stay runtime-correct, cross-opt-equal, and deterministic across IR/object/binary surfaces
proven: the 29.9 single-file story now has real-world audit coverage for ELEMENTAL lowering plus DO CONCURRENT bulk redirection (realworld_elemental_stage.f90), intramodule IPO helper trimming (realworld_ipo_chain.f90), small-loop DO CONCURRENT exploitation (realworld_doconc_square.f90), and explicit-DO vectorization onto the bulk runtime kernels (realworld_vector_stage.f90)
separately deferred parser gap: typed character array constructors using an explicit type-spec inside []
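Two of the findings above trace back to alias analysis comparing mixed-width GEP offsets without scaling by pointee size. The sketch below illustrates why scaling matters. It is a simplified model, not the compiler's alias analysis: `GepAccess` and `may_overlap` are hypothetical names, and both accesses are assumed to share the same base pointer.

```rust
/// A memory access in GEP style: `index` elements of `elem_size` bytes from
/// a shared base, touching `access_size` bytes. Hypothetical model type.
#[derive(Clone, Copy, Debug)]
struct GepAccess {
    index: u64,
    elem_size: u64,
    access_size: u64,
}

impl GepAccess {
    /// Half-open byte range `[start, end)` relative to the shared base.
    fn byte_range(self) -> (u64, u64) {
        let start = self.index * self.elem_size;
        (start, start + self.access_size)
    }
}

/// Two accesses may overlap only if their scaled byte ranges intersect.
/// Comparing raw `index` values is wrong when `elem_size` differs.
fn may_overlap(a: GepAccess, b: GepAccess) -> bool {
    let (a_start, a_end) = a.byte_range();
    let (b_start, b_end) = b.byte_range();
    a_start < b_end && b_start < a_end
}
```

For example, a 4-byte store at `i32` index 2 covers bytes 8..12, while an 8-byte store at `i64` index 2 covers bytes 16..24: equal raw indices, disjoint bytes. Conversely, `i32` index 2 and `i64` index 1 have different raw indices but both start at byte 8.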

Current audit corpus snapshot:

183 top-level test_programs/*.f90 runtime corpus programs
172 programs with CHECK
30 programs with IR_CHECK
6 programs with IR_NOT
0 living XFAILs

Brutal Audit Priorities

1. 29.8 Optimizer Proof

Grow adversarial real-world coverage for:

GVN
SROA
BCE
local and cross-block LSF
LICM
loop legality transforms
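For BCE, the offset-IV case recorded in the audit findings reduces to a range argument: an access `a(iv + c)` needs no per-iteration check when the shifted induction range fits inside the array bounds. The sketch below states that condition under the assumption of a unit-stride counted loop with known inclusive bounds. The names are hypothetical and this is not the pass's actual implementation.

```rust
/// Inclusive integer range, used for both loop trip ranges and array bounds.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    lo: i64,
    hi: i64,
}

/// True when every access `a(iv + offset)` for `iv` in `iv_range` lies
/// inside `bounds`, so the bounds check is redundant.
fn offset_access_is_safe(iv_range: Range, offset: i64, bounds: Range) -> bool {
    // A zero-trip loop performs no accesses at all.
    if iv_range.lo > iv_range.hi {
        return true;
    }
    match (
        iv_range.lo.checked_add(offset),
        iv_range.hi.checked_add(offset),
    ) {
        (Some(first), Some(last)) => first >= bounds.lo && last <= bounds.hi,
        // On arithmetic overflow, stay conservative and keep the check.
        _ => false,
    }
}
```

With `x(1:8)` and a three-point stencil loop `do i = 2, 7`, the accesses `x(i-1)` and `x(i+1)` both pass. With `do i = 1, 8`, the access `x(i+1)` reaches index 9 and the check must stay.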

2. 29.9 Claims Audit

Prove the current single-file story honestly:

PURE/ELEMENTAL exploitation
DO CONCURRENT exploitation
intramodule IPO
vectorization/runtime-kernel redirection

Also keep proving what is still absent:

cross-module IPO
whole-program analysis
general native vectorizer

3. Binary Correctness & Determinism

For representative real-world programs:

object snapshots deterministic at optimized levels
linked binaries byte-identical when rebuilt at the same output path
no LC_UUID
runtime behavior equal across optimization levels unless explicitly exempted
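The "no LC_UUID" requirement can be checked by walking the Mach-O load commands directly. The sketch below handles only thin, little-endian, 64-bit images and is an illustration rather than the project's checker: the function names are hypothetical. The constants follow the layout in Apple's `mach-o/loader.h`, where the 64-bit header is 32 bytes with `ncmds` at offset 16, and each load command starts with a `cmd` and a `cmdsize` word.

```rust
const MH_MAGIC_64: u32 = 0xfeed_facf;
const LC_UUID: u32 = 0x1b;
const HEADER_SIZE_64: usize = 32;

/// Reads a little-endian u32 at `offset`, or `None` if out of range.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let chunk = bytes.get(offset..end)?;
    Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
}

/// Returns `Some(true)` if an LC_UUID load command is present,
/// `Some(false)` if the load commands were walked without finding one, and
/// `None` if the input is not a thin little-endian 64-bit Mach-O image or
/// is truncated or malformed.
fn has_lc_uuid(bytes: &[u8]) -> Option<bool> {
    if read_u32_le(bytes, 0)? != MH_MAGIC_64 {
        return None;
    }
    let ncmds = read_u32_le(bytes, 16)?;
    let mut offset = HEADER_SIZE_64;
    for _ in 0..ncmds {
        let cmd = read_u32_le(bytes, offset)?;
        let cmdsize = read_u32_le(bytes, offset.checked_add(4)?)? as usize;
        if cmd == LC_UUID {
            return Some(true);
        }
        // Every load command is at least the 8-byte cmd/cmdsize prefix.
        if cmdsize < 8 {
            return None;
        }
        offset = offset.checked_add(cmdsize)?;
    }
    Some(false)
}
```

Returning `None` for anything the sketch cannot parse keeps a malformed or unexpected file from being silently counted as UUID-free.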

Success Condition

Sprint 29 closes when:

the promised optimizer/runtime surface has living proof
the remaining holes are few, explicit, and written down
the test suite gives us confidence in IR, binary, determinism, and integration rather than only stdout
