bencch

Generic compiler bench, with armfortas as the first rich adapter.

This repo holds:

  • bench-core/ — bench-owned compiler-facing types
  • bench/ — the bencch / afs-tests runner
  • suites/ — authored bench suites
  • fixtures/ — reusable fixture programs
  • reports/ — failure and consistency bundles

Current Setup

bencch now has its own workspace manifest and public CLI.

Compiler and tool paths can be overridden from the CLI. Rich linked armfortas capture still needs an armfortas checkout, but as of Sprint 13 a bootstrap script sets that up, so bencch no longer has to be embedded as a submodule.

Embedded usage still works:

cargo run -p afs-tests --bin bencch -- list
cargo run -p afs-tests --bin bencch -- run --suite frontend

Standalone linked usage now works through a generated local workspace:

scripts/bootstrap-linked-armfortas.sh /path/to/armfortas
cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- doctor

That generated path keeps linked capture working and makes doctor report the actual linked armfortas checkout instead of assuming bencch is embedded.

Standalone external-only usage now works through a second generated workspace:

scripts/bootstrap-standalone-external.sh
cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- doctor

That mode drops linked capture entirely and keeps the generic external-driver surface available for compare, introspect, and external-facing run work.

Example external-only introspection:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- introspect fixtures/fake_compilers/match_42_a.sh fixtures/runtime/if_else.f90 --artifact asm,runtime

Example external-only authored suite run:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-introspect --case fake_compiler_runtime --all

Example external-only authored compare matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-compare --case fake_compilers_match_matrix --all

Example external-only authored differential matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-differential --all

Example external-only authored consistency matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-consistency --all

Example external-only authored failure matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-failure-matrix --case fake_compiler_expected_diagnostic_matrix --all

Legacy rich-stage suites still need linked capture. In an external-only build, they now fail early with a direct message telling you to use scripts/bootstrap-linked-armfortas.sh.
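
The three setups above differ only in the manifest path passed to cargo. As a rough sketch, a small shell helper can make that explicit. The function name bencch_cmd and the mode names embedded, linked, and external are illustrative and not part of bencch; only the manifest paths and cargo arguments come from this README. The helper prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print the cargo invocation for one of the three setups.
# Only the manifest paths and cargo arguments are taken from this README.
bencch_cmd() {
    mode=$1
    shift
    case "$mode" in
        embedded) manifest="" ;;
        linked)   manifest="--manifest-path .bencch-local/Cargo.toml " ;;
        external) manifest="--manifest-path .bencch-external/Cargo.toml " ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    # "$*" joins the remaining arguments with single spaces.
    echo "cargo run ${manifest}-p afs-tests --bin bencch -- $*"
}

bencch_cmd embedded list
bencch_cmd linked doctor
bencch_cmd external run --suite v2/generic-compare --all
```

To execute a printed command rather than just display it, pipe the output to sh from the repository root.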

Usage

List suites:

cargo run -p afs-tests --bin bencch -- list

List suites with case-level capability discovery:

cargo run -p afs-tests --bin bencch -- list --suite v2/generic --verbose

Run one suite family:

cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime

Inspect the current embedded/standalone posture:

cargo run -p afs-tests --bin bencch -- doctor

doctor now also lists the generic artifacts and namespaced adapter extras that each named compiler surface can provide in the current build.

The built-in named compiler set now includes armfortas, gfortran, flang-new, lfortran, ifort, ifx, and nvfortran / pgfortran, plus any explicit compiler path you pass to compare or introspect.

doctor also probes each named compiler surface and reports:

  • probe_status like linked, invokable, or missing
  • probe_resolved_path
  • probe_banner when a version/help probe returns something useful

Write the same doctor snapshot to JSON and Markdown:

cargo run -p afs-tests --bin bencch -- doctor --json-report reports/doctor.json --markdown-report reports/doctor.md

The JSON report now includes structured sections for workspace, named compiler surfaces, tools, and mode, while keeping the flat field map too.

list --verbose now echoes the same probe posture for suite-v2 generic cases, so capability-blocked authored cases show both:

  • why the request is blocked or deferred
  • what compiler binary or linked surface would have been used

Generate a local linked workspace against an external armfortas checkout:

scripts/bootstrap-linked-armfortas.sh /path/to/armfortas

Then run bencch through that generated workspace:

cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- list

Compare two compilers on one program:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90

Compare named compilers with an explicit armfortas binary:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/if_else.f90 --armfortas-bin ../target/debug/armfortas

The same compare surface works across opt levels too:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --opt O2 --program fixtures/runtime/mixed_types.f90 --armfortas-bin ../target/debug/armfortas

Compare with an extra artifact diff:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90 --artifact asm

Compare two explicit compiler binaries:

cargo run -p afs-tests --bin bencch -- compare /path/to/one /path/to/other --program fixtures/runtime/mixed_types.f90 --artifact asm,obj

Namespaced adapter artifacts are allowed in compare too, but only when both compiler surfaces can actually provide them. If not, bencch fails early with an explicit capability message.

Introspect one compiler on one program:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90

Introspect a rich armfortas stage explicitly:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --artifact armfortas.ir,asm

Introspect the full linked armfortas stage surface:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --all

Trim large introspection sections to a readable preview:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --max-artifact-lines 12

Keep only section summaries and omit artifact bodies:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --summary-only

Introspect a named external compiler on the generic surface:

cargo run -p afs-tests --bin bencch -- introspect gfortran fixtures/runtime/if_else.f90 --artifact asm,obj,runtime

If you request artifacts that a compiler surface cannot provide, bencch fails early with a capability message instead of pretending the compiler failed mid-pipeline.

Introspect an explicit compiler path on that same generic surface:

cargo run -p afs-tests --bin bencch -- introspect /path/to/compiler fixtures/runtime/if_else.f90 --artifact asm,obj,runtime

Introspect a failing armfortas source and keep the partial capture:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/invalid/parse_error.f90 --artifact armfortas.tokens,armfortas.ir,asm

Run against an explicit compiler binary:

cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime-control-flow --armfortas-bin ./target/debug/armfortas

Run an asm/object surface through an explicit compiler binary:

cargo run -p afs-tests --bin bencch -- run --suite backend/asm --case runtime_wrapper_and_calls --armfortas-bin ./target/debug/armfortas

Run differential checks with explicit reference compiler paths:

cargo run -p afs-tests --bin bencch -- run --suite differential/runtime-control-flow --gfortran-bin /opt/homebrew/bin/gfortran --flang-bin /opt/homebrew/bin/flang-new

Run one case with full stage capture:

cargo run -p afs-tests --bin bencch -- run --suite frontend --case stage_walk --all --verbose

Write machine-readable reports:

cargo run -p afs-tests --bin bencch -- run --suite modules --all --json-report reports/modules.json --markdown-report reports/modules.md

Run consistency coverage:

cargo run -p afs-tests --bin bencch -- run --suite consistency --all

Run differential coverage:

cargo run -p afs-tests --bin bencch -- run --suite differential

Reports are written under reports/.
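
When several suites each need their own report pair, one option is to derive the report file names from the suite names. The sketch below only prints the commands. The report_name helper and the naming scheme (slashes replaced by dashes) are assumptions made for illustration; the flags and suite names are the ones shown above.

```shell
#!/bin/sh
# Sketch: derive report file names from suite names and print one run command
# per suite. report_name is illustrative, not part of bencch.
report_name() {
    # "consistency/runtime" -> "consistency-runtime"
    printf '%s\n' "$1" | tr '/' '-'
}

for suite in frontend modules consistency/runtime; do
    name=$(report_name "$suite")
    echo "cargo run -p afs-tests --bin bencch -- run --suite $suite --all" \
         "--json-report reports/$name.json --markdown-report reports/$name.md"
done
```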

compare now prints a short summary block with status, divergence classification, basis, difference count, changed artifacts, and the backend used on each side before any per-artifact diffs.

introspect now groups portable outputs like asm, obj, and runtime separately from adapter extras like armfortas.ir and armfortas.tokens in text, JSON, and Markdown output. It also reports requested, captured, and missing artifacts at the top of the report. Failure-side introspection surfaces the failure stage when the adapter knows it, plus a short diagnostic excerpt before the full diagnostics block. For large captures, --summary-only and --max-artifact-lines <n> keep the text and Markdown surfaces readable. JSON reports keep the full artifact bodies and now add compact artifact_summaries alongside them for quick scanning.

Environment overrides work too:

BENCCH_ARMFORTAS_BIN=./target/debug/armfortas cargo run -p afs-tests --bin bencch -- run --suite consistency/object

Backend choice is visible in:

  • cargo run -p afs-tests --bin bencch -- doctor
  • --verbose case runs
  • JSON and Markdown reports as primary_backend
  • bundle metadata.txt and armfortas/metadata.txt

Suite Format

Suites are plain text files under suites/.

suite "consistency/runtime"

case "mixed_types_cli_run_reproducible"
source "../../fixtures/runtime/mixed_types.f90"
opts => all
armfortas => run
repeat => 3
consistency => cli_run_reproducible
expect run.stdout check-comments
expect run.exit_code equals 0
end
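
Because suites are plain text, ordinary shell tools can inspect them. As a minimal sketch, the function below lists the case names in a suite file. It assumes every case opens with a line of the form case "name" and that names contain no spaces; list_cases is a hypothetical helper, not a bencch command.

```shell
#!/bin/sh
# Sketch: list the case names declared in a suite file.
# ASSUMPTION: each case starts with a line of the form: case "name"
list_cases() {
    awk '$1 == "case" { gsub(/"/, "", $2); print $2 }' "$1"
}

suite_file=$(mktemp)
cat > "$suite_file" <<'EOF'
suite "consistency/runtime"

case "mixed_types_cli_run_reproducible"
source "../../fixtures/runtime/mixed_types.f90"
opts => all
armfortas => run
repeat => 3
consistency => cli_run_reproducible
expect run.stdout check-comments
expect run.exit_code equals 0
end
EOF

list_cases "$suite_file"
rm -f "$suite_file"
```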

The new suite-v2 generic surface can target any compiler spec the same way bencch introspect does:

suite "v2/generic-introspect"

case "fake_compiler_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
expect asm contains ".globl _main"
expect run.stdout contains "42"
expect run.exit_code equals 0
end
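
To give a feel for what a stand-in compiler such as match_42_a.sh might look like, here is a hypothetical sketch written as a shell function. It assumes the driver invokes the compiler gfortran-style, with -S for assembly output and -o <path> for the output file. That calling convention is a guess made for illustration; the real fixtures under fixtures/fake_compilers/ may follow a different contract.

```shell
#!/bin/sh
# Hypothetical stand-in compiler. ASSUMPTION: it is called gfortran-style,
# with -S for assembly output and -o <path> for the output file.
fake_compiler() {
    emit=exe
    out=a.out
    while [ $# -gt 0 ]; do
        case "$1" in
            -S) emit=asm ;;
            -o) [ $# -ge 2 ] || return 2; out=$2; shift ;;
            *)  ;;  # source paths and opt flags such as -O2 are ignored
        esac
        shift
    done
    if [ "$emit" = asm ]; then
        # Just enough assembly to satisfy: expect asm contains ".globl _main"
        printf '.globl _main\n_main:\n' > "$out"
    else
        # The "executable" is a script that prints the expected answer.
        printf '#!/bin/sh\necho 42\n' > "$out"
        chmod +x "$out"
    fi
}

workdir=$(mktemp -d)
fake_compiler -O1 -S -o "$workdir/prog.s" if_else.f90
fake_compiler -O1 -o "$workdir/prog" if_else.f90
cat "$workdir/prog.s"
sh "$workdir/prog"
rm -rf "$workdir"
```

A stand-in like this never reads the source file, which is what makes it useful for exercising the runner itself rather than any real compiler.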

Generic compiler cases can also name reference compilers with differential => ... and use CLI-style reproducibility checks:

suite "v2/generic-differential"

case "gfortran_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler gfortran => runtime
differential => flang-new
expect run.stdout check-comments
expect run.exit_code equals 0
end

suite "v2/generic-consistency"

case "fake_compiler_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
repeat => 3
compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
consistency => cli_asm_reproducible, cli_run_reproducible
expect asm contains ".globl _main"
expect run.stdout contains "42"
expect run.exit_code equals 0
end

For mem2reg-branch compatibility, check-comments on armfortas.ir understands inline ! IR_CHECK: and ! IR_NOT: annotations, while run.stdout check-comments keeps using the usual ! CHECK: lines.
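
The idea behind run.stdout check-comments can be sketched in a few lines of shell. This version assumes that each ! CHECK: pattern only needs to appear as a literal substring somewhere in the program output. The real runner may be stricter (for example about ordering or whole-line matches), so treat check_comments as a hypothetical approximation.

```shell
#!/bin/sh
# Sketch of a check-comments style matcher.
# ASSUMPTION: each "! CHECK:" pattern must appear as a literal substring
# somewhere in the output; the real runner may apply stricter rules.
check_comments() {
    src=$1
    actual=$2
    status=0
    # Collect patterns in a temp file so the loop runs in the current shell.
    patterns=$(mktemp)
    sed -n 's/^.*! CHECK:[[:space:]]*//p' "$src" > "$patterns"
    while IFS= read -r pattern; do
        if ! printf '%s\n' "$actual" | grep -qF -- "$pattern"; then
            echo "missing: $pattern"
            status=1
        fi
    done < "$patterns"
    rm -f "$patterns"
    return $status
}

src=$(mktemp)
cat > "$src" <<'EOF'
program demo
  print *, 42   ! CHECK: 42
end program demo
EOF
check_comments "$src" " 42" && echo "all CHECK lines matched"
rm -f "$src"
```

The sed pattern requires "! " directly before "CHECK:", so ! IR_CHECK: annotations are not picked up by this extractor.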

Two more opt-in bridges exist for imported mem2reg-style audits:

  • expect-fail comments reads ! ERROR_EXPECTED: lines from the source
  • xfail comments reads the first ! XFAIL: line from the source

Those compose the same way the old mem2reg harness did: a case can keep a source-owned expected diagnostic and still remain xfail until trunk starts producing that diagnostic correctly.
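
As an illustration of how the two annotation kinds differ, the sketch below extracts every ! ERROR_EXPECTED: payload but only the first ! XFAIL: payload from a source file. The helper names and the sample annotation text are made up for this example; the actual parsing rules inside bencch may differ.

```shell
#!/bin/sh
# Sketch: pull source-owned annotations out of a Fortran file.
expected_errors() {
    # Every "! ERROR_EXPECTED:" payload, one per line.
    sed -n 's/^.*! ERROR_EXPECTED:[[:space:]]*//p' "$1"
}

xfail_reason() {
    # Only the first "! XFAIL:" payload.
    sed -n 's/^.*! XFAIL:[[:space:]]*//p' "$1" | head -n 1
}

src=$(mktemp)
cat > "$src" <<'EOF'
! XFAIL: diagnostic not produced on trunk yet
program demo
  integer ::   ! ERROR_EXPECTED: expected entity name
end program demo
EOF
echo "xfail:  $(xfail_reason "$src")"
echo "errors: $(expected_errors "$src")"
rm -f "$src"
```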

Suite-v2 can also drive the generic compare engine:

suite "v2/generic-compare"

case "fake_compilers_match_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compare "../../fixtures/fake_compilers/match_42_a.sh" "../../fixtures/fake_compilers/match_42_b.sh" => asm
expect compare.status equals "match"
expect compare.classification equals "match"
expect compare.difference_count equals 0
end

Suite-v2 unhappy paths can use the same generic engine too:

suite "v2/generic-failures"

case "fake_compiler_expected_diagnostic"
source "../../fixtures/invalid/fake_compile_fail_expected.f90"
compiler "../../fixtures/fake_compilers/compile_fail.sh" => diagnostics
expect-fail comments
end

case "armfortas_parse_error"
source "../../fixtures/invalid/parse_error.f90"
compiler armfortas => diagnostics
expect-fail parser contains "expected entity name"
end

And they can be matrixed the same way as the happy-path suites:

suite "v2/generic-failure-matrix"

case "fake_compilers_compile_divergence_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compare "../../fixtures/fake_compilers/compile_fail.sh" "../../fixtures/fake_compilers/match_42_a.sh" => diagnostics
expect compare.status equals "diff"
expect compare.classification equals "compile divergence"
expect compare.difference_count equals 2
end

If a suite-v2 case is intentionally blocked by adapter capability limits, you can say so directly:

suite "v2/capability-policy"

case "gfortran_armfortas_ir_future"
source "../../fixtures/runtime/if_else.f90"
compiler gfortran => armfortas.ir
future capability "generic gfortran surface has no armfortas extras"
end

case "mixed_surface_ir_compare_xfail"
source "../../fixtures/runtime/if_else.f90"
compare armfortas gfortran => armfortas.ir
xfail capability "mixed-surface namespaced compare stays soft for now"
end

Namespaced armfortas artifacts can be matrixed too:

suite "v2/armfortas-namespace-matrix"

case "if_else_frontend_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler armfortas => armfortas.tokens, armfortas.ast, armfortas.sema
expect armfortas.tokens contains "\"then\""
expect armfortas.ast contains "node: IfConstruct"
expect armfortas.sema contains "diagnostics: none"
end

Graph cases use entry plus ordered file lines:

suite "v2/generic-graphs"

case "module_chain_frontend"
entry "../../fixtures/modules/module_chain/main.f90"
file "../../fixtures/modules/module_chain/math_seed.f90"
file "../../fixtures/modules/module_chain/math_values.f90"
file "../../fixtures/modules/module_chain/main.f90"
compiler armfortas => armfortas.ast, armfortas.sema
expect armfortas.ast contains "name: \"math_seed\""
expect armfortas.sema contains "local_name: \"doubled\""
end

Today the armfortas adapter materializes graph cases into one generated source in declared file order before capture/compile. The authored files still stay in the failure bundle.
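
The materialization step can be pictured as ordered concatenation. The sketch below assumes plain concatenation in declared order; the real adapter may add markers or handle repeated files differently. materialize_graph is a hypothetical name, and the file contents are placeholders.

```shell
#!/bin/sh
# Sketch: concatenate authored files into one generated source, in the order
# given. ASSUMPTION: plain concatenation with no markers or deduplication.
materialize_graph() {
    out=$1
    shift
    : > "$out"            # start from an empty output file
    for f in "$@"; do
        cat "$f" >> "$out"
    done
}

workdir=$(mktemp -d)
printf 'module math_seed\nend module math_seed\n' > "$workdir/math_seed.f90"
printf 'module math_values\nend module math_values\n' > "$workdir/math_values.f90"
printf 'program main\nend program main\n' > "$workdir/main.f90"

materialize_graph "$workdir/generated.f90" \
    "$workdir/math_seed.f90" "$workdir/math_values.f90" "$workdir/main.f90"
cat "$workdir/generated.f90"
rm -rf "$workdir"
```

Declared order matters here because a Fortran module has to be defined before the units that use it.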

Common things the runner understands:

  • stage capture like armfortas => tokens, ir, asm, obj, run
  • generic compiler capture like compiler gfortran => asm, obj, runtime or compiler "/path/to/compiler" => asm, obj, runtime
  • suite-v2 generic compiler cases can also use opt matrices, differential => ..., and CLI-style reproducibility checks
  • check-comments on armfortas.ir / ir uses ! IR_CHECK: and ! IR_NOT:
  • expect-fail comments uses inline ! ERROR_EXPECTED: source comments
  • xfail comments uses the first inline ! XFAIL: source comment
  • generic compare cases like compare gfortran flang-new => asm, including opt matrices
  • capability-aware authored softening with future capability "..." and xfail capability "..."
  • suite-v2 graph cases with entry plus ordered file lines
  • opt matrices like opts => O0, O1, O2
  • references like differential => gfortran, flang-new
  • expected failures like xfail "reason"
  • per-opt status like xfail when O1, O2 because "reason"
  • consistency checks like cli_obj_vs_system_as and capture_run_reproducible
  • report outputs like --json-report path/to/report.json and --markdown-report path/to/report.md
  • environment and adapter inspection with doctor
  • direct one-shot compare with compare
  • direct one-shot artifact/stage inspection with introspect

Notes

  • .docs/ is local and gitignored.
  • bencch is now the public CLI; afs-tests remains as a compatibility alias.
  • The product is now centered on compare, introspect, run, and doctor.
  • The runner is currently strongest on stage capture, differential behavior, and consistency work around reproducibility and cross-path mismatches.
129 ```bash
130 cargo run -p afs-tests --bin bencch -- doctor --json-report reports/doctor.json --markdown-report reports/doctor.md
131 ```
132
133 The JSON report now includes structured sections for workspace, named compiler
134 surfaces, tools, and mode, while keeping the flat field map too.
135
136 `list --verbose` now echoes the same probe posture for suite-v2 generic cases,
137 so capability-blocked authored cases show both:
138
139 - why the request is blocked or deferred
140 - what compiler binary or linked surface would have been used
141
142 Generate a local linked workspace against an external `armfortas` checkout:
143
144 ```bash
145 scripts/bootstrap-linked-armfortas.sh /path/to/armfortas
146 ```
147
148 Then run `bencch` through that generated workspace:
149
150 ```bash
151 cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- list
152 ```
153
154 Compare two compilers on one program:
155
156 ```bash
157 cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90
158 ```
159
160 Compare named compilers with an explicit armfortas binary:
161
162 ```bash
163 cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/if_else.f90 --armfortas-bin ../target/debug/armfortas
164 ```
165
166 The same compare surface works across opt levels too:
167
168 ```bash
169 cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --opt O2 --program fixtures/runtime/mixed_types.f90 --armfortas-bin ../target/debug/armfortas
170 ```
171
172 Compare with an extra artifact diff:
173
174 ```bash
175 cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90 --artifact asm
176 ```
177
178 Compare two explicit compiler binaries:
179
180 ```bash
181 cargo run -p afs-tests --bin bencch -- compare /path/to/one /path/to/other --program fixtures/runtime/mixed_types.f90 --artifact asm,obj
182 ```
183
184 Namespaced adapter artifacts are allowed in `compare` too, but only when both
185 compiler surfaces can actually provide them. If not, `bencch` fails early with
186 an explicit capability message.
187
188 Introspect one compiler on one program:
189
190 ```bash
191 cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90
192 ```
193
194 Introspect a rich armfortas stage explicitly:
195
196 ```bash
197 cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --artifact armfortas.ir,asm
198 ```
199
200 Introspect the full linked armfortas stage surface:
201
202 ```bash
203 cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --all
204 ```
205
206 Trim large introspection sections to a readable preview:
207
208 ```bash
209 cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --max-artifact-lines 12
210 ```
211
212 Keep only section summaries and omit artifact bodies:
213
214 ```bash
215 cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --summary-only
216 ```
217
218 Introspect a named external compiler on the generic surface:
219
220 ```bash
221 cargo run -p afs-tests --bin bencch -- introspect gfortran fixtures/runtime/if_else.f90 --artifact asm,obj,runtime
222 ```
223
224 If you request artifacts that a compiler surface cannot provide, `bencch`
225 fails early with a capability message instead of pretending the compiler
226 failed mid-pipeline.
227
228 Introspect an explicit compiler path on that same generic surface:
229
230 ```bash
231 cargo run -p afs-tests --bin bencch -- introspect /path/to/compiler fixtures/runtime/if_else.f90 --artifact asm,obj,runtime
232 ```
233
234 Introspect a failing armfortas source and keep the partial capture:
235
236 ```bash
237 cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/invalid/parse_error.f90 --artifact armfortas.tokens,armfortas.ir,asm
238 ```
239
240 Run against an explicit compiler binary:
241
242 ```bash
243 cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime-control-flow --armfortas-bin ./target/debug/armfortas
244 ```
245
246 Run an asm/object surface through an explicit compiler binary:
247
248 ```bash
249 cargo run -p afs-tests --bin bencch -- run --suite backend/asm --case runtime_wrapper_and_calls --armfortas-bin ./target/debug/armfortas
250 ```
251
252 Run differential checks with explicit reference compiler paths:
253
254 ```bash
255 cargo run -p afs-tests --bin bencch -- run --suite differential/runtime-control-flow --gfortran-bin /opt/homebrew/bin/gfortran --flang-bin /opt/homebrew/bin/flang-new
256 ```
257
258 Run one case with full stage capture:
259
260 ```bash
261 cargo run -p afs-tests --bin bencch -- run --suite frontend --case stage_walk --all --verbose
262 ```
263
264 Write machine-readable reports:
265
266 ```bash
267 cargo run -p afs-tests --bin bencch -- run --suite modules --all --json-report reports/modules.json --markdown-report reports/modules.md
268 ```
269
270 Run consistency coverage:
271
272 ```bash
273 cargo run -p afs-tests --bin bencch -- run --suite consistency --all
274 ```
275
276 Run differential coverage:
277
278 ```bash
279 cargo run -p afs-tests --bin bencch -- run --suite differential
280 ```
281
282 Reports are written under `reports/`.
283
284 `compare` now prints a short summary block with status, divergence
285 classification, basis, difference count, changed artifacts, and the backend
286 used on each side before any per-artifact diffs.
287
288 `introspect` now groups portable outputs like `asm`, `obj`, and `runtime`
289 separately from adapter extras like `armfortas.ir` and `armfortas.tokens` in
290 text, JSON, and Markdown output, and it now reports requested, captured, and
291 missing artifacts at the top of the report. Failure-side introspection also
292 surfaces the failure stage when the adapter knows it, plus a short diagnostic
293 excerpt before the full diagnostics block. For large captures, `--summary-only`
294 and `--max-artifact-lines <n>` keep the text and Markdown surfaces readable.
295 JSON reports keep the full artifact bodies and now add compact
296 `artifact_summaries` alongside them for quick scanning.
297
298 Environment overrides work too:
299
300 ```bash
301 BENCCH_ARMFORTAS_BIN=./target/debug/armfortas cargo run -p afs-tests --bin bencch -- run --suite consistency/object
302 ```
303
304 Backend choice is visible in:
305
306 - `cargo run -p afs-tests --bin bencch -- doctor`
307 - `--verbose` case runs
308 - JSON and Markdown reports as `primary_backend`
309 - bundle `metadata.txt` and `armfortas/metadata.txt`
310
311 ## Suite Format
312
313 Suites are plain text files under `suites/`.
314
315 ```text
316 suite "consistency/runtime"
317
318 case "mixed_types_cli_run_reproducible"
319 source "../../fixtures/runtime/mixed_types.f90"
320 opts => all
321 armfortas => run
322 repeat => 3
323 consistency => cli_run_reproducible
324 expect run.stdout check-comments
325 expect run.exit_code equals 0
326 end
327 ```
328
329 The new suite-v2 generic surface can target any compiler spec the same way
330 `bencch introspect` does:
331
332 ```text
333 suite "v2/generic-introspect"
334
335 case "fake_compiler_runtime_matrix"
336 source "../../fixtures/runtime/if_else.f90"
337 opts => O0, O1, O2
338 compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
339 expect asm contains ".globl _main"
340 expect run.stdout contains "42"
341 expect run.exit_code equals 0
342 end
343 ```
344
345 Generic compiler cases can also lean on references and CLI-style
346 reproducibility checks:
347
348 ```text
349 suite "v2/generic-differential"
350
351 case "gfortran_runtime_matrix"
352 source "../../fixtures/runtime/if_else.f90"
353 opts => O0, O1, O2
354 compiler gfortran => runtime
355 differential => flang-new
356 expect run.stdout check-comments
357 expect run.exit_code equals 0
358 end
359 ```
360
361 ```text
362 suite "v2/generic-consistency"
363
364 case "fake_compiler_runtime_matrix"
365 source "../../fixtures/runtime/if_else.f90"
366 opts => O0, O1, O2
367 repeat => 3
368 compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
369 consistency => cli_asm_reproducible, cli_run_reproducible
370 expect asm contains ".globl _main"
371 expect run.stdout contains "42"
372 expect run.exit_code equals 0
373 end
374 ```
375
376 For mem2reg-branch compatibility, `check-comments` on `armfortas.ir` understands
377 inline `! IR_CHECK:` and `! IR_NOT:` annotations, while `run.stdout
378 check-comments` keeps using the usual `! CHECK:` lines.
379
380 Two more opt-in bridges exist for imported mem2reg-style audits:
381
382 - `expect-fail comments` reads `! ERROR_EXPECTED:` lines from the source
383 - `xfail comments` reads the first `! XFAIL:` line from the source
384
385 Those compose the same way the old mem2reg harness did: a case can keep a
386 source-owned expected diagnostic and still remain `xfail` until trunk starts
387 producing that diagnostic correctly.
388
389 Suite-v2 can also drive the generic compare engine:
390
391 ```text
392 suite "v2/generic-compare"
393
394 case "fake_compilers_match_matrix"
395 source "../../fixtures/runtime/if_else.f90"
396 opts => O0, O1, O2
397 compare "../../fixtures/fake_compilers/match_42_a.sh" "../../fixtures/fake_compilers/match_42_b.sh" => asm
398 expect compare.status equals "match"
399 expect compare.classification equals "match"
400 expect compare.difference_count equals 0
401 end
402 ```
403
404 Suite-v2 unhappy paths can use the same generic engine too:
405
406 ```text
407 suite "v2/generic-failures"
408
409 case "fake_compiler_expected_diagnostic"
410 source "../../fixtures/invalid/fake_compile_fail_expected.f90"
411 compiler "../../fixtures/fake_compilers/compile_fail.sh" => diagnostics
412 expect-fail comments
413 end
414
415 case "armfortas_parse_error"
416 source "../../fixtures/invalid/parse_error.f90"
417 compiler armfortas => diagnostics
418 expect-fail parser contains "expected entity name"
419 end
420 ```
421
422 And they can be matrixed the same way as the happy-path suites:
423
424 ```text
425 suite "v2/generic-failure-matrix"
426
427 case "fake_compilers_compile_divergence_matrix"
428 source "../../fixtures/runtime/if_else.f90"
429 opts => O0, O1, O2
430 compare "../../fixtures/fake_compilers/compile_fail.sh" "../../fixtures/fake_compilers/match_42_a.sh" => diagnostics
431 expect compare.status equals "diff"
432 expect compare.classification equals "compile divergence"
433 expect compare.difference_count equals 2
434 end
435 ```
436
437 If a suite-v2 case is intentionally blocked by adapter capability limits, you
438 can say so directly:
439
440 ```text
441 suite "v2/capability-policy"
442
443 case "gfortran_armfortas_ir_future"
444 source "../../fixtures/runtime/if_else.f90"
445 compiler gfortran => armfortas.ir
446 future capability "generic gfortran surface has no armfortas extras"
447 end
448
449 case "mixed_surface_ir_compare_xfail"
450 source "../../fixtures/runtime/if_else.f90"
451 compare armfortas gfortran => armfortas.ir
452 xfail capability "mixed-surface namespaced compare stays soft for now"
453 end
454 ```
455
456 Namespaced armfortas artifacts can be matrixed too:
457
458 ```text
459 suite "v2/armfortas-namespace-matrix"
460
461 case "if_else_frontend_matrix"
462 source "../../fixtures/runtime/if_else.f90"
463 opts => O0, O1, O2
464 compiler armfortas => armfortas.tokens, armfortas.ast, armfortas.sema
465 expect armfortas.tokens contains "\"then\""
466 expect armfortas.ast contains "node: IfConstruct"
467 expect armfortas.sema contains "diagnostics: none"
468 end
469 ```
470
471 Graph cases use `entry` plus ordered `file` lines:
472
473 ```text
474 suite "v2/generic-graphs"
475
476 case "module_chain_frontend"
477 entry "../../fixtures/modules/module_chain/main.f90"
478 file "../../fixtures/modules/module_chain/math_seed.f90"
479 file "../../fixtures/modules/module_chain/math_values.f90"
480 file "../../fixtures/modules/module_chain/main.f90"
481 compiler armfortas => armfortas.ast, armfortas.sema
482 expect armfortas.ast contains "name: \"math_seed\""
483 expect armfortas.sema contains "local_name: \"doubled\""
484 end
485 ```
486
487 Today the armfortas adapter materializes graph cases into one generated source
488 in declared file order before capture/compile. The authored files still stay in
489 the failure bundle.
490
491 Common things the runner understands:
492
493 - stage capture like `armfortas => tokens, ir, asm, obj, run`
494 - generic compiler capture like `compiler gfortran => asm, obj, runtime` or `compiler "/path/to/compiler" => asm, obj, runtime`
495 - suite-v2 generic compiler cases can also use opt matrices, `differential => ...`, and CLI-style reproducibility checks
496 - `check-comments` on `armfortas.ir` / `ir` uses `! IR_CHECK:` and `! IR_NOT:`
497 - `expect-fail comments` uses inline `! ERROR_EXPECTED:` source comments
498 - `xfail comments` uses the first inline `! XFAIL:` source comment
499 - generic compare cases like `compare gfortran flang-new => asm`, including opt matrices
500 - capability-aware authored softening with `future capability "..."` and `xfail capability "..."`
501 - suite-v2 graph cases with `entry` plus ordered `file` lines
502 - opt matrices like `opts => O0, O1, O2`
503 - references like `differential => gfortran, flang-new`
504 - expected failures like `xfail "reason"`
505 - per-opt status like `xfail when O1, O2 because "reason"`
506 - consistency checks like `cli_obj_vs_system_as` and `capture_run_reproducible`
507 - report outputs like `--json-report path/to/report.json` and `--markdown-report path/to/report.md`
508 - environment and adapter inspection with `doctor`
509 - direct one-shot compare with `compare`
510 - direct one-shot artifact/stage inspection with `introspect`
511
512 ## Notes
513
514 - `.docs/` is local and gitignored.
515 - `bencch` is now the public CLI story; `afs-tests` remains as a compatibility
516 alias.
517 - The product is now centered on `compare`, `introspect`, `run`, and `doctor`.
518 - The runner is currently strongest on stage capture, differential behavior,
519 and consistency work around reproducibility and cross-path mismatches.