# bencch

Generic compiler bench, with armfortas as the first rich adapter.

This repo holds:

- bench-core/: bench-owned compiler-facing types
- bench/: the bencch / afs-tests runner
- suites/: authored bench suites
- fixtures/: reusable fixture programs
- reports/: failure and consistency bundles

## Current Setup

bencch now has its own workspace manifest and public CLI.

CLI-side compiler and tool paths are overridable. Rich linked armfortas capture still needs an armfortas checkout, but Sprint 13 now gives that a real bootstrap path instead of assuming bencch is embedded as a submodule.

Embedded usage still works:

cargo run -p afs-tests --bin bencch -- list
cargo run -p afs-tests --bin bencch -- run --suite frontend

Standalone linked usage now works through a generated local workspace:

scripts/bootstrap-linked-armfortas.sh /path/to/armfortas
cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- doctor

That generated path keeps linked capture working and makes doctor report the actual linked armfortas checkout instead of assuming bencch is embedded.

Standalone external-only usage now works through a second generated workspace:

scripts/bootstrap-standalone-external.sh
cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- doctor

That mode drops linked capture entirely and keeps the generic external-driver surface available for compare, introspect, and external-facing run work.

Example external-only introspection:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- introspect fixtures/fake_compilers/match_42_a.sh fixtures/runtime/if_else.f90 --artifact asm,runtime

Example external-only authored suite run:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-introspect --case fake_compiler_runtime --all

Example external-only authored compare matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-compare --case fake_compilers_match_matrix --all

Example external-only authored differential matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-differential --all

Example external-only authored consistency matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-consistency --all

Example external-only authored failure matrix:

cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-failure-matrix --case fake_compiler_expected_diagnostic_matrix --all

Legacy rich-stage suites still need linked capture. In an external-only build, they now fail early with a direct message telling you to use scripts/bootstrap-linked-armfortas.sh.
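
The generated-workspace invocations in this section are long and differ only in the manifest path. A minimal sketch of shell wrappers that shorten them follows. The function names (`bencch_in`, `bencch_linked`, `bencch_external`) and the `BENCCH_DRY_RUN` variable are hypothetical conveniences for this sketch; they are not part of the repo or of bencch itself.

```bash
# Hypothetical convenience wrappers; not shipped with bencch.
# BENCCH_DRY_RUN is local to this sketch, not a bencch setting.

bencch_in() {
  # $1: workspace manifest path; all remaining arguments go to bencch unchanged.
  manifest="$1"
  shift
  set -- cargo run --manifest-path "$manifest" -p afs-tests --bin bencch -- "$@"
  if [ "${BENCCH_DRY_RUN:-0}" = "1" ]; then
    # Print the command line instead of running it.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Linked workspace generated by scripts/bootstrap-linked-armfortas.sh
bencch_linked() { bencch_in .bencch-local/Cargo.toml "$@"; }

# External-only workspace generated by scripts/bootstrap-standalone-external.sh
bencch_external() { bencch_in .bencch-external/Cargo.toml "$@"; }
```

With these sourced into a shell, `bencch_external doctor` expands to the `.bencch-external` doctor invocation shown earlier, and setting `BENCCH_DRY_RUN=1` prints the command instead of running it.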

## Usage

List suites:

cargo run -p afs-tests --bin bencch -- list

List suites with case-level capability discovery:

cargo run -p afs-tests --bin bencch -- list --suite v2/generic --verbose

Run one suite family:

cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime

Inspect the current embedded/standalone posture:

cargo run -p afs-tests --bin bencch -- doctor

doctor now also lists the generic artifacts and namespaced adapter extras that each named compiler surface can provide in the current build.

The built-in named compiler set now includes armfortas, gfortran, flang-new, lfortran, ifort, ifx, and nvfortran / pgfortran, plus any explicit compiler path you pass to compare or introspect.

doctor also probes each named compiler surface and reports:

- probe_status, such as linked, invokable, or missing
- probe_resolved_path
- probe_banner, when a version/help probe returns something useful

Write the same doctor snapshot to JSON and Markdown:

cargo run -p afs-tests --bin bencch -- doctor --json-report reports/doctor.json --markdown-report reports/doctor.md

The JSON report now includes structured sections for workspace, named compiler surfaces, tools, and mode, while keeping the flat field map too.

list --verbose now echoes the same probe posture for suite-v2 generic cases, so capability-blocked authored cases show both:

- why the request is blocked or deferred
- what compiler binary or linked surface would have been used

Generate a local linked workspace against an external armfortas checkout:

scripts/bootstrap-linked-armfortas.sh /path/to/armfortas

Then run bencch through that generated workspace:

cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- list

Compare two compilers on one program:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90

Compare named compilers with an explicit armfortas binary:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/if_else.f90 --armfortas-bin ../target/debug/armfortas

The same compare surface works across opt levels too:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --opt O2 --program fixtures/runtime/mixed_types.f90 --armfortas-bin ../target/debug/armfortas
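
To sweep one program across several opt levels from the command line, the single `--opt` invocation can be wrapped in a loop. The sketch below is an illustration only: `compare_opt_matrix` and the `BENCCH` override variable are hypothetical names, and it assumes `--opt` accepts `O0` and `O1` the same way it accepts `O2`. The README does not say whether `compare` signals divergence through its exit status, so the function simply reports whether any invocation exited non-zero.

```bash
# Hypothetical helper; BENCCH is a sketch-local override, not a bencch setting.
compare_opt_matrix() {
  # $1: path to the program to compare
  program="$1"
  status=0
  for opt in O0 O1 O2; do
    echo "== compare at $opt =="
    # Intentionally unquoted so the default command splits into words.
    ${BENCCH:-cargo run -p afs-tests --bin bencch --} \
      compare armfortas gfortran --opt "$opt" --program "$program" || status=1
  done
  # Non-zero if any single invocation exited non-zero.
  return "$status"
}
```

A call such as `compare_opt_matrix fixtures/runtime/mixed_types.f90` then runs the three comparisons in order. For authored coverage, the `opts => O0, O1, O2` matrix in a suite file is the more direct route.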

Compare with an extra artifact diff:

cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90 --artifact asm

Compare two explicit compiler binaries:

cargo run -p afs-tests --bin bencch -- compare /path/to/one /path/to/other --program fixtures/runtime/mixed_types.f90 --artifact asm,obj

Namespaced adapter artifacts are allowed in compare too, but only when both compiler surfaces can actually provide them. If not, bencch fails early with an explicit capability message.

Introspect one compiler on one program:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90

Introspect a rich armfortas stage explicitly:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --artifact armfortas.ir,asm

Introspect the full linked armfortas stage surface:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --all

Trim large introspection sections to a readable preview:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --max-artifact-lines 12

Keep only section summaries and omit artifact bodies:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --summary-only

Introspect a named external compiler on the generic surface:

cargo run -p afs-tests --bin bencch -- introspect gfortran fixtures/runtime/if_else.f90 --artifact asm,obj,runtime

If you request artifacts that a compiler surface cannot provide, bencch fails early with a capability message instead of pretending the compiler failed mid-pipeline.

Introspect an explicit compiler path on that same generic surface:

cargo run -p afs-tests --bin bencch -- introspect /path/to/compiler fixtures/runtime/if_else.f90 --artifact asm,obj,runtime

Introspect a failing armfortas source and keep the partial capture:

cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/invalid/parse_error.f90 --artifact armfortas.tokens,armfortas.ir,asm

Run against an explicit compiler binary:

cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime-control-flow --armfortas-bin ./target/debug/armfortas

Run an asm/object surface through an explicit compiler binary:

cargo run -p afs-tests --bin bencch -- run --suite backend/asm --case runtime_wrapper_and_calls --armfortas-bin ./target/debug/armfortas

Run differential checks with explicit reference compiler paths:

cargo run -p afs-tests --bin bencch -- run --suite differential/runtime-control-flow --gfortran-bin /opt/homebrew/bin/gfortran --flang-bin /opt/homebrew/bin/flang-new
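
The Homebrew paths above are machine-specific. As a sketch, the reference compilers can instead be resolved from `PATH` before the run, which fails early if one is missing. `run_differential` and the `BENCCH` override variable are hypothetical names used only for this illustration; the `--gfortran-bin` and `--flang-bin` flags are the ones shown above.

```bash
# Hypothetical helper; BENCCH is a sketch-local override, not a bencch setting.
run_differential() {
  # $1: suite name, e.g. differential/runtime-control-flow
  suite="$1"
  gfortran_bin="$(command -v gfortran)" || {
    echo "gfortran not found on PATH" >&2
    return 1
  }
  flang_bin="$(command -v flang-new)" || {
    echo "flang-new not found on PATH" >&2
    return 1
  }
  # Intentionally unquoted so the default command splits into words.
  ${BENCCH:-cargo run -p afs-tests --bin bencch --} \
    run --suite "$suite" --gfortran-bin "$gfortran_bin" --flang-bin "$flang_bin"
}
```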

Run one case with full stage capture:

cargo run -p afs-tests --bin bencch -- run --suite frontend --case stage_walk --all --verbose

Write machine-readable reports:

cargo run -p afs-tests --bin bencch -- run --suite modules --all --json-report reports/modules.json --markdown-report reports/modules.md
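
Suite names can contain slashes (for example `consistency/runtime`), so deriving report file names from them needs a small transformation. The sketch below is one way to do that; `run_with_reports`, the `BENCCH` override variable, and the slash-to-dash naming scheme are assumptions made for this illustration, not conventions of the repo.

```bash
# Hypothetical helper; BENCCH is a sketch-local override, not a bencch setting.
run_with_reports() {
  # $1: suite name, possibly containing slashes
  suite="$1"
  # Turn "consistency/runtime" into "reports/consistency-runtime".
  stem="reports/$(printf '%s' "$suite" | tr '/' '-')"
  # Intentionally unquoted so the default command splits into words.
  ${BENCCH:-cargo run -p afs-tests --bin bencch --} \
    run --suite "$suite" --all \
    --json-report "$stem.json" --markdown-report "$stem.md"
}
```

Calling `run_with_reports consistency/runtime` would then request `reports/consistency-runtime.json` and `reports/consistency-runtime.md`.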

Run consistency coverage:

cargo run -p afs-tests --bin bencch -- run --suite consistency --all

Run differential coverage:

cargo run -p afs-tests --bin bencch -- run --suite differential

Reports are written under reports/.

compare now prints a short summary block with status, divergence classification, basis, difference count, changed artifacts, and the backend used on each side before any per-artifact diffs.

introspect now groups portable outputs like asm, obj, and runtime separately from adapter extras like armfortas.ir and armfortas.tokens in text, JSON, and Markdown output, and it now reports requested, captured, and missing artifacts at the top of the report. Failure-side introspection also surfaces the failure stage when the adapter knows it, plus a short diagnostic excerpt before the full diagnostics block. For large captures, --summary-only and --max-artifact-lines <n> keep the text and Markdown surfaces readable. JSON reports keep the full artifact bodies and now add compact artifact_summaries alongside them for quick scanning.

Environment overrides work too:

BENCCH_ARMFORTAS_BIN=./target/debug/armfortas cargo run -p afs-tests --bin bencch -- run --suite consistency/object
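
For a whole shell session, the override can be exported once instead of being prefixed to every command. The sketch below adds a guard so a mistyped path is caught before any suite runs. `use_armfortas_bin` is a hypothetical helper name; `BENCCH_ARMFORTAS_BIN` is the only variable taken from this README.

```bash
# Hypothetical helper; only BENCCH_ARMFORTAS_BIN comes from the README.
use_armfortas_bin() {
  # $1: path to the armfortas binary to use for later bencch runs
  if [ ! -x "$1" ]; then
    echo "not an executable file: $1" >&2
    return 1
  fi
  export BENCCH_ARMFORTAS_BIN="$1"
}
```

After `use_armfortas_bin ./target/debug/armfortas` succeeds, later `cargo run -p afs-tests --bin bencch -- run ...` commands in the same shell inherit the override.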

Backend choice is visible in:

- cargo run -p afs-tests --bin bencch -- doctor
- --verbose case runs
- JSON and Markdown reports as primary_backend
- bundle metadata.txt and armfortas/metadata.txt

## Suite Format

Suites are plain text files under suites/.

suite "consistency/runtime"

case "mixed_types_cli_run_reproducible"
source "../../fixtures/runtime/mixed_types.f90"
opts => all
armfortas => run
repeat => 3
consistency => cli_run_reproducible
expect run.stdout check-comments
expect run.exit_code equals 0
end

The new suite-v2 generic surface can target any compiler spec the same way bencch introspect does:

suite "v2/generic-introspect"

case "fake_compiler_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
expect asm contains ".globl _main"
expect run.stdout contains "42"
expect run.exit_code equals 0
end

Generic compiler cases can also lean on references and CLI-style reproducibility checks:

suite "v2/generic-differential"

case "gfortran_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler gfortran => runtime
differential => flang-new
expect run.stdout check-comments
expect run.exit_code equals 0
end

suite "v2/generic-consistency"

case "fake_compiler_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
repeat => 3
compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
consistency => cli_asm_reproducible, cli_run_reproducible
expect asm contains ".globl _main"
expect run.stdout contains "42"
expect run.exit_code equals 0
end

For mem2reg-branch compatibility, check-comments on armfortas.ir understands inline ! IR_CHECK: and ! IR_NOT: annotations, while run.stdout check-comments keeps using the usual ! CHECK: lines.

Two more opt-in bridges exist for imported mem2reg-style audits:

- expect-fail comments reads ! ERROR_EXPECTED: lines from the source
- xfail comments reads the first ! XFAIL: line from the source

Those compose the same way the old mem2reg harness did: a case can keep a source-owned expected diagnostic and still remain xfail until trunk starts producing that diagnostic correctly.

Suite-v2 can also drive the generic compare engine:

suite "v2/generic-compare"

case "fake_compilers_match_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compare "../../fixtures/fake_compilers/match_42_a.sh" "../../fixtures/fake_compilers/match_42_b.sh" => asm
expect compare.status equals "match"
expect compare.classification equals "match"
expect compare.difference_count equals 0
end

Suite-v2 unhappy paths can use the same generic engine too:

suite "v2/generic-failures"

case "fake_compiler_expected_diagnostic"
source "../../fixtures/invalid/fake_compile_fail_expected.f90"
compiler "../../fixtures/fake_compilers/compile_fail.sh" => diagnostics
expect-fail comments
end

case "armfortas_parse_error"
source "../../fixtures/invalid/parse_error.f90"
compiler armfortas => diagnostics
expect-fail parser contains "expected entity name"
end

And they can be matrixed the same way as the happy-path suites:

suite "v2/generic-failure-matrix"

case "fake_compilers_compile_divergence_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compare "../../fixtures/fake_compilers/compile_fail.sh" "../../fixtures/fake_compilers/match_42_a.sh" => diagnostics
expect compare.status equals "diff"
expect compare.classification equals "compile divergence"
expect compare.difference_count equals 2
end

If a suite-v2 case is intentionally blocked by adapter capability limits, you can say so directly:

suite "v2/capability-policy"

case "gfortran_armfortas_ir_future"
source "../../fixtures/runtime/if_else.f90"
compiler gfortran => armfortas.ir
future capability "generic gfortran surface has no armfortas extras"
end

case "mixed_surface_ir_compare_xfail"
source "../../fixtures/runtime/if_else.f90"
compare armfortas gfortran => armfortas.ir
xfail capability "mixed-surface namespaced compare stays soft for now"
end

Namespaced armfortas artifacts can be matrixed too:

suite "v2/armfortas-namespace-matrix"

case "if_else_frontend_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler armfortas => armfortas.tokens, armfortas.ast, armfortas.sema
expect armfortas.tokens contains "\"then\""
expect armfortas.ast contains "node: IfConstruct"
expect armfortas.sema contains "diagnostics: none"
end

Graph cases use entry plus ordered file lines:

suite "v2/generic-graphs"

case "module_chain_frontend"
entry "../../fixtures/modules/module_chain/main.f90"
file "../../fixtures/modules/module_chain/math_seed.f90"
file "../../fixtures/modules/module_chain/math_values.f90"
file "../../fixtures/modules/module_chain/main.f90"
compiler armfortas => armfortas.ast, armfortas.sema
expect armfortas.ast contains "name: \"math_seed\""
expect armfortas.sema contains "local_name: \"doubled\""
end

Today the armfortas adapter materializes graph cases into one generated source in declared file order before capture/compile. The authored files still stay in the failure bundle.

Common things the runner understands:

- stage capture like armfortas => tokens, ir, asm, obj, run
- generic compiler capture like compiler gfortran => asm, obj, runtime or compiler "/path/to/compiler" => asm, obj, runtime
- suite-v2 generic compiler cases can also use opt matrices, differential => ..., and CLI-style reproducibility checks
- check-comments on armfortas.ir / ir uses ! IR_CHECK: and ! IR_NOT:
- expect-fail comments uses inline ! ERROR_EXPECTED: source comments
- xfail comments uses the first inline ! XFAIL: source comment
- generic compare cases like compare gfortran flang-new => asm, including opt matrices
- capability-aware authored softening with future capability "..." and xfail capability "..."
- suite-v2 graph cases with entry plus ordered file lines
- opt matrices like opts => O0, O1, O2
- references like differential => gfortran, flang-new
- expected failures like xfail "reason"
- per-opt status like xfail when O1, O2 because "reason"
- consistency checks like cli_obj_vs_system_as and capture_run_reproducible
- report outputs like --json-report path/to/report.json and --markdown-report path/to/report.md
- environment and adapter inspection with doctor
- direct one-shot compare with compare
- direct one-shot artifact/stage inspection with introspect

## Notes

- .docs/ is local and gitignored.
- bencch is now the public CLI; afs-tests remains as a compatibility alias.
- The product is now centered on compare, introspect, run, and doctor.
- The runner is currently strongest on stage capture, differential behavior, and consistency work around reproducibility and cross-path mismatches.

View source

  
        1
        # bencch
      
        2
        
        3
        Generic compiler bench, with `armfortas` as the first rich adapter.
      
        4
        
        5
        This repo holds:
      
        6
        
        7
        - `bench-core/` — bench-owned compiler-facing types
      
        8
        - `bench/` — the `bencch` / `afs-tests` runner
      
        9
        - `suites/` — authored bench suites
      
        10
        - `fixtures/` — reusable fixture programs
      
        11
        - `reports/` — failure and consistency bundles
      
        12
        
        13
        ## Current Setup
      
        14
        
        15
        `bencch` now has its own workspace manifest and public CLI.
      
        16
        
        17
        CLI-side compiler and tool paths are overridable. Rich linked `armfortas`
      
        18
        capture still needs an `armfortas` checkout, but Sprint 13 now gives that a
      
        19
        real bootstrap path instead of assuming `bencch` is embedded as a submodule.
      
        20
        
        21
        Embedded usage still works:
      
        22
        
        23
        ```bash
      
        24
        cargo run -p afs-tests --bin bencch -- list
      
        25
        cargo run -p afs-tests --bin bencch -- run --suite frontend
      
        26
        ```
      
        27
        
        28
        Standalone linked usage now works through a generated local workspace:
      
        29
        
        30
        ```bash
      
        31
        scripts/bootstrap-linked-armfortas.sh /path/to/armfortas
      
        32
        cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- doctor
      
        33
        ```
      
        34
        
        35
        That generated path keeps linked capture working and makes `doctor` report the
      
        36
        actual linked `armfortas` checkout instead of assuming `bencch` is embedded.
      
        37
        
        38
        Standalone external-only usage now works through a second generated workspace:
      
        39
        
        40
        ```bash
      
        41
        scripts/bootstrap-standalone-external.sh
      
        42
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- doctor
      
        43
        ```
      
        44
        
        45
        That mode drops linked capture entirely and keeps the generic external-driver
      
        46
        surface available for `compare`, `introspect`, and external-facing `run` work.
      
        47
        
        48
        Example external-only introspection:
      
        49
        
        50
        ```bash
      
        51
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- introspect fixtures/fake_compilers/match_42_a.sh fixtures/runtime/if_else.f90 --artifact asm,runtime
      
        52
        ```
      
        53
        
        54
        Example external-only authored suite run:
      
        55
        
        56
        ```bash
      
        57
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-introspect --case fake_compiler_runtime --all
      
        58
        ```
      
        59
        
        60
        Example external-only authored compare matrix:
      
        61
        
        62
        ```bash
      
        63
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-compare --case fake_compilers_match_matrix --all
      
        64
        ```
      
        65
        
        66
        Example external-only authored differential matrix:
      
        67
        
        68
        ```bash
      
        69
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-differential --all
      
        70
        ```
      
        71
        
        72
        Example external-only authored consistency matrix:
      
        73
        
        74
        ```bash
      
        75
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-consistency --all
      
        76
        ```
      
        77
        
        78
        Example external-only authored failure matrix:
      
        79
        
        80
        ```bash
      
        81
        cargo run --manifest-path .bencch-external/Cargo.toml -p afs-tests --bin bencch -- run --suite v2/generic-failure-matrix --case fake_compiler_expected_diagnostic_matrix --all
      
        82
        ```
      
        83
        
        84
        Legacy rich-stage suites still need linked capture. In an external-only build,
      
        85
        they now fail early with a direct message telling you to use
      
        86
        `scripts/bootstrap-linked-armfortas.sh`.
      
        87
        
        88
        ## Usage
      
        89
        
        90
        List suites:
      
        91
        
        92
        ```bash
      
        93
        cargo run -p afs-tests --bin bencch -- list
      
        94
        ```
      
        95
        
        96
        List suites with case-level capability discovery:
      
        97
        
        98
        ```bash
      
        99
        cargo run -p afs-tests --bin bencch -- list --suite v2/generic --verbose
      
        100
        ```
      
        101
        
        102
        Run one suite family:
      
        103
        
        104
        ```bash
      
        105
        cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime
      
        106
        ```
      
        107
        
        108
        Inspect the current embedded/standalone posture:
      
        109
        
        110
        ```bash
      
        111
        cargo run -p afs-tests --bin bencch -- doctor
      
        112
        ```
      
        113
        
        114
        `doctor` now also lists the generic artifacts and namespaced adapter extras
      
        115
        that each named compiler surface can provide in the current build.
      
        116
        
        117
        The built-in named compiler set now includes `armfortas`, `gfortran`,
      
        118
        `flang-new`, `lfortran`, `ifort`, `ifx`, and `nvfortran` / `pgfortran`, plus
      
        119
        any explicit compiler path you pass to `compare` or `introspect`.
      
        120
        
        121
        It also probes each named compiler surface a bit more deeply now:
      
        122
        
        123
        - `probe_status` like `linked`, `invokable`, or `missing`
      
        124
        - `probe_resolved_path`
      
        125
        - `probe_banner` when a version/help probe returns something useful
      
        126
        
        127
        Write the same `doctor` snapshot to JSON and Markdown:
      
        128
        
        129
        ```bash
      
        130
        cargo run -p afs-tests --bin bencch -- doctor --json-report reports/doctor.json --markdown-report reports/doctor.md
      
        131
        ```
      
        132
        
        133
        The JSON report now includes structured sections for workspace, named compiler
      
        134
        surfaces, tools, and mode, while keeping the flat field map too.
      
        135
        
        136
        `list --verbose` now echoes the same probe posture for suite-v2 generic cases,
      
        137
        so capability-blocked authored cases show both:
      
        138
        
        139
        - why the request is blocked or deferred
      
        140
        - what compiler binary or linked surface would have been used
      
        141
        
        142
        Generate a local linked workspace against an external `armfortas` checkout:
      
        143
        
        144
        ```bash
      
        145
        scripts/bootstrap-linked-armfortas.sh /path/to/armfortas
      
        146
        ```
      
        147
        
        148
        Then run `bencch` through that generated workspace:
      
        149
        
        150
        ```bash
      
        151
        cargo run --manifest-path .bencch-local/Cargo.toml -p afs-tests --bin bencch -- list
      
        152
        ```
      
        153
        
        154
        Compare two compilers on one program:
      
        155
        
        156
        ```bash
      
        157
        cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90
      
        158
        ```
      
        159
        
        160
        Compare named compilers with an explicit armfortas binary:
      
        161
        
        162
        ```bash
      
        163
        cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/if_else.f90 --armfortas-bin ../target/debug/armfortas
      
        164
        ```
      
        165
        
        166
        The same compare surface works across opt levels too:
      
        167
        
        168
        ```bash
      
        169
        cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --opt O2 --program fixtures/runtime/mixed_types.f90 --armfortas-bin ../target/debug/armfortas
      
        170
        ```
      
        171
        
        172
        Compare with an extra artifact diff:
      
        173
        
        174
        ```bash
      
        175
        cargo run -p afs-tests --bin bencch -- compare armfortas gfortran --program fixtures/runtime/mixed_types.f90 --artifact asm
      
        176
        ```
      
        177
        
        178
        Compare two explicit compiler binaries:
      
        179
        
        180
        ```bash
      
        181
        cargo run -p afs-tests --bin bencch -- compare /path/to/one /path/to/other --program fixtures/runtime/mixed_types.f90 --artifact asm,obj
      
        182
        ```
      
        183
        
        184
        Namespaced adapter artifacts are allowed in `compare` too, but only when both
      
        185
        compiler surfaces can actually provide them. If not, `bencch` fails early with
      
        186
        an explicit capability message.
      
        187
        
        188
        Introspect one compiler on one program:
      
        189
        
        190
        ```bash
      
        191
        cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90
      
        192
        ```
      
        193
        
        194
        Introspect a rich armfortas stage explicitly:
      
        195
        
        196
        ```bash
      
        197
        cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --artifact armfortas.ir,asm
      
        198
        ```
      
        199
        
        200
        Introspect the full linked armfortas stage surface:
      
        201
        
        202
        ```bash
      
        203
        cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/mixed_types.f90 --all
      
        204
        ```
      
        205
        
        206
        Trim large introspection sections to a readable preview:
      
        207
        
        208
        ```bash
      
        209
        cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --max-artifact-lines 12
      
        210
        ```
      
        211
        
        212
        Keep only section summaries and omit artifact bodies:
      
        213
        
        214
        ```bash
      
        215
        cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/runtime/if_else.f90 --all --summary-only
      
        216
        ```
      
        217
        
        218
        Introspect a named external compiler on the generic surface:
      
        219
        
        220
        ```bash
      
        221
        cargo run -p afs-tests --bin bencch -- introspect gfortran fixtures/runtime/if_else.f90 --artifact asm,obj,runtime
      
        222
        ```
      
        223
        
        224
        If you request artifacts that a compiler surface cannot provide, `bencch`
      
        225
        fails early with a capability message instead of pretending the compiler
      
        226
        failed mid-pipeline.
      
        227
        
        228
        Introspect an explicit compiler path on that same generic surface:
      
        229
        
        230
        ```bash
      
        231
        cargo run -p afs-tests --bin bencch -- introspect /path/to/compiler fixtures/runtime/if_else.f90 --artifact asm,obj,runtime
      
        232
        ```
      
        233
        
        234
        Introspect a failing armfortas source and keep the partial capture:
      
        235
        
        236
        ```bash
      
        237
        cargo run -p afs-tests --bin bencch -- introspect armfortas fixtures/invalid/parse_error.f90 --artifact armfortas.tokens,armfortas.ir,asm
      
        238
        ```
      
        239
        
        240
        Run against an explicit compiler binary:
      
        241
        
        242
        ```bash
      
        243
        cargo run -p afs-tests --bin bencch -- run --suite consistency/runtime-control-flow --armfortas-bin ./target/debug/armfortas
      
        244
        ```
      
        245
        
        246
        Run an asm/object surface through an explicit compiler binary:
      
        247
        
        248
        ```bash
      
        249
        cargo run -p afs-tests --bin bencch -- run --suite backend/asm --case runtime_wrapper_and_calls --armfortas-bin ./target/debug/armfortas
      
        250
        ```
      
        251
        
        252
        Run differential checks with explicit reference compiler paths:
      
        253
        
        254
        ```bash
      
        255
        cargo run -p afs-tests --bin bencch -- run --suite differential/runtime-control-flow --gfortran-bin /opt/homebrew/bin/gfortran --flang-bin /opt/homebrew/bin/flang-new
      
        256
        ```
      
        257
        
        258
        Run one case with full stage capture:
      
        259
        
        260
        ```bash
      
        261
        cargo run -p afs-tests --bin bencch -- run --suite frontend --case stage_walk --all --verbose
      
        262
        ```
      
        263
        
        264
        Write machine-readable reports:
      
        265
        
        266
        ```bash
      
        267
        cargo run -p afs-tests --bin bencch -- run --suite modules --all --json-report reports/modules.json --markdown-report reports/modules.md
      
        268
        ```
      
        269
        
        270
        Run consistency coverage:
      
        271
        
        272
        ```bash
      
        273
        cargo run -p afs-tests --bin bencch -- run --suite consistency --all
      
        274
        ```
      
        275
        
        276
        Run differential coverage:
      
        277
        
        278
        ```bash
      
        279
        cargo run -p afs-tests --bin bencch -- run --suite differential
      
        280
        ```
      
Reports are written under `reports/`.

`compare` now prints a short summary block with status, divergence
classification, basis, difference count, changed artifacts, and the backend
used on each side before any per-artifact diffs.

`introspect` now groups portable outputs like `asm`, `obj`, and `runtime`
separately from adapter extras like `armfortas.ir` and `armfortas.tokens` in
text, JSON, and Markdown output, and it now reports requested, captured, and
missing artifacts at the top of the report. Failure-side introspection also
surfaces the failure stage when the adapter knows it, plus a short diagnostic
excerpt before the full diagnostics block. For large captures, `--summary-only`
and `--max-artifact-lines <n>` keep the text and Markdown surfaces readable.
JSON reports keep the full artifact bodies and now add compact
`artifact_summaries` alongside them for quick scanning.
      
Environment overrides work too:

```bash
BENCCH_ARMFORTAS_BIN=./target/debug/armfortas cargo run -p afs-tests --bin bencch -- run --suite consistency/object
```
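
Since every invocation in this README shares the same `cargo run ... --bin bencch --` prefix, a small shell wrapper can shorten day-to-day use. This is a sketch, not something the repo ships: the `bencch_cmd` and `bencch` functions and the `BENCCH_MANIFEST` variable are hypothetical names introduced here. Only `BENCCH_ARMFORTAS_BIN` and cargo's `--manifest-path` come from the text above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the cargo invocation for a bencch subcommand.
# BENCCH_MANIFEST is an assumption of this sketch (bencch itself does not
# read it); when set, it is forwarded to cargo as --manifest-path.
bencch_cmd() {
  local cmd=(cargo run)
  if [ -n "${BENCCH_MANIFEST:-}" ]; then
    cmd+=(--manifest-path "$BENCCH_MANIFEST")
  fi
  cmd+=(-p afs-tests --bin bencch -- "$@")
  printf '%s\n' "${cmd[*]}"
}

# Same invocation, executed instead of printed.
bencch() {
  if [ -n "${BENCCH_MANIFEST:-}" ]; then
    cargo run --manifest-path "$BENCCH_MANIFEST" -p afs-tests --bin bencch -- "$@"
  else
    cargo run -p afs-tests --bin bencch -- "$@"
  fi
}
```

With the functions sourced into a shell, the override above becomes `BENCCH_ARMFORTAS_BIN=./target/debug/armfortas bencch run --suite consistency/object`, and `bencch_cmd` shows what would run without starting a build.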
      
Backend choice is visible in:

- `cargo run -p afs-tests --bin bencch -- doctor`
- `--verbose` case runs
- JSON and Markdown reports as `primary_backend`
- bundle `metadata.txt` and `armfortas/metadata.txt`
      
## Suite Format

Suites are plain text files under `suites/`.

```text
suite "consistency/runtime"

case "mixed_types_cli_run_reproducible"
source "../../fixtures/runtime/mixed_types.f90"
opts => all
armfortas => run
repeat => 3
consistency => cli_run_reproducible
expect run.stdout check-comments
expect run.exit_code equals 0
end
```
      
The new suite-v2 generic surface can target any compiler spec the same way
`bencch introspect` does:

```text
suite "v2/generic-introspect"

case "fake_compiler_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
expect asm contains ".globl _main"
expect run.stdout contains "42"
expect run.exit_code equals 0
end
```
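
Because a suite is plain text, a new case can be scaffolded from the shell. The sketch below prints a minimal suite-v2 case using only directives that appear in this README; `scaffold_case` and the `v2/scratch` suite name are hypothetical, and where to save the output under `suites/` is left to the author.

```bash
#!/usr/bin/env bash
# Hypothetical scaffold: print a minimal suite-v2 case for one fixture.
# Arguments: suite name, case name, source path, compiler spec.
scaffold_case() {
  local suite="$1" case_name="$2" source="$3" compiler="$4"
  cat <<EOF
suite "$suite"

case "$case_name"
source "$source"
opts => O0, O1, O2
compiler $compiler => runtime
expect run.exit_code equals 0
end
EOF
}

# Example: a runtime case for gfortran over three opt levels.
scaffold_case "v2/scratch" "if_else_runtime" \
  "../../fixtures/runtime/if_else.f90" gfortran
```

The compiler spec is passed through unquoted so that named compilers such as `gfortran` work as shown; an explicit path would need its own surrounding quotes in the argument.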
      
Generic compiler cases can also lean on references and CLI-style
reproducibility checks:

```text
suite "v2/generic-differential"

case "gfortran_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler gfortran => runtime
differential => flang-new
expect run.stdout check-comments
expect run.exit_code equals 0
end
```

```text
suite "v2/generic-consistency"

case "fake_compiler_runtime_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
repeat => 3
compiler "../../fixtures/fake_compilers/match_42_a.sh" => asm, runtime
consistency => cli_asm_reproducible, cli_run_reproducible
expect asm contains ".globl _main"
expect run.stdout contains "42"
expect run.exit_code equals 0
end
```
      
For mem2reg-branch compatibility, `check-comments` on `armfortas.ir` understands
inline `! IR_CHECK:` and `! IR_NOT:` annotations, while `run.stdout
check-comments` keeps using the usual `! CHECK:` lines.

Two more opt-in bridges exist for imported mem2reg-style audits:

- `expect-fail comments` reads `! ERROR_EXPECTED:` lines from the source
- `xfail comments` reads the first `! XFAIL:` line from the source

Those compose the same way the old mem2reg harness did: a case can keep a
source-owned expected diagnostic and still remain `xfail` until trunk starts
producing that diagnostic correctly.
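
Before wiring a fixture into a `check-comments`, `expect-fail comments`, or `xfail comments` case, it can help to see which annotations the source actually carries. The following sketch greps for the five prefixes named above. The fixture contents and the text after each prefix are made-up placeholders, and `list_annotations` is a hypothetical helper; the runner's own matching rules may be stricter than this pattern.

```bash
#!/usr/bin/env bash
# Build a throwaway fixture; its contents are illustrative only and do not
# come from this repo.
fixture="$(mktemp)"
cat > "$fixture" <<'EOF'
program demo
  print *, 42   ! CHECK: 42
  ! IR_CHECK: ret
  ! IR_NOT: alloca
  ! XFAIL: pending trunk fix
end program demo
EOF

# Print "line:text" for every line carrying a recognized annotation prefix.
list_annotations() {
  grep -nE '! (CHECK|IR_CHECK|IR_NOT|ERROR_EXPECTED|XFAIL):' "$1"
}

list_annotations "$fixture"
```

Running the same function over a real fixture path shows at a glance whether a source is ready for the opt-in bridges.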
      
Suite-v2 can also drive the generic compare engine:

```text
suite "v2/generic-compare"

case "fake_compilers_match_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compare "../../fixtures/fake_compilers/match_42_a.sh" "../../fixtures/fake_compilers/match_42_b.sh" => asm
expect compare.status equals "match"
expect compare.classification equals "match"
expect compare.difference_count equals 0
end
```
      
Suite-v2 unhappy paths can use the same generic engine too:

```text
suite "v2/generic-failures"

case "fake_compiler_expected_diagnostic"
source "../../fixtures/invalid/fake_compile_fail_expected.f90"
compiler "../../fixtures/fake_compilers/compile_fail.sh" => diagnostics
expect-fail comments
end

case "armfortas_parse_error"
source "../../fixtures/invalid/parse_error.f90"
compiler armfortas => diagnostics
expect-fail parser contains "expected entity name"
end
```

And they can be matrixed the same way as the happy-path suites:

```text
suite "v2/generic-failure-matrix"

case "fake_compilers_compile_divergence_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compare "../../fixtures/fake_compilers/compile_fail.sh" "../../fixtures/fake_compilers/match_42_a.sh" => diagnostics
expect compare.status equals "diff"
expect compare.classification equals "compile divergence"
expect compare.difference_count equals 2
end
```
      
If a suite-v2 case is intentionally blocked by adapter capability limits, you
can say so directly:

```text
suite "v2/capability-policy"

case "gfortran_armfortas_ir_future"
source "../../fixtures/runtime/if_else.f90"
compiler gfortran => armfortas.ir
future capability "generic gfortran surface has no armfortas extras"
end

case "mixed_surface_ir_compare_xfail"
source "../../fixtures/runtime/if_else.f90"
compare armfortas gfortran => armfortas.ir
xfail capability "mixed-surface namespaced compare stays soft for now"
end
```
      
Namespaced armfortas artifacts can be matrixed too:

```text
suite "v2/armfortas-namespace-matrix"

case "if_else_frontend_matrix"
source "../../fixtures/runtime/if_else.f90"
opts => O0, O1, O2
compiler armfortas => armfortas.tokens, armfortas.ast, armfortas.sema
expect armfortas.tokens contains "\"then\""
expect armfortas.ast contains "node: IfConstruct"
expect armfortas.sema contains "diagnostics: none"
end
```
      
Graph cases use `entry` plus ordered `file` lines:

```text
suite "v2/generic-graphs"

case "module_chain_frontend"
entry "../../fixtures/modules/module_chain/main.f90"
file "../../fixtures/modules/module_chain/math_seed.f90"
file "../../fixtures/modules/module_chain/math_values.f90"
file "../../fixtures/modules/module_chain/main.f90"
compiler armfortas => armfortas.ast, armfortas.sema
expect armfortas.ast contains "name: \"math_seed\""
expect armfortas.sema contains "local_name: \"doubled\""
end
```

Today the armfortas adapter materializes graph cases into one generated source
in declared file order before capture/compile. The authored files still stay in
the failure bundle.
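
The effect of declared file order can be approximated by hand with plain concatenation. This sketch mirrors the description above rather than the adapter's actual code: `materialize_graph` is a hypothetical helper, and the three stand-in files are written by the script itself instead of being read from `fixtures/`.

```bash
#!/usr/bin/env bash
# Sketch: join graph-case files into one source, in the order given.
# First argument is the output path; the rest are inputs in declared order.
materialize_graph() {
  local out="$1"; shift
  : > "$out"                 # start from an empty output file
  local f
  for f in "$@"; do
    cat "$f" >> "$out"
    printf '\n' >> "$out"    # keep program units separated
  done
}

# Demo with throwaway stand-in files (contents are hypothetical).
work="$(mktemp -d)"
printf 'module math_seed\nend module math_seed\n' > "$work/math_seed.f90"
printf 'module math_values\n  use math_seed\nend module math_values\n' > "$work/math_values.f90"
printf 'program main\n  use math_values\nend program main\n' > "$work/main.f90"

materialize_graph "$work/generated.f90" \
  "$work/math_seed.f90" "$work/math_values.f90" "$work/main.f90"
```

Listing the `file` lines so that each module precedes its users matters in a single generated source, because a `use` statement generally needs its module to have been seen first.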
      
Common things the runner understands:

- stage capture like `armfortas => tokens, ir, asm, obj, run`
- generic compiler capture like `compiler gfortran => asm, obj, runtime` or `compiler "/path/to/compiler" => asm, obj, runtime`
- suite-v2 generic compiler cases can also use opt matrices, `differential => ...`, and CLI-style reproducibility checks
- `check-comments` on `armfortas.ir` / `ir` uses `! IR_CHECK:` and `! IR_NOT:`
- `expect-fail comments` uses inline `! ERROR_EXPECTED:` source comments
- `xfail comments` uses the first inline `! XFAIL:` source comment
- generic compare cases like `compare gfortran flang-new => asm`, including opt matrices
- capability-aware authored softening with `future capability "..."` and `xfail capability "..."`
- suite-v2 graph cases with `entry` plus ordered `file` lines
- opt matrices like `opts => O0, O1, O2`
- references like `differential => gfortran, flang-new`
- expected failures like `xfail "reason"`
- per-opt status like `xfail when O1, O2 because "reason"`
- consistency checks like `cli_obj_vs_system_as` and `capture_run_reproducible`
- report outputs like `--json-report path/to/report.json` and `--markdown-report path/to/report.md`
- environment and adapter inspection with `doctor`
- direct one-shot compare with `compare`
- direct one-shot artifact/stage inspection with `introspect`
      
## Notes

- `.docs/` is local and gitignored.
- `bencch` is now the public CLI; `afs-tests` remains as a compatibility
  alias.
- The product is now centered on `compare`, `introspect`, `run`, and `doctor`.
- The runner is currently strongest on stage capture, differential behavior,
  and consistency work around reproducibility and cross-path mismatches.
