# bensch

A POSIX shell testing framework. Pass any shell binary, get a compliance report.

## Quick Start

```bash
# Test bash
./bensch --shell /bin/bash --suite posix

# Test a custom shell with interactive PTY tests
./bensch --shell /path/to/myshell --suite interactive

# Full test run with profile auto-detection
./bensch --shell /bin/zsh --suite all

# See what's available
./bensch --list-suites
./bensch --list-profiles
```

## What It Tests

bensch runs three categories of tests against any shell binary:

| Suite | Tests | What It Checks |
|-------|-------|----------------|
| `posix` | ~3,800 | POSIX compliance: expansion, quoting, redirection, control flow, builtins, heredocs, job control |
| `interactive` | ~1,014 | PTY behavior: line editing, history, tab completion, vi mode, signals, prompts |
| `builtins` | ~650 | Individual builtin commands compared against a reference shell (bash) |

## How It Works

**POSIX tests** run each command in both the shell under test and a reference shell (default: bash), then compare outputs. This tells you exactly where your shell diverges from established behavior.
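The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not bensch's actual runner: the function names `run_in_shell` and `diverges` are hypothetical, and it assumes each test is a one-line command passed to the shell with `-c`.

```python
import subprocess


def run_in_shell(shell, command, timeout=5.0):
    """Run `command` via `shell -c` and return its standard output as text."""
    result = subprocess.run(
        [shell, "-c", command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout


def diverges(shell_under_test, command, reference="bash"):
    """Return True when the two shells print different output for `command`.

    `reference` defaults to bash, mirroring the default reference shell
    named in this README; it is looked up on PATH.
    """
    return run_in_shell(shell_under_test, command) != run_in_shell(reference, command)
```

For example, `diverges("/bin/dash", 'x=5; echo "${x:-unset}"')` would report whether dash and bash disagree on that parameter expansion.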

**Interactive tests** spawn the shell in a pseudo-terminal via pexpect and drive it with keystrokes (typing commands, pressing Tab, Ctrl+R, arrow keys, Ctrl+C), then verify that the terminal output matches expectations.
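To make the PTY mechanism concrete, here is a rough sketch of driving a shell through a pseudo-terminal. bensch itself uses pexpect; this sketch deliberately swaps in Python's standard-library `pty` module so that it has no dependencies. The names `KEYS`, `read_until`, and `drive_shell` are hypothetical and not taken from the bensch source.

```python
import os
import pty
import select
import signal
import time

# A few keystrokes expressed as the raw bytes a terminal sends for them.
KEYS = {"tab": b"\t", "ctrl_c": b"\x03", "ctrl_r": b"\x12", "up": b"\x1b[A"}


def read_until(fd, marker, timeout=5.0):
    """Read from the PTY until `marker` appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    buf = b""
    while marker not in buf:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"did not see {marker!r}; got {buf!r}")
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue
        try:
            chunk = os.read(fd, 1024)
        except OSError:
            chunk = b""  # Linux reports EIO once the child side is closed
        if not chunk:
            raise EOFError(f"shell exited before {marker!r}; got {buf!r}")
        buf += chunk
    return buf


def drive_shell(shell, keystrokes, marker, timeout=5.0):
    """Start `shell` on a new PTY, type `keystrokes`, return output up to `marker`."""
    pid, fd = pty.fork()
    if pid == 0:
        # Child: stdin, stdout and stderr are already attached to the PTY.
        try:
            os.execvp(shell, [shell])
        finally:
            os._exit(127)  # only reached if exec failed
    try:
        os.write(fd, keystrokes)
        return read_until(fd, marker, timeout)
    finally:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
        os.waitpid(pid, 0)
        os.close(fd)
```

A call such as `drive_shell("/bin/sh", b"echo $((6*7))\n", b"42")` types a command and waits for its result. The arithmetic expansion is used on purpose: the terminal echoes typed text back, so the expected marker should not appear literally in the keystrokes.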

**Builtin tests** exercise individual commands (`cd`, `export`, `test`, `trap`, etc.) and compare output and exit codes against the reference shell.
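A table-driven version of the builtin comparison might look like the following. It is a sketch under assumptions: `Outcome`, `observe`, `compare_builtins`, and the `CASES` list are made up for illustration, and the real suite files may be organized quite differently.

```python
import subprocess
from typing import NamedTuple


class Outcome(NamedTuple):
    stdout: str
    exit_code: int


def observe(shell, script, timeout=5.0):
    """Run `script` in `shell` and capture what a builtin test compares."""
    proc = subprocess.run(
        [shell, "-c", script], capture_output=True, text=True, timeout=timeout
    )
    return Outcome(proc.stdout, proc.returncode)


def compare_builtins(shell, reference, cases):
    """Return (script, got, want) for every case where the shells disagree."""
    failures = []
    for script in cases:
        got, want = observe(shell, script), observe(reference, script)
        if got != want:
            failures.append((script, got, want))
    return failures


# Hypothetical cases, one per builtin named in this README.
CASES = [
    "cd / && pwd",
    'export FOO=bar; echo "$FOO"',
    "test -d /nonexistent-bensch-dir",  # no output, non-zero exit code
    "trap 'echo bye' EXIT",
]
```

Comparing the exit code alongside the output matters for commands like `test`, which print nothing and communicate only through their status.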

## Shell Profiles

Each shell has a profile (`profiles/*.yaml`) describing its capabilities:

```yaml
# profiles/bash.yaml
name: bash
prompt_pattern: '\$ '
prompt_set_command: "PS1='$ '"
mode_reset_command: "set -o emacs"
capabilities:
  readline: true
  vi_mode: true
  job_control: true
  command_completion: true
```

Profiles control prompt detection, session reset, rc-file disabling, and which test suites are applicable. Shells without readline (like dash) automatically skip interactive editing tests.
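As a rough illustration of how capabilities could gate tests, the sketch below represents profiles as plain dicts with the same field names as the YAML profile. The `requires` key on a test, the dash capability values, and the helper names `should_run` and `at_prompt` are assumptions made for this example, not bensch's actual schema.

```python
import re

# Same fields as profiles/bash.yaml, written as a dict to avoid a YAML dependency.
BASH_PROFILE = {
    "name": "bash",
    "prompt_pattern": r"\$ ",
    "capabilities": {
        "readline": True,
        "vi_mode": True,
        "job_control": True,
        "command_completion": True,
    },
}

# Hypothetical values for a shell without readline.
DASH_PROFILE = {
    "name": "dash",
    "prompt_pattern": r"\$ ",
    "capabilities": {
        "readline": False,
        "vi_mode": False,
        "job_control": True,
        "command_completion": False,
    },
}


def should_run(test, profile):
    """A test runs only if every capability it requires is true in the profile."""
    caps = profile.get("capabilities", {})
    return all(caps.get(name, False) for name in test.get("requires", []))


def at_prompt(output, profile):
    """True when captured terminal output ends with the profile's prompt."""
    return re.search(profile["prompt_pattern"] + r"\Z", output) is not None
```

With these definitions, a test declaring `"requires": ["readline"]` would run against the bash profile and be skipped for the dash one, while a test with no requirements runs everywhere.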

Available profiles: `bash`, `zsh`, `dash`, `ksh`, `fortsh`, `generic`

## Requirements

- Python 3.8+ with pip (for interactive tests)
- bash (as reference shell for comparison tests)
- A shell binary to test

Dependencies are installed automatically into a local `.venv` on first run.

## Project Structure

```
bensch/
├── bensch                    # Entry point script
├── framework/                # Core PTY engine and test runner
│   ├── shell_pty.py          # pexpect-based PTY wrapper
│   ├── runner.py             # YAML test runner with session management
│   ├── profile.py            # Shell profile loader
│   └── utils/                # Key definitions, output matchers
├── suites/                   # Test specifications
│   ├── interactive/          # YAML PTY tests (posix, editing, history, ...)
│   ├── posix/                # Shell-script POSIX compliance tests
│   └── builtins/             # Builtin command tests (portable, extended)
├── profiles/                 # Shell capability profiles
└── docs/                     # Documentation
```

## Compliance Matrix

Tested on macOS ARM64 (April 2026). POSIX scores are averaged across 7 test suites. The Builtins column covers 605 assertions and the Integration column covers 479 assertions.

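| Shell | Version | POSIX | Builtins | Integration | Notes |
|-------|---------|-------|----------|-------------|-------|
| **bash** | 5.3 | **~100%** | 93% | 100% | Reference shell |
| **osh** | 0.37 | **93%** | 91% | - | Oils: bash replacement, excellent compat |
| **/bin/sh** | bash 3.2 | **97%** | - | - | macOS system shell, lacks bash 4+ features |
| **mksh** | R59c | **93%** | 87% | 99% | MirBSD Korn Shell |
| **yash** | 2.60 | **92%** | 88% | 99% | Designed for strict POSIX compliance |
| **dash** | 0.5.12 | **91%** | 84% | 99% | Debian minimal POSIX shell |
| **zsh** | 5.9 | **89%** | 89% | 99% | Extensions cause POSIX divergence |
| **fish** | 4.1 | **31%** | 86% | 98% | Not POSIX; different syntax entirely |
| **rc** | - | **23%** | 84% | - | Plan 9 shell, not POSIX |
| **elvish** | 0.21 | **18%** | 75% | - | Modern shell, not POSIX |
| **ksh** | macOS built-in | **~0%** | - | - | Segfaults; broken system binary |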
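The POSIX column is described as an average across test suites. One plausible reading, assumed here rather than confirmed by the source, is an unweighted mean of per-suite pass rates. The sketch below shows that calculation with hypothetical function names and made-up numbers, next to a pooled rate for contrast.

```python
def suite_pass_rate(passed, total):
    """Pass rate of one suite as a percentage."""
    return 100.0 * passed / total if total else 0.0


def posix_score(suite_results):
    """Unweighted mean of per-suite pass rates (an assumed definition)."""
    rates = [suite_pass_rate(passed, total) for passed, total in suite_results]
    return sum(rates) / len(rates)


def pooled_score(suite_results):
    """Alternative: total passes over total tests, so large suites weigh more."""
    passed = sum(p for p, _ in suite_results)
    total = sum(t for _, t in suite_results)
    return suite_pass_rate(passed, total)


# Made-up results for two suites: (passed, total).
example = [(90, 100), (50, 50)]
print(posix_score(example))             # 95.0
print(round(pooled_score(example), 1))  # 93.3
```

The two definitions give different numbers whenever suites differ in size, which is worth keeping in mind when comparing scores across shells.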

## Origin

bensch was extracted from the [fortsh](https://github.com/FortranGoingOnForty/fortsh) project's test infrastructure, which achieved 1014/1014 interactive test parity across x86 Linux, ARM64 Linux, and macOS ARM64. The framework is 93% shell-agnostic by design.

## License

GPLv3
