# Interactive Test Expansion - Quick Reference

## Current State
- **Interactive Tests**: 321 tests (72.6% pass rate)
- **Non-Interactive Tests**: ~656 POSIX compliance tests
- **Coverage Gap**: ~335 tests (mostly edge cases and builtin testing)
## What We're Missing

### 1. Edge Case Coverage (~200 tests)
From `posix_compliance_gaps.sh` (180 tests):
- Parameter expansion edge cases (nested, complex patterns)
- Builtin edge cases (`set`, `shift`, `eval`, `return`, `break`/`continue`)
- Quoting and escaping complexity
- Here-document variations (`<<-`, `<<EOF`)
- Function scope and recursion
- Special parameter edge cases (`$@`, `$*`, IFS interactions)
- Redirection edge cases (`<>`, append with an FD number such as `n>>file`)

### 2. Interactive-Specific Depth (~150 tests)
Areas where we have basic tests but need edge cases:
- **Line Editing**: Undo, macros, very long lines, Unicode edge cases
- **History**: File operations, size limits, multi-line commands, sharing
- **Completion**: Programmable completion, special chars, long lists, context-sensitive
- **Job Control**: Multiple job specs, notification timing, wait variants

### 3. Cross-Feature Interactions (~100 tests)
Combinations that reveal bugs:
- Editing + History (edit recalled command, Ctrl+R while editing)
- Completion + Variables (completing names that contain spaces, pressing Tab inside `${VAR`)
- Job Control + Signals (Ctrl+C during completion, notification while editing)
- Prompt + Escape Sequences (command substitution in prompt, resize behavior)

### 4. Error Handling & Resources (~80 tests)
- Input edge cases (binary input, invalid UTF-8, very fast typing)
- Output edge cases (output larger than the buffer size, control chars, broken pipe)
- Resource limits (max history, FD exhaustion, process limits)
- Error recovery (undefined HOME/PATH, terminal errors)
## How to Expand

### Quick Wins (Easiest First)

#### 1. Use the Converter Tool (30 minutes)
```bash
# Convert simple tests from the POSIX suite
cd tests/interactive
.venv/bin/python utils/convert_posix_tests.py \
    ../../fortsh/tests/posix_compliance_test.sh \
    test_specs/posix_basic_converted.yaml

# Review and fix MANUAL_REVIEW items
# Most echo commands auto-convert well
```

**Expected**: ~60-80 usable tests from the 96 in `posix_compliance_test.sh`
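Before you start hand-editing, it helps to know how many cases the converter flagged. The minimal sketch below assumes the converter writes the literal string `MANUAL_REVIEW` into the generated YAML at each flagged case. The helper names `count_manual_review` and `list_manual_review` are hypothetical.

```sh
#!/bin/sh
# Assumption: the converter writes the literal marker "MANUAL_REVIEW"
# into the generated YAML next to each case it could not convert.

# count_manual_review FILE: print the number of lines containing the marker
count_manual_review() {
    grep -c 'MANUAL_REVIEW' "$1"
}

# list_manual_review FILE: print "lineno:text" for each marked line
list_manual_review() {
    grep -n 'MANUAL_REVIEW' "$1"
}
```

Usage: `count_manual_review test_specs/posix_basic_converted.yaml`. The result counts marked lines, so it is an upper bound on the review work if a single case can carry more than one marker.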
#### 2. Hand-Craft Edge Cases (2-3 hours)
Pick 30-40 of the most interesting edge cases from `posix_compliance_gaps.sh`:
```yaml
# Example: Builtin edge cases
- name: "shift with no count shifts by one"
  steps:
    - send_line: "set -- a b c"
    - send_line: "shift"
    - send_line: "echo $@"
  expect_output: "b c"

- name: "shift beyond available args is error"
  steps:
    - send_line: "set -- a"
    - send_line: "shift 2; echo $?"
  # POSIX only requires a non-zero status here; 1 matches bash
  expect_output: "1"

- name: "eval with empty string"
  steps:
    - send_line: "eval ''; echo $?"
  expect_output: "0"
```
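Hand-written expectations are easy to get wrong, so you may want to check them against a reference POSIX `sh` before adding them to a spec. The sketch below uses a hypothetical `check_case` helper that runs each command non-interactively with `sh -c`. Because of that, it only suits cases that behave the same in interactive and non-interactive mode. For example, a `shift` error may abort a non-interactive shell, so the `shift 2` case is left out.

```sh
#!/bin/sh
# check_case COMMAND EXPECTED: run COMMAND in a fresh non-interactive sh,
# compare its stdout with EXPECTED, print PASS/FAIL, and return 0/1.
# Hypothetical helper; not part of the interactive test framework.
check_case() {
    actual=$(sh -c "$1" 2>/dev/null)
    if [ "$actual" = "$2" ]; then
        printf 'PASS: %s\n' "$1"
        return 0
    fi
    printf 'FAIL: %s (expected "%s", got "%s")\n' "$1" "$2" "$actual"
    return 1
}

# The non-error cases from the spec above
check_case 'set -- a b c; shift; echo $@' 'b c'
check_case "eval ''; echo \$?" '0'
```

A mismatch here does not prove the spec is wrong, but it is a prompt to double-check the expectation before blaming fortsh.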
#### 3. Extend Existing Categories (1-2 hours)
Add variations to existing tests:
```yaml
# In history.yaml, add:
- name: "History with very long command (>1000 chars)"
  steps:
    - send_line: "echo <1000 char string>"
    - send_key: "Up"
  expect_output: "<verify it appears>"

# In completion.yaml, add:
- name: "Complete filename with spaces and quotes"
  steps:
    - send: "ls 'file with "
    - send_key: "Tab"
  expect_output: "file with spaces.txt"
```
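The two tests above depend on concrete inputs: a command longer than 1000 characters, and a file named `file with spaces.txt` that completion can find. The sketch below produces both. `make_long_string` and `make_completion_fixture` are hypothetical helpers. The fixture only helps if the completion test runs with that directory as its working directory.

```sh
#!/bin/sh
# make_long_string N: print N 'x' characters with no trailing newline
make_long_string() {
    printf "%${1}s" '' | tr ' ' 'x'
}

# make_completion_fixture DIR: create the file the completion test expects
make_completion_fixture() {
    mkdir -p "$1" && : > "$1/file with spaces.txt"
}

# A 1000-character argument; "echo " plus this string is 1005 characters
long=$(make_long_string 1000)
echo "${#long}"   # prints 1000
```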
### Medium Effort (More Time Investment)

#### 4. Create New Spec Files (3-4 hours each)
Add comprehensive coverage for specific areas:

**test_specs/builtins_edge_cases.yaml**:
- All edge cases for cd, set, shift, eval, return, break, continue
- Readonly and unset interactions
- Alias edge cases
- getopts comprehensive testing

**test_specs/parameter_expansion_advanced.yaml**:
- Nested parameter expansion
- All pattern matching variations (`%`, `%%`, `#`, `##`)
- Substring operations
- Complex default/assign/error patterns

**test_specs/cross_feature_interactions.yaml**:
- Editing while history search active
- Completion during variable expansion
- Job control notification during prompt display
- Signal handling during different interactive states
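For `parameter_expansion_advanced.yaml`, you can derive expected values from standard POSIX semantics. The sketch below applies each pattern-removal operator, the default/assign forms, and one nested expansion to an arbitrary sample value. The commented results are what a POSIX shell produces, so they can seed `expect_output` fields.

```sh
#!/bin/sh
# Pattern-removal operators on a sample value
path=/usr/local/lib/libfoo.so.1
echo "${path%.*}"    # shortest matching suffix removed -> /usr/local/lib/libfoo.so
echo "${path%%.*}"   # longest matching suffix removed  -> /usr/local/lib/libfoo
echo "${path#*/}"    # shortest matching prefix removed -> usr/local/lib/libfoo.so.1
echo "${path##*/}"   # longest matching prefix removed  -> libfoo.so.1

# Default/assign forms: a ':' makes an empty value count as unset
unset u missing
empty=
echo "${u-default}"        # u is unset -> default
echo "${empty-default}"    # empty is set (to ""), so '-' substitutes nothing -> empty line
echo "${empty:-default}"   # ':-' also substitutes for an empty value -> default
: "${u:=assigned}"         # assigns because u is unset
echo "$u"                  # -> assigned

# Nested expansion: the default word is itself an expansion
echo "${missing:-${path##*/}}"   # -> libfoo.so.1
```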
### Tools We Created

1. **convert_posix_tests.py**
   - Parses `compare_posix_output` calls from shell scripts
   - Generates YAML test specs
   - Marks complex cases for manual review
   - ~60-70% of tests auto-convert successfully

2. **Session Reuse Framework**
   - Reuses PTY sessions (10 tests per session)
   - Automatic reset between tests
   - Handles 300+ tests without resource exhaustion
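When a session is reused, state from one test (positional parameters, shell options, working directory) is visible to the next. That is why the reset matters for tests like the `shift` examples. The minimal sketch below assumes the reset is a command line sent to the shell. `RESET_CMDS` is hypothetical, and the framework's actual reset may differ. The reset runs at top level through `eval` because `set --` inside a function would only clear that function's own arguments.

```sh
#!/bin/sh
# Hypothetical reset line a session-reuse harness could send before each test:
# clear positional parameters, turn off errexit/nounset, return to a known dir.
RESET_CMDS='set --; set +eu; cd /tmp'

# State left behind by a previous "shift" test in the same session
set -- a b c
shift
echo "before reset: $#"   # prints "before reset: 2"

# Run at top level: inside a function, `set --` would only affect
# the function's own positional parameters.
eval "$RESET_CMDS"
echo "after reset: $#"    # prints "after reset: 0"
```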
## Recommended Priorities

### Phase 1: Foundation (Week 1) - Target: +100 tests
1. Convert basic POSIX tests (echo, variables, simple commands)
2. Add builtin edge cases (shift, set, eval - highest value)
3. Extend parameter expansion tests

**Outcome**: 421 tests (~74% pass rate expected)

### Phase 2: Interactive Depth (Week 2) - Target: +80 tests
1. Line editing edge cases (long lines, Unicode, special chars)
2. History edge cases (multi-line, file operations, size limits)
3. Completion improvements (special chars, long lists)

**Outcome**: 501 tests (~73% pass rate expected; the new interactive-specific tests may reveal bugs)

### Phase 3: Interactions & Edge Cases (Week 3) - Target: +70 tests
1. Cross-feature interaction tests
2. Job control comprehensive testing
3. Error handling and resource limit tests

**Outcome**: 571 tests (~72% pass rate expected)

### Phase 4: Coverage Complete (Week 4) - Target: +80 tests
1. Convert the remaining POSIX gap tests
2. Stress and performance tests
3. Documentation and CI integration

**Outcome**: 650+ tests (parity with non-interactive suite)
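The phase outcomes are cumulative. This quick check of the arithmetic starts from the current 321 interactive tests:

```sh
#!/bin/sh
# Running total after each phase target (+100, +80, +70, +80)
total=321
for added in 100 80 70 80; do
    total=$((total + added))
    echo "$total"
done
# prints 421, 501, 571, 651 (one per line)
```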
## Quick Start Example

Here's how to add 10 new tests in 15 minutes:

1. **Pick a category** (e.g., "shift builtin edge cases").

2. **Create tests** in an existing spec (e.g., `test_specs/posix.yaml`) or a new file (e.g., `test_specs/builtins.yaml`).

3. **Write 10 variations**:
   ```yaml
   - name: "shift no args"
     steps:
       - send_line: "set -- a b; shift; echo $@"
     expect_output: "b"

   - name: "shift with count"
     steps:
       - send_line: "set -- a b c; shift 2; echo $@"
     expect_output: "c"

   - name: "shift all"
     steps:
       - send_line: "set -- a; shift; echo $#"
     expect_output: "0"

   # ... 7 more variations
   ```

4. **Run tests**:
   ```bash
   tests/interactive/.venv/bin/python tests/interactive/run_tests.py \
       --fortsh ../fortsh/bin/fortsh --spec builtins.yaml
   ```

5. **Fix failures** and commit.
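Writing ten near-identical entries by hand is mostly boilerplate. The sketch below defines a hypothetical `emit_case` helper that prints one spec entry per name/command/expected triple. It uses the same `name`/`steps`/`send_line`/`expect_output` layout as the examples above. Double quotes and backslashes are escaped so that each value remains a valid YAML double-quoted string.

```sh
#!/bin/sh
# yaml_escape STR: escape backslashes and double quotes for a YAML "..." scalar
yaml_escape() {
    printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# emit_case NAME COMMAND EXPECTED: print one single-step spec entry
emit_case() {
    printf '%s\n' \
        "- name: \"$(yaml_escape "$1")\"" \
        "  steps:" \
        "    - send_line: \"$(yaml_escape "$2")\"" \
        "  expect_output: \"$(yaml_escape "$3")\"" \
        ""
}

emit_case "shift no args" 'set -- a b; shift; echo $@' "b"
emit_case "shift with count" 'set -- a b c; shift 2; echo $@' "c"
```

Append the output to a spec (for example `>> test_specs/builtins.yaml`) and review it before running the suite.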
## Metrics

**Time Estimates**:
- Convert 100 tests: ~2 hours (with converter)
- Hand-write 50 tests: ~3 hours
- Review/fix converted tests: ~1 hour per 50 tests
- **Total for 500+ test expansion**: ~20-25 hours

**Expected Outcomes**:
- **Coverage**: Match non-interactive test count (650+)
- **Pass Rate**: 70-75% (some new tests will reveal bugs)
- **Quality**: Better edge case coverage
- **Maintenance**: Easier to identify gaps
## Resources

- **Expansion Plan**: `EXPANSION_PLAN.md` (detailed strategy)
- **Converter Tool**: `utils/convert_posix_tests.py`
- **Non-Interactive Tests**: `../../fortsh/tests/posix_compliance*.sh`
- **Current Tests**: `test_specs/*.yaml`