# Interactive Test Expansion Plan

## Current State
- **Interactive tests**: 321 tests (72.6% pass rate)
- **Non-interactive tests**: ~656 test assertions across the POSIX compliance suite
- **Gap**: Interactive tests lack edge-case coverage and comprehensive builtin testing
## Analysis of Non-Interactive Test Coverage

### Categories in Non-Interactive Tests
1. **posix_compliance_test.sh** (101 tests) - Basic POSIX features
   - Commands, variables, parameter expansion, command substitution
   - Arithmetic, redirections, pipelines, conditionals, loops
   - Functions, special parameters, builtins

2. **posix_compliance_extended.sh** (119 tests) - Extended features
   - Advanced parameter expansion, globbing patterns
   - Complex redirections, subshells, job control
   - Advanced arithmetic, error handling

3. **posix_compliance_advanced.sh** (119 tests) - Advanced features
   - Complex quoting, nested structures
   - Advanced control flow, signal handling
   - Performance edge cases

4. **posix_compliance_gaps.sh** (180 tests) - Edge cases
   - Here-document tab stripping, complex IFS splitting
   - Function recursion, readonly/unset interactions
   - Builtin edge cases (set, shift, eval, return, etc.)
   - Alias, getopts, umask, hash, type, times, trap
   - Empty/whitespace edge cases

5. **posix_compliance_coverage.sh** (100 tests) - Coverage gaps
   - Untested combinations, boundary conditions

6. **posix_compliance_untested.sh** (37 tests) - Known gaps
## Expansion Strategy

### Phase 1: Port Non-Interactive Tests (Target: +300 tests)

Many non-interactive tests can be adapted to interactive mode by:
1. Sending commands instead of using the `-c` flag
2. Waiting for output instead of capturing stdout
3. Grouping related tests for session reuse

**Approach**: Create a converter script/tool to semi-automate this:
```
# Example: Convert from
compare_posix_output "echo simple" "echo hello"

# To YAML:
- name: "echo simple"
  steps:
    - send_line: "echo hello"
      expect_output: "hello"
```
**Priority Categories**:
- [ ] Edge-case builtins (set, shift, eval, return, dot, break/continue)
- [ ] Parameter expansion edge cases (nested, complex patterns)
- [ ] Quoting and escaping edge cases
- [ ] Redirection edge cases (here-doc, read/write mode <>, etc.)
- [ ] Function scope and recursion
- [ ] Special parameter edge cases ($@, $*, $#, etc.)
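Session reuse in practice means one YAML case carrying several related assertions. A sketch of a ported builtin test (field names follow the conversion example above; the exact schema is an assumption):

```yaml
# Hypothetical ported test: shift builtin, reusing one interactive session
- name: "shift builtin basics"
  steps:
    - send_line: "set -- a b c"
    - send_line: "shift 2; echo $1 $#"
      expect_output: "c 1"
    - send_line: "shift; echo $#"
      expect_output: "0"
```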
### Phase 2: Interactive-Specific Features (Target: +150 tests)

Features unique to interactive mode that need deeper testing:

#### 2.1 Advanced Line Editing (Target: +50 tests)
- [ ] Undo/redo operations (if supported)
- [ ] Macro recording/playback
- [ ] Multiple cursor positions
- [ ] Copy/paste with system clipboard
- [ ] Mouse support (if applicable)
- [ ] Unicode edge cases (emoji, RTL text, combining characters)
- [ ] Very long lines (> 1000 chars)
- [ ] Line editing with terminal resize
- [ ] Incremental search (Ctrl+R) edge cases
- [ ] Argument history (Alt+., M-C-y)
#### 2.2 Advanced History (Target: +40 tests)
- [ ] History file operations (load, save, corruption handling)
- [ ] History size limits and rotation
- [ ] History ignoring patterns (HISTIGNORE)
- [ ] History timestamps
- [ ] Multi-line command history
- [ ] History sharing between sessions
- [ ] History search edge cases (empty pattern, special chars)
- [ ] History expansion with quoting
- [ ] History substitution modifiers (:p, :s, :g, etc.)
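A history-modifier case sketched in the Phase 1 YAML format (field names are the same assumptions; the `:s` semantics shown are bash-style, replacing only the first match, and may differ in other shells):

```yaml
# Hypothetical test for the :s history substitution modifier
- name: "history modifier :s replaces first match"
  steps:
    - send_line: "echo foo foo"
      expect_output: "foo foo"
    - send_line: "!!:s/foo/bar/"
      expect_output: "bar foo"
```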
#### 2.3 Advanced Completion (Target: +40 tests)
- [ ] Programmable completion scripts
- [ ] Custom completion functions
- [ ] Completion with special characters in filenames
- [ ] Completion case-insensitivity options
- [ ] Completion menu navigation edge cases
- [ ] Completion with very long lists (> 1000 items)
- [ ] Completion timeout handling
- [ ] Context-sensitive completion (git, make, custom commands)
#### 2.4 Job Control Edge Cases (Target: +20 tests)
- [ ] Multiple job spec formats (%%, %+, %-, %n, %string, %?string)
- [ ] Job notification timing
- [ ] Job control with pipelines
- [ ] disown edge cases
- [ ] wait edge cases (wait %n, wait -n, wait with no jobs)
- [ ] Job control state after shell builtin failures
- [ ] Job control with subshells
- [ ] SIGCHLD handling
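One of the job-spec cases above, sketched in the same YAML format (step fields and `match_type` values are assumptions carried over from the Phase 1 example):

```yaml
# Hypothetical test: %% names the current job
- name: "job spec %% refers to the current job"
  steps:
    - send_line: "sleep 30 &"
    - send_line: "jobs %%"
      expect_output: "sleep 30"
      match_type: "contains"
    - send_line: "kill %%; wait"   # clean up the background job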
### Phase 3: Cross-Feature Interaction Tests (Target: +100 tests)

Test combinations that may reveal bugs:

#### 3.1 Editing + History
- [ ] Edit a command from history, then recall it again
- [ ] Ctrl+R while editing a command
- [ ] History expansion while using line editing
- [ ] Multi-line command editing and recall

#### 3.2 Completion + Variables/Functions
- [ ] Complete with a variable containing spaces
- [ ] Complete inside parameter expansion ${VAR[TAB]}
- [ ] Complete function names after definition
- [ ] Complete with exported vs local variables

#### 3.3 Job Control + Signals
- [ ] Ctrl+C during completion
- [ ] Ctrl+Z during history search
- [ ] Signal handling while editing
- [ ] Background job completion notification during editing

#### 3.4 Prompt + Escape Sequences
- [ ] Prompt with command substitution that fails
- [ ] Prompt with very long expansion
- [ ] Prompt during terminal resize
- [ ] Prompt with non-printing characters
- [ ] PS2 in various multi-line contexts
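Key-level interactions like Ctrl+R during editing need a raw-key step type that `send_line` cannot express. Assuming a hypothetical `send_keys` step that writes bytes directly to the pty, a sketch could be:

```yaml
# Hypothetical: send_keys is an assumed step type for raw key input
- name: "Ctrl+R recalls a command mid-edit"
  steps:
    - send_line: "echo findme"
      expect_output: "findme"
    - send_keys: "echo partial"   # start typing a new command
    - send_keys: "\x12findme"     # Ctrl+R, then incremental search text
    - send_keys: "\r"             # accept the match and run it
      expect_output: "findme"
```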
### Phase 4: Error Handling & Edge Cases (Target: +80 tests)

#### 4.1 Input Edge Cases
- [ ] Binary input (Ctrl+@, Ctrl+A-Z, all combinations)
- [ ] Invalid UTF-8 sequences
- [ ] Terminal escape sequences as input
- [ ] Very fast input (paste simulation)
- [ ] Input buffer overflow scenarios

#### 4.2 Output Edge Cases
- [ ] Output > terminal buffer size
- [ ] Output with mixed control characters
- [ ] Output to full/broken pipe
- [ ] Output during terminal disconnect

#### 4.3 Resource Limits
- [ ] Maximum history size reached
- [ ] File descriptor exhaustion
- [ ] Memory limits (very long command line)
- [ ] Process limits (max jobs)

#### 4.4 Error Recovery
- [ ] Recovery from read errors
- [ ] Recovery from terminal configuration errors
- [ ] Behavior when HOME is undefined
- [ ] Behavior when PATH is undefined/empty
### Phase 5: Stress & Performance Tests (Target: +50 tests)

- [ ] Rapid command execution (typing-speed test)
- [ ] Large history file loading
- [ ] Completion in a huge directory (1000+ files)
- [ ] Very deep directory structures
- [ ] Long-running command interruption patterns
- [ ] Memory leak detection (long session)
- [ ] Session with 1000+ commands
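Repetitive cases like the 1000-command session lend themselves to generation rather than hand-writing. A minimal sketch, reusing the YAML schema assumed in Phase 1:

```python
def generate_session_stress(count=1000):
    """Build one long test case that echoes `count` sentinel values
    through a single interactive session, so the runner can verify
    the shell stays responsive and leaks nothing over a long session."""
    steps = []
    for i in range(count):
        steps.append({
            "send_line": f"echo cmd{i}",
            "expect_output": f"cmd{i}",
        })
    return {"name": f"session with {count} commands", "steps": steps}
```

The result can be serialized into the suite with `yaml.safe_dump([generate_session_stress()], f)`.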
## Implementation Approach

### Option 1: Manual YAML Creation
- Pros: Full control; can optimize for interactive mode
- Cons: Tedious, error-prone, time-consuming
- Estimate: ~40 hours for 500 tests

### Option 2: Semi-Automated Conversion Tool
Create a Python script to convert non-interactive tests:

```python
#!/usr/bin/env python3
"""Convert non-interactive POSIX tests to interactive YAML format."""

import re
import yaml

def parse_compare_posix_test(line):
    """Parse: compare_posix_output "test name" "command" """
    match = re.match(r'compare_posix_output "([^"]+)" "([^"]+)"', line)
    if match:
        name, command = match.groups()
        return {
            'name': name,
            'steps': [{'send_line': command}],
            # estimate_output derives the expected output from the
            # command (defined elsewhere in the converter)
            'expect_output': estimate_output(command),
            'match_type': 'contains',
        }
    return None
```

- Pros: Fast initial creation, consistency
- Cons: May need manual adjustment; output estimation is tricky
- Estimate: ~5 hours to write the tool + ~10 hours to review/adjust
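The `estimate_output` helper the converter relies on could plausibly be implemented by running the command under a reference POSIX shell and capturing its stdout (the shell path is an assumption; substitute whatever reference implementation the suite compares against):

```python
import subprocess

def estimate_output(command, reference_shell="/bin/sh"):
    """Run `command` under a reference shell and capture its stdout.

    The captured text becomes the expected output for the interactive
    version of the test. Commands with side effects or nondeterministic
    output will still need manual review.
    """
    result = subprocess.run(
        [reference_shell, "-c", command],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip()
```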
### Option 3: Hybrid Approach (RECOMMENDED)
1. Use the converter for straightforward tests (commands, variables, etc.)
2. Manually create complex interactive tests
3. Use test generation for repetitive patterns

Estimate: ~15 hours total
## Prioritization

### High Priority (Do First)
1. **Builtin edge cases** from posix_compliance_gaps.sh
   - Well-defined and easy to convert
   - High value for POSIX compliance

2. **Parameter expansion edge cases**
   - Some coverage already exists; extend it
   - Clear expected outputs

3. **Job control improvements**
   - Currently the weakest area (52.5% pass rate)
   - Critical for an interactive shell

### Medium Priority
4. **Advanced line editing features**
5. **History expansion edge cases**
6. **Cross-feature interaction tests**

### Lower Priority (Nice to Have)
7. **Stress tests**
8. **Performance benchmarks**
9. **Resource limit tests**

## Success Metrics

- **Target coverage**: 650+ interactive tests (matching the non-interactive count)
- **Target pass rate**: Maintain or improve on 72% overall
- **Target time**: All tests complete in < 10 minutes
- **Coverage**: Each POSIX shell feature has ≥3 tests (basic, edge case, error)
## Next Steps

1. **Create the converter tool** for simple test translation
2. **Port 50 builtin edge-case tests** from posix_compliance_gaps.sh
3. **Add 30 job control edge-case tests**
4. **Review and document patterns** for future expansion
5. **Update the test framework** if needed for new test types

## Questions to Resolve

1. Should we maintain 1:1 parity with non-interactive tests, or create interactive-specific variants?
2. How do we handle tests that are inherently non-interactive (e.g., the exit code of an entire script)?
3. Should we have separate "fast" and "comprehensive" test suites?
4. How do we test readline features that may not be present (compile-time options)?