# Interactive Test Expansion Plan

## Current State

- **Interactive tests**: 321 tests (72.6% pass rate)
- **Non-interactive tests**: ~656 test assertions across the POSIX compliance suite
- **Gap**: Interactive tests lack edge-case coverage and comprehensive builtin testing

## Analysis of Non-Interactive Test Coverage

### Categories in Non-Interactive Tests

1. **posix_compliance_test.sh** (101 tests) - Basic POSIX features
   - Commands, variables, parameter expansion, command substitution
   - Arithmetic, redirections, pipelines, conditionals, loops
   - Functions, special parameters, builtins
2. **posix_compliance_extended.sh** (119 tests) - Extended features
   - Advanced parameter expansion, globbing patterns
   - Complex redirections, subshells, job control
   - Advanced arithmetic, error handling
3. **posix_compliance_advanced.sh** (119 tests) - Advanced features
   - Complex quoting, nested structures
   - Advanced control flow, signal handling
   - Performance edge cases
4. **posix_compliance_gaps.sh** (180 tests) - Edge cases
   - Here-document tab stripping, complex IFS splitting
   - Function recursion, readonly/unset interactions
   - Builtin edge cases (set, shift, eval, return, etc.)
   - alias, getopts, umask, hash, type, times, trap
   - Empty/whitespace edge cases
5. **posix_compliance_coverage.sh** (100 tests) - Coverage gaps
   - Untested combinations, boundary conditions
6. **posix_compliance_untested.sh** (37 tests) - Known gaps

## Expansion Strategy

### Phase 1: Port Non-Interactive Tests (Target: +300 tests)

Many non-interactive tests can be adapted to interactive mode by:

1. Sending commands instead of using the `-c` flag
2. Waiting for output instead of capturing stdout
3. Grouping related tests for session reuse

**Approach**: Create a converter script/tool to semi-automate this:

```
# Example: convert from
compare_posix_output "echo simple" "echo hello"

# to YAML:
- name: "echo simple"
  steps:
    - send_line: "echo hello"
  expect_output: "hello"
```
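Assuming a converted case is loaded as a plain dict, executing it can be sketched as follows. This is a simplified stand-in, not the real harness: it drives `sh` over pipes rather than a pty, and `run_case` is an illustrative name.

```python
import subprocess

def run_case(case):
    """Run one converted case against `sh` and check its expected output.
    Simplified: pipes instead of a pty, one fresh session per case."""
    script = "\n".join(step["send_line"] for step in case["steps"]) + "\n"
    result = subprocess.run(["sh"], input=script, capture_output=True, text=True)
    if case.get("match_type", "contains") == "contains":
        return case["expect_output"] in result.stdout
    return result.stdout.strip() == case["expect_output"]

# The converted example from above, as the dict the converter would emit:
case = {
    "name": "echo simple",
    "steps": [{"send_line": "echo hello"}],
    "expect_output": "hello",
    "match_type": "contains",
}
```

A real interactive runner would additionally synchronize on the prompt between steps; that is the part pipes cannot model.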

**Priority Categories**:

- [ ] Edge-case builtins (set, shift, eval, return, dot, break/continue)
- [ ] Parameter expansion edge cases (nested, complex patterns)
- [ ] Quoting and escaping edge cases
- [ ] Redirection edge cases (here-doc, read/write mode `<>`, etc.)
- [ ] Function scope and recursion
- [ ] Special parameter edge cases (`$@`, `$*`, `$#`, etc.)

### Phase 2: Interactive-Specific Features (Target: +150 tests)

Features unique to interactive mode that need deeper testing:

#### 2.1 Advanced Line Editing (Target: +50 tests)

- [ ] Undo/redo operations (if supported)
- [ ] Macro recording/playback
- [ ] Multiple cursor positions
- [ ] Copy/paste with the system clipboard
- [ ] Mouse support (if applicable)
- [ ] Unicode edge cases (emoji, RTL text, combining characters)
- [ ] Very long lines (> 1000 chars)
- [ ] Line editing during terminal resize
- [ ] Incremental search (Ctrl+R) edge cases
- [ ] Argument history (Alt+., M-C-y)

#### 2.2 Advanced History (Target: +40 tests)

- [ ] History file operations (load, save, corruption handling)
- [ ] History size limits and rotation
- [ ] History-ignoring patterns (HISTIGNORE)
- [ ] History timestamps
- [ ] Multi-line command history
- [ ] History sharing between sessions
- [ ] History search edge cases (empty pattern, special chars)
- [ ] History expansion with quoting
- [ ] History substitution modifiers (`:p`, `:s`, `:g`, etc.)
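The substitution-modifier cases need explicit expected outputs. A minimal oracle sketch, modeling only the csh-style `:s/old/new/` and `:gs/old/new/` forms (the function name is illustrative, not part of the test framework):

```python
import re

def apply_modifier(entry, modifier):
    """Apply a history substitution modifier to a history entry.
    Simplified model: only the s/old/new/ and gs/old/new/ forms."""
    m = re.fullmatch(r"(g?)s/([^/]*)/([^/]*)/", modifier)
    if not m:
        raise ValueError("unsupported modifier: " + modifier)
    global_flag, old, new = m.groups()
    if global_flag:
        return entry.replace(old, new)   # gs -> replace every occurrence
    return entry.replace(old, new, 1)    # s  -> first occurrence only
```

Generating the expected strings this way keeps the YAML cases honest even before the shell's own behavior is inspected.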

#### 2.3 Advanced Completion (Target: +40 tests)

- [ ] Programmable completion scripts
- [ ] Custom completion functions
- [ ] Completion with special characters in filenames
- [ ] Completion case-insensitivity options
- [ ] Completion menu navigation edge cases
- [ ] Completion with very long lists (> 1000 items)
- [ ] Completion timeout handling
- [ ] Context-sensitive completion (git, make, custom commands)

#### 2.4 Job Control Edge Cases (Target: +20 tests)

- [ ] Multiple job spec formats (`%%`, `%+`, `%-`, `%n`, `%string`, `%?string`)
- [ ] Job notification timing
- [ ] Job control with pipelines
- [ ] disown edge cases
- [ ] wait edge cases (`wait %n`, `wait -n`, wait with no jobs)
- [ ] Job control state after shell builtin failures
- [ ] Job control with subshells
- [ ] SIGCHLD handling
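The job spec formats follow the POSIX resolution rules (`%%`/`%+` current job, `%-` previous job, `%n` by number, `%string` by command prefix, `%?string` by substring). A simplified oracle for generating expected results, with illustrative names:

```python
def resolve_job_spec(spec, jobs, current, previous):
    """Resolve a POSIX job spec to a job id. `jobs` maps id -> command
    string; `current`/`previous` are the %+ and %- job ids.
    A simplified oracle for test generation, not a full implementation."""
    if spec in ("%%", "%+"):
        return current
    if spec == "%-":
        return previous
    body = spec[1:]
    if body.isdigit():
        return int(body)
    if body.startswith("?"):  # %?string: command contains string
        matches = [j for j, cmd in jobs.items() if body[1:] in cmd]
    else:                     # %string: command begins with string
        matches = [j for j, cmd in jobs.items() if cmd.startswith(body)]
    if len(matches) != 1:
        raise LookupError("ambiguous or unknown job spec: " + spec)
    return matches[0]
```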

### Phase 3: Cross-Feature Interaction Tests (Target: +100 tests)

Test combinations that may reveal bugs:

#### 3.1 Editing + History

- [ ] Edit a command from history, then recall it again
- [ ] Ctrl+R while editing a command
- [ ] History expansion while using line editing
- [ ] Multi-line command editing and recall

#### 3.2 Completion + Variables/Functions

- [ ] Complete with a variable containing spaces
- [ ] Complete inside parameter expansion `${VAR[TAB]}`
- [ ] Complete function names after definition
- [ ] Complete with exported vs. local variables

#### 3.3 Job Control + Signals

- [ ] Ctrl+C during completion
- [ ] Ctrl+Z during history search
- [ ] Signal handling while editing
- [ ] Background job completion notification during editing

#### 3.4 Prompt + Escape Sequences

- [ ] Prompt with command substitution that fails
- [ ] Prompt with very long expansion
- [ ] Prompt during terminal resize
- [ ] Prompt with non-printing characters
- [ ] PS2 in various multi-line contexts

### Phase 4: Error Handling & Edge Cases (Target: +80 tests)

#### 4.1 Input Edge Cases

- [ ] Binary input (Ctrl+@, all Ctrl+A-Z combinations)
- [ ] Invalid UTF-8 sequences
- [ ] Terminal escape sequences as input
- [ ] Very fast input (paste simulation)
- [ ] Input buffer overflow scenarios
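For the invalid-UTF-8 cases, a few candidate fixture byte sequences (the category names are illustrative; each sequence is genuinely undecodable as UTF-8, covering distinct failure modes):

```python
# Byte sequences to feed the shell as raw input for the invalid-UTF-8 tests.
INVALID_UTF8 = {
    "lone continuation": b"\x80",              # continuation byte with no lead
    "truncated 3-byte":  b"\xe2\x82",          # lead byte, missing final byte
    "overlong NUL":      b"\xc0\x80",          # overlong encoding, always invalid
    "out-of-range":      b"\xf5\x80\x80\x80",  # above U+10FFFF
}

def is_valid_utf8(data: bytes) -> bool:
    """Check whether a byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```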

#### 4.2 Output Edge Cases

- [ ] Output larger than the terminal buffer
- [ ] Output with mixed control characters
- [ ] Output to a full/broken pipe
- [ ] Output during terminal disconnect

#### 4.3 Resource Limits

- [ ] Maximum history size reached
- [ ] File descriptor exhaustion
- [ ] Memory limits (very long command line)
- [ ] Process limits (max jobs)

#### 4.4 Error Recovery

- [ ] Recovery from read errors
- [ ] Recovery from terminal configuration errors
- [ ] Behavior when HOME is undefined
- [ ] Behavior when PATH is undefined/empty
### Phase 5: Stress & Performance Tests (Target: +50 tests)

- [ ] Rapid command execution (typing-speed test)
- [ ] Large history file loading
- [ ] Completion in a huge directory (1000+ files)
- [ ] Very deep directory structures
- [ ] Long-running command interruption patterns
- [ ] Memory leak detection (long session)
- [ ] Session with 1000+ commands

## Implementation Approach

### Option 1: Manual YAML Creation

- Pros: full control; can optimize for interactive mode
- Cons: tedious, error-prone, time-consuming
- Estimate: ~40 hours for 500 tests

### Option 2: Semi-Automated Conversion Tool

Create a Python script to convert non-interactive tests:

```python
#!/usr/bin/env python3
"""Convert non-interactive POSIX tests to interactive YAML format."""

import re
import yaml  # for dumping the converted cases


def estimate_output(command):
    """Best-effort guess at the expected output. Only trivial echo
    commands are guessed; everything else is flagged for manual review."""
    m = re.fullmatch(r"echo (\w+)", command)
    return m.group(1) if m else "FIXME: verify expected output manually"


def parse_compare_posix_test(line):
    """Parse a line of the form: compare_posix_output "test name" "command" """
    match = re.match(r'compare_posix_output "([^"]+)" "([^"]+)"', line)
    if match:
        name, command = match.groups()
        return {
            'name': name,
            'steps': [{'send_line': command}],
            'expect_output': estimate_output(command),
            'match_type': 'contains',
        }
    return None
```
- Pros: fast initial creation, consistency
- Cons: may need manual adjustment; output estimation is tricky
- Estimate: ~5 hours to write the tool + ~10 hours to review/adjust

### Option 3: Hybrid Approach (RECOMMENDED)

1. Use the converter for straightforward tests (commands, variables, etc.)
2. Manually create complex interactive tests
3. Use test generation for repetitive patterns

Estimate: ~15 hours total

## Prioritization

### High Priority (Do First)

1. **Builtin edge cases** from posix_compliance_gaps.sh
   - These are well-defined and easy to convert
   - High value for POSIX compliance
2. **Parameter expansion edge cases**
   - Already have some coverage; extend it
   - Clear expected outputs
3. **Job control improvements**
   - Currently the weakest area (52.5% pass rate)
   - Critical for an interactive shell

### Medium Priority

4. **Advanced line editing features**
5. **History expansion edge cases**
6. **Cross-feature interaction tests**

### Lower Priority (Nice to Have)

7. **Stress tests**
8. **Performance benchmarks**
9. **Resource limit tests**

## Success Metrics

- **Target coverage**: 650+ interactive tests (matching the non-interactive count)
- **Target pass rate**: maintain or improve on the current 72.6% overall
- **Target time**: all tests complete in < 10 minutes
- **Coverage**: each POSIX shell feature has ≥ 3 tests (basic, edge case, error)

## Next Steps

1. **Create the converter tool** for simple test translation
2. **Port 50 builtin edge-case tests** from gaps.sh
3. **Add 30 job control edge-case tests**
4. **Review and document patterns** for future expansion
5. **Update the test framework** if needed for new test types

## Questions to Resolve

1. Should we maintain 1:1 parity with non-interactive tests, or create interactive-specific variants?
2. How do we handle tests that are inherently non-interactive (e.g., the exit code of an entire script)?
3. Should we have separate "fast" and "comprehensive" test suites?
4. How do we test readline features that may not be present (compile-time options)?