# Interactive Test Expansion Plan

## Current State
- **Interactive tests**: 321 tests (72.6% pass rate)
- **Non-interactive tests**: ~656 test assertions across the POSIX compliance suite
- **Gap**: Interactive tests lack edge-case coverage and comprehensive builtin testing
## Analysis of Non-Interactive Test Coverage

### Categories in Non-Interactive Tests
1. **posix_compliance_test.sh** (101 tests) - Basic POSIX features
   - Commands, variables, parameter expansion, command substitution
   - Arithmetic, redirections, pipelines, conditionals, loops
   - Functions, special parameters, builtins

2. **posix_compliance_extended.sh** (119 tests) - Extended features
   - Advanced parameter expansion, globbing patterns
   - Complex redirections, subshells, job control
   - Advanced arithmetic, error handling

3. **posix_compliance_advanced.sh** (119 tests) - Advanced features
   - Complex quoting, nested structures
   - Advanced control flow, signal handling
   - Performance edge cases

4. **posix_compliance_gaps.sh** (180 tests) - Edge cases
   - Here-document tab stripping, complex IFS splitting
   - Function recursion, readonly/unset interactions
   - Builtin edge cases (set, shift, eval, return, etc.)
   - Alias, getopts, umask, hash, type, times, trap
   - Empty/whitespace edge cases

5. **posix_compliance_coverage.sh** (100 tests) - Coverage gaps
   - Untested combinations, boundary conditions

6. **posix_compliance_untested.sh** (37 tests) - Known gaps
## Expansion Strategy

### Phase 1: Port Non-Interactive Tests (Target: +300 tests)

Many non-interactive tests can be adapted to interactive mode by:
1. Sending commands instead of using the `-c` flag
2. Waiting for output instead of capturing stdout
3. Grouping related tests for session reuse

**Approach**: Create a converter script/tool to semi-automate this:
```
# Example: Convert from
compare_posix_output "echo simple" "echo hello"

# To YAML:
- name: "echo simple"
  steps:
    - send_line: "echo hello"
      expect_output: "hello"
```
**Priority Categories**:
- [ ] Edge-case builtins (set, shift, eval, return, dot, break/continue)
- [ ] Parameter expansion edge cases (nested, complex patterns)
- [ ] Quoting and escaping edge cases
- [ ] Redirection edge cases (here-doc, read/write mode <>, etc.)
- [ ] Function scope and recursion
- [ ] Special parameter edge cases ($@, $*, $#, etc.)
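Session reuse in practice means one YAML case carrying several related assertions. A sketch of a ported builtin test (field names follow the conversion example above; the exact schema is an assumption):

```yaml
# Hypothetical ported test: shift builtin, reusing one interactive session
- name: "shift builtin basics"
  steps:
    - send_line: "set -- a b c"
    - send_line: "shift 2; echo $1 $#"
      expect_output: "c 1"
    - send_line: "shift; echo $#"
      expect_output: "0"
```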
### Phase 2: Interactive-Specific Features (Target: +150 tests)

Features unique to interactive mode that need deeper testing:

#### 2.1 Advanced Line Editing (Target: +50 tests)
- [ ] Undo/redo operations (if supported)
- [ ] Macro recording/playback
- [ ] Multiple cursor positions
- [ ] Copy/paste with system clipboard
- [ ] Mouse support (if applicable)
- [ ] Unicode edge cases (emoji, RTL text, combining characters)
- [ ] Very long lines (> 1000 chars)
- [ ] Line editing with terminal resize
- [ ] Incremental search (Ctrl+R) edge cases
- [ ] Argument history (Alt+., M-C-y)
#### 2.2 Advanced History (Target: +40 tests)
- [ ] History file operations (load, save, corruption handling)
- [ ] History size limits and rotation
- [ ] History ignoring patterns (HISTIGNORE)
- [ ] History timestamps
- [ ] Multi-line command history
- [ ] History sharing between sessions
- [ ] History search edge cases (empty pattern, special chars)
- [ ] History expansion with quoting
- [ ] History substitution modifiers (:p, :s, :g, etc.)
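A history-modifier case sketched in the Phase 1 YAML format (field names are the same assumptions; the `:s` semantics shown are bash-style, replacing only the first match, and may differ in other shells):

```yaml
# Hypothetical test for the :s history substitution modifier
- name: "history modifier :s replaces first match"
  steps:
    - send_line: "echo foo foo"
      expect_output: "foo foo"
    - send_line: "!!:s/foo/bar/"
      expect_output: "bar foo"
```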
#### 2.3 Advanced Completion (Target: +40 tests)
- [ ] Programmable completion scripts
- [ ] Custom completion functions
- [ ] Completion with special characters in filenames
- [ ] Completion case-insensitivity options
- [ ] Completion menu navigation edge cases
- [ ] Completion with very long lists (> 1000 items)
- [ ] Completion timeout handling
- [ ] Context-sensitive completion (git, make, custom commands)
#### 2.4 Job Control Edge Cases (Target: +20 tests)
- [ ] Multiple job spec formats (%%, %+, %-, %n, %string, %?string)
- [ ] Job notification timing
- [ ] Job control with pipelines
- [ ] disown edge cases
- [ ] wait edge cases (wait %n, wait -n, wait with no jobs)
- [ ] Job control state after shell builtin failures
- [ ] Job control with subshells
- [ ] SIGCHLD handling
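One of the job-spec cases above, sketched in the same YAML format (step fields and `match_type` values are assumptions carried over from the Phase 1 example):

```yaml
# Hypothetical test: %% names the current job
- name: "job spec %% refers to the current job"
  steps:
    - send_line: "sleep 30 &"
    - send_line: "jobs %%"
      expect_output: "sleep 30"
      match_type: "contains"
    - send_line: "kill %%; wait"   # clean up the background job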
### Phase 3: Cross-Feature Interaction Tests (Target: +100 tests)

Test combinations that may reveal bugs:

#### 3.1 Editing + History
- [ ] Edit a command from history, then recall it again
- [ ] Ctrl+R while editing a command
- [ ] History expansion while using line editing
- [ ] Multi-line command editing and recall

#### 3.2 Completion + Variables/Functions
- [ ] Complete with a variable containing spaces
- [ ] Complete inside parameter expansion ${VAR[TAB]}
- [ ] Complete function names after definition
- [ ] Complete with exported vs local variables

#### 3.3 Job Control + Signals
- [ ] Ctrl+C during completion
- [ ] Ctrl+Z during history search
- [ ] Signal handling while editing
- [ ] Background job completion notification during editing

#### 3.4 Prompt + Escape Sequences
- [ ] Prompt with command substitution that fails
- [ ] Prompt with very long expansion
- [ ] Prompt during terminal resize
- [ ] Prompt with non-printing characters
- [ ] PS2 in various multi-line contexts
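Key-level interactions like Ctrl+R during editing need a raw-key step type that `send_line` cannot express. Assuming a hypothetical `send_keys` step that writes bytes directly to the pty, a sketch could be:

```yaml
# Hypothetical: send_keys is an assumed step type for raw key input
- name: "Ctrl+R recalls a command mid-edit"
  steps:
    - send_line: "echo findme"
      expect_output: "findme"
    - send_keys: "echo partial"   # start typing a new command
    - send_keys: "\x12findme"     # Ctrl+R, then incremental search text
    - send_keys: "\r"             # accept the match and run it
      expect_output: "findme"
```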
### Phase 4: Error Handling & Edge Cases (Target: +80 tests)

#### 4.1 Input Edge Cases
- [ ] Binary input (Ctrl+@, Ctrl+A-Z, all combinations)
- [ ] Invalid UTF-8 sequences
- [ ] Terminal escape sequences as input
- [ ] Very fast input (paste simulation)
- [ ] Input buffer overflow scenarios

#### 4.2 Output Edge Cases
- [ ] Output > terminal buffer size
- [ ] Output with mixed control characters
- [ ] Output to full/broken pipe
- [ ] Output during terminal disconnect

#### 4.3 Resource Limits
- [ ] Maximum history size reached
- [ ] File descriptor exhaustion
- [ ] Memory limits (very long command line)
- [ ] Process limits (max jobs)

#### 4.4 Error Recovery
- [ ] Recovery from read errors
- [ ] Recovery from terminal configuration errors
- [ ] Behavior when HOME is undefined
- [ ] Behavior when PATH is undefined/empty
### Phase 5: Stress & Performance Tests (Target: +50 tests)

- [ ] Rapid command execution (typing-speed test)
- [ ] Large history file loading
- [ ] Completion in a huge directory (1000+ files)
- [ ] Very deep directory structures
- [ ] Long-running command interruption patterns
- [ ] Memory leak detection (long session)
- [ ] Session with 1000+ commands
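Repetitive cases like the 1000-command session lend themselves to generation rather than hand-writing. A minimal sketch, reusing the YAML schema assumed in Phase 1:

```python
def generate_session_stress(count=1000):
    """Build one long test case that echoes `count` sentinel values
    through a single interactive session, so the runner can verify
    the shell stays responsive and leaks nothing over a long session."""
    steps = []
    for i in range(count):
        steps.append({
            "send_line": f"echo cmd{i}",
            "expect_output": f"cmd{i}",
        })
    return {"name": f"session with {count} commands", "steps": steps}
```

The result can be serialized into the suite with `yaml.safe_dump([generate_session_stress()], f)`.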
## Implementation Approach

### Option 1: Manual YAML Creation
- Pros: Full control; can optimize for interactive mode
- Cons: Tedious, error-prone, time-consuming
- Estimate: ~40 hours for 500 tests

### Option 2: Semi-Automated Conversion Tool
Create a Python script to convert non-interactive tests:

```python
#!/usr/bin/env python3
"""Convert non-interactive POSIX tests to interactive YAML format."""

import re
import yaml

def parse_compare_posix_test(line):
    """Parse: compare_posix_output "test name" "command" """
    match = re.match(r'compare_posix_output "([^"]+)" "([^"]+)"', line)
    if match:
        name, command = match.groups()
        return {
            'name': name,
            'steps': [{'send_line': command}],
            # estimate_output derives the expected output from the
            # command (defined elsewhere in the converter)
            'expect_output': estimate_output(command),
            'match_type': 'contains',
        }
    return None
```

- Pros: Fast initial creation, consistency
- Cons: May need manual adjustment; output estimation is tricky
- Estimate: ~5 hours to write the tool + ~10 hours to review/adjust
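The `estimate_output` helper the converter relies on could plausibly be implemented by running the command under a reference POSIX shell and capturing its stdout (the shell path is an assumption; substitute whatever reference implementation the suite compares against):

```python
import subprocess

def estimate_output(command, reference_shell="/bin/sh"):
    """Run `command` under a reference shell and capture its stdout.

    The captured text becomes the expected output for the interactive
    version of the test. Commands with side effects or nondeterministic
    output will still need manual review.
    """
    result = subprocess.run(
        [reference_shell, "-c", command],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip()
```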
### Option 3: Hybrid Approach (RECOMMENDED)
1. Use the converter for straightforward tests (commands, variables, etc.)
2. Manually create complex interactive tests
3. Use test generation for repetitive patterns

Estimate: ~15 hours total
## Prioritization

### High Priority (Do First)
1. **Builtin edge cases** from posix_compliance_gaps.sh
   - Well-defined and easy to convert
   - High value for POSIX compliance

2. **Parameter expansion edge cases**
   - Some coverage already exists; extend it
   - Clear expected outputs

3. **Job control improvements**
   - Currently the weakest area (52.5% pass rate)
   - Critical for an interactive shell

### Medium Priority
4. **Advanced line editing features**
5. **History expansion edge cases**
6. **Cross-feature interaction tests**

### Lower Priority (Nice to Have)
7. **Stress tests**
8. **Performance benchmarks**
9. **Resource limit tests**

## Success Metrics

- **Target coverage**: 650+ interactive tests (matching the non-interactive count)
- **Target pass rate**: Maintain or improve on 72% overall
- **Target time**: All tests complete in < 10 minutes
- **Coverage**: Each POSIX shell feature has ≥3 tests (basic, edge case, error)
## Next Steps

1. **Create the converter tool** for simple test translation
2. **Port 50 builtin edge-case tests** from posix_compliance_gaps.sh
3. **Add 30 job control edge-case tests**
4. **Review and document patterns** for future expansion
5. **Update the test framework** if needed for new test types

## Questions to Resolve

1. Should we maintain 1:1 parity with non-interactive tests, or create interactive-specific variants?
2. How do we handle tests that are inherently non-interactive (e.g., the exit code of an entire script)?
3. Should we have separate "fast" and "comprehensive" test suites?
4. How do we test readline features that may not be present (compile-time options)?