UTF-8 Migration Progress

Goal: Make facsimile fully UTF-8 aware so box-drawing characters (├─│└) and other multi-byte UTF-8 sequences display and edit correctly.

Problem

Fortran's default-kind character strings are, in practice, byte strings: len() and substring slices such as line(i:i) count bytes, not characters. A UTF-8 character like ├ occupies 3 bytes but should be treated as 1 character and displayed in 1 column.

Example:

"Hello" → 5 bytes, 5 chars, 5 display columns ✓ (works)
"├──" → 9 bytes, 3 chars, 3 display columns ✗ (broken before migration)

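The three measurements in the example can be checked outside the editor. The following is a minimal sketch in Python, used here only for illustration since facsimile itself is written in Fortran. `measure` is a hypothetical helper, and the column count is an approximation built on the standard `unicodedata` module rather than the project's own width logic.

```python
import unicodedata

def measure(text: str) -> tuple:
    """Return (byte count, character count, display columns) for text."""
    n_bytes = len(text.encode("utf-8"))
    n_chars = len(text)  # a Python str is indexed by code point
    cols = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks occupy no column of their own
        # Wide and Fullwidth characters take two columns, everything else one
        cols += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return n_bytes, n_chars, cols

print(measure("Hello"))  # (5, 5, 5)
print(measure("├──"))    # (9, 3, 3)
```

Any code that uses the first number where the second or third is needed will misplace the cursor or the padding on lines that contain box-drawing characters.
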
✅ Completed

1. Core UTF-8 Infrastructure

src/utils/utf8_module.f90 - COMPLETE
- ✅ utf8_char_count() - Count UTF-8 characters
- ✅ utf8_char_at() - Extract character at position
- ✅ utf8_char_to_byte_index() - Convert char pos → byte pos
- ✅ utf8_byte_to_char_index() - Convert byte pos → char pos
- ✅ utf8_display_width() - Calculate screen columns needed
- ✅ utf8_char_byte_length() - Get byte length of UTF-8 char
- ✅ Handles 1-4 byte UTF-8 sequences
- ✅ Handles wide characters (CJK = 2 columns)
- ✅ Handles combining characters (0 width)
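
The two most basic routines can be sketched from the UTF-8 bit layout alone. The Python below is an illustrative stand-in, not the Fortran in utf8_module.f90: the names mirror the module's routines, but the signatures, the handling of invalid bytes, and the bodies are assumptions.

```python
def utf8_char_byte_length(lead: int) -> int:
    """Byte length of the UTF-8 sequence that starts with lead byte `lead`."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: ASCII
    if lead >> 5 == 0b110:
        return 2              # 110xxxxx
    if lead >> 4 == 0b1110:
        return 3              # 1110xxxx
    if lead >> 3 == 0b11110:
        return 4              # 11110xxx
    return 1                  # continuation or invalid byte: count it as one byte

def utf8_char_count(data: bytes) -> int:
    """Count characters by counting the bytes that are not continuation bytes."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)

line = "├──".encode("utf-8")
print(utf8_char_byte_length(line[0]))  # 3
print(utf8_char_count(line))           # 3
```

Counting non-continuation bytes works because every continuation byte has the form 10xxxxxx and no lead byte does.
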

2. Cursor Semantics

src/editor_state_module.f90 - COMPLETE
- ✅ Documented: cursor%column = UTF-8 character position (NOT byte index)
- ✅ Added detailed comments explaining the semantics
- ✅ Example: In "├──", column=2 refers to the second character (the first ─), which starts at byte 4
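
The mapping between character columns and byte indices can be illustrated with a short Python sketch. It assumes 1-based positions on both sides, as in the example above. The function names and the end-of-line convention are hypothetical, not the project's Fortran API.

```python
def char_to_byte_index(data: bytes, char_pos: int) -> int:
    """1-based byte index at which the 1-based character char_pos starts.

    A char_pos one past the last character maps to len(data) + 1.
    """
    seen = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:     # not a continuation byte: a character starts here
            seen += 1
            if seen == char_pos:
                return i + 1
    return len(data) + 1

def byte_to_char_index(data: bytes, byte_pos: int) -> int:
    """1-based character position that contains the 1-based byte byte_pos."""
    return sum(1 for b in data[:byte_pos] if (b & 0xC0) != 0x80)

line = "├──".encode("utf-8")
print(char_to_byte_index(line, 2))  # 4
print(byte_to_char_index(line, 5))  # 2
```
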

3. Text Buffer UTF-8 Helpers

src/buffer/text_buffer_module.f90 - COMPLETE
- ✅ Added use utf8_module
- ✅ buffer_get_line_char_count() - Get character count of line
- ✅ buffer_char_at() - Get character at char position in line
- ✅ buffer_byte_to_char_col() - Convert byte col → char col
- ✅ buffer_char_to_byte_col() - Convert char col → byte col

4. Basic Cursor Movement

src/commands/command_handler_module.f90 - PARTIAL
- ✅ move_cursor_left() - Uses buffer_get_line_char_count()
- ✅ move_cursor_right() - Uses buffer_get_line_char_count()
- ✅ Both functions now work with character positions

5. Module Imports

src/terminal/renderer_module.f90 - PARTIAL
- ✅ Added use utf8_module
- ✅ Added buffer_get_line_char_count to imports

6. Renderer Display (HIGH PRIORITY)

src/terminal/renderer_module.f90 - COMPLETE
- ✅ render_line() - Uses UTF-8 character positions and display width
- ✅ Converts character positions to byte positions for slicing
- ✅ Uses utf8_display_width() for padding calculations
- ✅ Cursor screen positioning uses display width calculations
- ✅ Both active and inactive cursors positioned correctly

Impact: UTF-8 characters now display correctly!
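
The rendering steps listed above (character position to byte position for slicing, then display width for padding) might look roughly like this in Python. This is a sketch under assumptions, not the Fortran in renderer_module.f90: `char_width`, `byte_offset`, and this `render_line` signature are hypothetical, and the width rule is an approximation based on `unicodedata`.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Screen columns for one character: 0 combining, 2 wide, otherwise 1."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def byte_offset(data: bytes, col: int) -> int:
    """0-based byte offset of the 1-based character column col."""
    seen = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:     # a character starts at this byte
            seen += 1
            if seen == col:
                return i
    return len(data)

def render_line(line: bytes, first_col: int, width: int) -> str:
    """Visible part of line from character first_col, padded to width columns."""
    visible = line[byte_offset(line, first_col):].decode("utf-8")
    out, used = [], 0
    for ch in visible:
        w = char_width(ch)
        if used + w > width:
            break                  # the next character would not fit
        out.append(ch)
        used += w
    return "".join(out) + " " * (width - used)

print(repr(render_line("├── src".encode("utf-8"), 1, 10)))  # '├── src   '
```

The line is 13 bytes long, so padding computed from the byte length would add no spaces at all. Computed from the display width, it adds the three spaces needed to fill a 10-column viewport.
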

📋 TODO (Remaining Work)

HIGH PRIORITY - Renderer Fixes

Files: src/terminal/renderer_module.f90

Specific locations that need fixing:

Line 83: len(line_content) → needs UTF-8 char count
Line 208: len(line) → needs UTF-8 char count
Line 219-220: Padding calculation needs display width
Line 245: len(line) → needs UTF-8 char count
Line 480, 487, 504, 517: Cursor screen position calculations
Line 570-573, 597-600: Viewport scrolling with character positions
Line 754: len(line_content) → needs UTF-8 char count
Line 959-960: Viewport range calculation
Line 1036, 1129, 1136, 1156, 1197: More cursor positioning

MEDIUM PRIORITY - Word Movement

Files: src/commands/command_handler_module.f90

Functions to update:

move_cursor_word_left() (line ~1105)
move_cursor_word_right() (line ~1176)
extend_selection_word_left() (line ~3447)
extend_selection_word_right() (line ~3521)
delete_word_backward() (line ~3680)
delete_word_forward() (line ~690)

Issue: Word boundaries are detected with byte operations, which breaks on multi-byte UTF-8 characters
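
One way to picture the fix is to scan characters rather than bytes and return a character column. The Python sketch below is illustrative only. `is_word_char` encodes an assumed word definition (letters, digits, underscore), which may differ from the editor's, and the function names are hypothetical.

```python
def is_word_char(ch: str) -> bool:
    """Assumed word definition: letters, digits, and underscore."""
    return ch.isalnum() or ch == "_"

def word_right(line: bytes, col: int) -> int:
    """1-based character column after moving one word to the right."""
    chars = line.decode("utf-8")
    i, n = col - 1, len(chars)
    while i < n and not is_word_char(chars[i]):   # skip separators
        i += 1
    while i < n and is_word_char(chars[i]):       # skip the word itself
        i += 1
    return i + 1

def word_left(line: bytes, col: int) -> int:
    """1-based character column after moving one word to the left."""
    chars = line.decode("utf-8")
    i = col - 1
    while i > 0 and not is_word_char(chars[i - 1]):
        i -= 1
    while i > 0 and is_word_char(chars[i - 1]):
        i -= 1
    return i + 1

line = "na\u00efve caf\u00e9".encode("utf-8")   # "naïve café": 10 characters, 12 bytes
print(word_right(line, 1))   # 6
print(word_left(line, 11))   # 7
```

A byte-based scan of the same line would land on column 7 after the first word, because ï occupies two bytes.
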

MEDIUM PRIORITY - Editing Operations

Files: src/commands/command_handler_module.f90

Functions to update:

insert_char() - Insert at character position
delete_char() - Delete character (not byte)
delete_selection() - Use character positions
insert_newline() - Character position aware
All text manipulation that uses line(i:i) slicing

Issue: Inserting or deleting at a byte position can split a multi-byte UTF-8 sequence
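
A sketch of character-position editing on a byte buffer, written in Python for illustration. The names `byte_offset`, `insert_char`, and `delete_char` and their signatures are assumptions, not the project's Fortran routines. The idea is to convert the 1-based character column to a byte offset first, so that every splice lands on a character boundary.

```python
def byte_offset(data: bytes, col: int) -> int:
    """0-based byte offset of the 1-based character column col."""
    seen = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:     # a character starts at this byte
            seen += 1
            if seen == col:
                return i
    return len(data)               # column past the end: append position

def insert_char(line: bytes, col: int, ch: str) -> bytes:
    """Insert ch before the character at column col."""
    at = byte_offset(line, col)
    return line[:at] + ch.encode("utf-8") + line[at:]

def delete_char(line: bytes, col: int) -> bytes:
    """Delete the whole character at column col, however many bytes it has."""
    start = byte_offset(line, col)
    end = byte_offset(line, col + 1)
    return line[:start] + line[end:]

line = "├──".encode("utf-8")
print(insert_char(line, 2, "x").decode("utf-8"))  # ├x──
print(delete_char(line, 1).decode("utf-8"))       # ──
```
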

MEDIUM PRIORITY - Selection Operations

Files: src/commands/command_handler_module.f90

Functions to update:

extend_selection_left/right/up/down() - Character boundaries
select_word_at_cursor() - UTF-8 word boundaries
get_selected_text() - Extract text by character positions
Selection rendering in renderer_module

Issue: Selection ranges use byte positions, which breaks on multi-byte UTF-8 characters
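
Extracting a single-line selection by character positions could be sketched as follows, in Python for illustration. It assumes a half-open selection between two 1-based character columns and an anchor that may lie on either side of the cursor. The signature shown for `get_selected_text` is hypothetical.

```python
def byte_offset(data: bytes, col: int) -> int:
    """0-based byte offset of the 1-based character column col."""
    seen = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:     # a character starts at this byte
            seen += 1
            if seen == col:
                return i
    return len(data)

def get_selected_text(line: bytes, anchor_col: int, cursor_col: int) -> bytes:
    """Bytes of the characters in [min(col), max(col)) on one line."""
    lo, hi = sorted((anchor_col, cursor_col))
    return line[byte_offset(line, lo):byte_offset(line, hi)]

line = "│ ├── a".encode("utf-8")
print(get_selected_text(line, 3, 6).decode("utf-8"))  # ├──
```

Because both ends are converted to character boundaries before slicing, the result is always valid UTF-8 when the line is.
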

LOWER PRIORITY - Search & Find

Files: src/prompts/*.f90, src/commands/command_handler_module.f90

Functions to update:

find_next_occurrence() - Search with UTF-8 awareness
select_next_match() - Match by characters
Search prompt operations

Issue: Pattern matching needs UTF-8 awareness
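
UTF-8 is self-synchronizing, so a byte-level search for a valid UTF-8 pattern in a valid UTF-8 line can only match at character boundaries. The remaining work is converting columns to byte offsets on the way in and back on the way out. The Python sketch below illustrates that. The function signature is an assumption and does not describe the existing find_next_occurrence().

```python
from typing import Optional

def byte_offset(data: bytes, col: int) -> int:
    """0-based byte offset of the 1-based character column col."""
    seen = 0
    for i, b in enumerate(data):
        if (b & 0xC0) != 0x80:     # a character starts at this byte
            seen += 1
            if seen == col:
                return i
    return len(data)

def find_next_occurrence(line: bytes, pattern: bytes, from_col: int = 1) -> Optional[int]:
    """1-based character column of the next match at or after from_col, or None."""
    hit = line.find(pattern, byte_offset(line, from_col))
    if hit < 0:
        return None
    # Characters before the match = non-continuation bytes before the match
    return sum(1 for b in line[:hit] if (b & 0xC0) != 0x80) + 1

line = "├── src ├── test".encode("utf-8")
print(find_next_occurrence(line, b"src"))                  # 5
print(find_next_occurrence(line, "├".encode("utf-8"), 2))  # 9
```
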

LOWER PRIORITY - Other Operations

Various files:

Smart home: Character-based indentation detection
Go to column: User enters character position
Transpose characters: Swap UTF-8 characters
Bracket matching: Find brackets in UTF-8 text
Line operations (move, duplicate): Should already work

Testing Strategy

Test Files

/tmp/test_unicode.txt - Box drawing characters
/tmp/ctrl_d_pagination_test.txt - For ctrl-d testing

Test Cases

Display: Open UTF-8 file, verify box chars show correctly
Cursor Movement: Arrow keys move by character (not byte)
Editing: Type at UTF-8 char boundaries
Selection: Select text containing UTF-8 chars
Search: Find UTF-8 characters with ctrl-d
Word Movement: Alt-left/right across UTF-8 words

Success Criteria

Box drawing characters (├─│└) display correctly
Cursor doesn't get "stuck" in middle of UTF-8 sequence
Typing doesn't corrupt UTF-8 sequences
Selections work across UTF-8 boundaries
File saves/loads preserve UTF-8 content

Notes

Design Decisions

1. Cursor column = character position (not byte position)
   - More intuitive for users
   - Matches behavior of other editors

2. Display width vs character count
   - Most chars: 1 char = 1 column
   - CJK chars: 1 char = 2 columns
   - Combining: 1 char = 0 columns

3. Viewport in character positions
   - Viewport uses character positions
   - Converted to byte positions when rendering
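
The distinction between character count and display width matters most when placing the cursor on screen. A minimal Python sketch, for illustration only: the helper names are hypothetical, and the width classes are approximated with the standard `unicodedata` module rather than the project's own tables.

```python
import unicodedata

def char_width(ch: str) -> int:
    """Screen columns for one character: 0 combining, 2 wide, otherwise 1."""
    if unicodedata.combining(ch):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def cursor_screen_column(line: str, col: int) -> int:
    """1-based screen column for a cursor at 1-based character column col."""
    return 1 + sum(char_width(ch) for ch in line[:col - 1])

print(cursor_screen_column("├──", 3))       # 3
print(cursor_screen_column("日本語", 3))     # 5
print(cursor_screen_column("e\u0301x", 3))  # 2
```

With only single-width characters the screen column equals the character column. Wide characters push it ahead and combining marks hold it back.
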

Performance Considerations

UTF-8 operations have overhead vs byte operations
Caching line char counts could help
Most operations stay O(n) in line length
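
One possible form of the caching idea is a per-line table of character start offsets, built once in O(n) and reused until the line is edited. The Python sketch below is an assumption about how such a cache could look, not something that exists in the codebase. `LineIndex` is a hypothetical name.

```python
from bisect import bisect_right

class LineIndex:
    """Cached byte offsets of character starts for one line (rebuild after edits)."""

    def __init__(self, data: bytes):
        self.n_bytes = len(data)
        # 0-based offset of every byte that starts a character
        self.starts = [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]

    @property
    def char_count(self) -> int:
        return len(self.starts)

    def char_to_byte(self, col: int) -> int:
        """1-based character column to 1-based byte index, O(1)."""
        if col > len(self.starts):
            return self.n_bytes + 1          # end-of-line position
        return self.starts[col - 1] + 1

    def byte_to_char(self, byte_pos: int) -> int:
        """1-based byte index to 1-based character column, O(log n)."""
        return bisect_right(self.starts, byte_pos - 1)

idx = LineIndex("├──".encode("utf-8"))
print(idx.char_count, idx.char_to_byte(2), idx.byte_to_char(5))  # 3 4 2
```

The trade-off is one integer of memory per character and the need to invalidate the table whenever the line changes.
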

Edge Cases to Handle

Cursor at end of line (column = char_count + 1)
Empty lines (char_count = 0)
Files with invalid UTF-8 (treat as bytes)
Mixed width characters (CJK)
Combining characters
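
Several of these edge cases reduce to two small rules, sketched here in Python for illustration. The function names are hypothetical. Falling back to byte counting for the whole line when any part of it is invalid is one possible reading of "treat as bytes", not a confirmed design.

```python
def line_char_count(data: bytes) -> int:
    """Character count of a line, or its byte count if it is not valid UTF-8."""
    try:
        return len(data.decode("utf-8"))
    except UnicodeDecodeError:
        return len(data)       # invalid UTF-8: each byte counts as one character

def clamp_column(data: bytes, col: int) -> int:
    """Clamp a 1-based cursor column to the range 1 .. char_count + 1."""
    return max(1, min(col, line_char_count(data) + 1))

print(clamp_column("├──".encode("utf-8"), 10))  # 4
print(clamp_column(b"", 5))                     # 1
print(line_char_count(b"\xff\xfe"))             # 2
```
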

Current Build Status

- ✅ Builds successfully
- ✅ UTF-8 module complete and tested (10/10 tests passing)
- ✅ Basic cursor movement works (character-based, not byte-based)
- ✅ Display rendering works (box chars render correctly)
- ✅ Character insertion works at UTF-8 boundaries
- ⏳ Remaining: viewport, word movement, editing ops, selections

Test Results

Unit Tests

Created test/test_utf8_integration.f90 with 10 comprehensive tests:

✅ All 10 tests passing
Covers: char counting, byte↔char conversion, display width, buffer integration

Manual Testing

Tested with /tmp/test_utf8_simple.txt containing box-drawing chars (├──):

✅ Box characters display correctly in editor
✅ Cursor moves by CHARACTER positions (not bytes)
- Moving right through ├ (3 bytes) increments column by 1
- Moving right through ─ (3 bytes) increments column by 1
✅ Character insertion works at correct UTF-8 boundaries

Last updated: 2025-11-04
