Add null terminators when storing parsed alternatives in the
Aho-Corasick optimization path. Without this, alternatives like
"hello" stored in fixed-length arrays would have incorrect lengths.
Use explicit pattern lengths when applying word boundary (-w) and
line regexp (-x) transformations. This prevents null terminators
from being included in the transformed patterns.
Use pattern_len() instead of len_trim() across regex modules to properly
handle whitespace patterns. Update Makefile dependencies to ensure
ferp_kinds is compiled before regex modules that use pattern_len.
Remove trim() calls when printing matched lines to preserve trailing
whitespace. This fixes cases where whitespace patterns would not
display correctly in output.
Append null terminators when storing patterns to preserve their exact
length. This enables correct handling of whitespace-only patterns that
would otherwise be space-padded in Fortran's fixed-length strings.
Fortran's len_trim() returns 0 for whitespace-only strings, which breaks
patterns like " " (two spaces). This function respects null terminators
to track exact pattern length, falling back to full string length when
no terminator is present.
Optimization #13: Groups characters with identical NFA behavior into
equivalence classes, reducing DFA compilation from O(256 * states) to
O(classes * states).
Implementation:
- compute_equiv_classes: builds signatures based on NFA transitions
- Uses FNV-1a hashing to identify characters with same behavior
- Alphabetic chars get unique signatures for case-folding support
- DFA compilation computes transitions per class, fills 256-entry table
Performance: Case-insensitive matching 7.6x faster than grep
Two major optimizations:
1. Bitwise character class representation:
- Replace 256-element boolean array with 4×64-bit integers
- O(1) bit test via btest() instead of array lookup
- [a-z]+ now 2x faster than grep (was 0.5x slower)
- [a-zA-Z0-9]+ now 1.4x faster (was 0.37x slower)
2. Pre-computed epsilon closures:
- Pre-compute epsilon closure for every NFA state
- Merge closures via bitwise OR instead of graph traversal
- Optional quantifier colou?r now 1.1x faster (was 0.24x)
Overall: ferp wins 38/39 benchmarks, average speedup 4.0x vs grep
Also adds benchmark.sh for automated performance testing.
- Partition refinement to identify and merge equivalent states
- O(n log n) algorithm complexity (n = number of states)
- Reduces DFA size for patterns with redundant states
- Better cache utilization from smaller transition tables
- Complex character classes now 2.6x faster than grep
The minimization runs after DFA construction and before use,
transparently improving patterns like [a-zA-Z0-9]+ that
previously created many equivalent states.
- Process 256 lines per batch instead of one at a time
- Zero-copy line extraction from mmap'd memory via line_info_t
- Batch matching reduces function call overhead
- Improved cache locality for line data
- Auto-detects when batch mode can be used (no context/-v/-o/-L/-z)
- Falls back to line-by-line for complex output modes
Benchmarks (50MB file, 800K lines):
- BRE patterns: 2.1x faster than grep
- Fixed strings: 3.1x faster than grep
- ERE patterns: 2.5x faster than grep
Build DFA with case-folded transitions so 'A' and 'a' go to the
same state. This enables O(n) DFA matching for -i flag instead
of falling back to slower NFA simulation.
Use ARM NEON SIMD to scan for newlines 16 bytes at a time instead
of byte-by-byte. Combined with mmap and Boyer-Moore, fixed string
search is now 2x faster than grep on ARM64.
Implement subset construction to compile NFA to DFA for patterns
without anchors. DFA gives O(n) matching vs NFA's O(nm). On some
patterns, now 1.8x faster than grep.
- Implement state_set_t with O(1) bit vector operations
- Extract literal prefixes for Boyer-Moore position skipping
- Add anchored pattern fast path (^ only tests position 1)
- Pre-compute start state epsilon closure
- Add DFA cache infrastructure for future lazy DFA
Performance improvements:
- Add ferp_mmap module for memory-mapped file I/O via POSIX mmap
- Add ferp_search module with Boyer-Moore-Horspool string search
- Use mmap for all file reads (falls back to standard I/O for stdin)
- Use Boyer-Moore for fixed string matching (-F mode)
- Compile patterns for all modes (including fixed strings)
Results on 134MB file (2M lines):
- Fixed string (-F): 8.2s → 0.24s (36x faster, matches GNU grep)
- BRE regex: 55.5s → 11.1s (5x faster, NFA still bottleneck)
- PCRE (-P): ~0.3s (very fast via libpcre2)
- Increase directory queue (MAX_DEPTH) from 100 to 10000
- Increase file collection buffer from 10K to 100K files
- Fixes missing files when a single directory contains >100 subdirectories