Commits

5b04ac617113dd7e35685cb77678ae5694680981

Commits on December 20, 2025

  1. mfwolffe committed
  2. Enable whitespace pattern test
    The "edge: spaces" test case now passes after fixing whitespace
    pattern handling throughout the codebase.
    mfwolffe committed
  3. Fix alternation pattern parsing with null terminators
    Add null terminators when storing parsed alternatives in the
    Aho-Corasick optimization path. Without this, alternatives like
    "hello" stored in fixed-length arrays would have incorrect lengths.
    mfwolffe committed
  4. Update matcher to use pattern_len for -w/-x options
    Use explicit pattern lengths when applying word boundary (-w) and
    line regexp (-x) transformations. This prevents null terminators
    from being included in the transformed patterns.
    mfwolffe committed
  5. Update regex modules to use pattern_len
    Use pattern_len() instead of len_trim() across regex modules to properly
    handle whitespace patterns. Update Makefile dependencies to ensure
    ferp_kinds is compiled before regex modules that use pattern_len.
    mfwolffe committed
  6. Preserve trailing whitespace in output
    Remove trim() calls when printing matched lines to preserve trailing
    whitespace. This fixes cases where whitespace patterns would not
    display correctly in output.
    mfwolffe committed
  7. Store patterns with null terminators for exact length tracking
    Append null terminators when storing patterns to preserve their exact
    length. This enables correct handling of whitespace-only patterns that
    would otherwise be space-padded in Fortran's fixed-length strings.
    mfwolffe committed
  8. Add pattern_len() function for null-terminated pattern handling
    Fortran's len_trim() returns 0 for whitespace-only strings, which breaks
    patterns like "  " (two spaces). This function respects null terminators
    to track exact pattern length, falling back to full string length when
    no terminator is present.
    mfwolffe committed
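
The length-tracking idea in this commit can be illustrated with a small Python sketch. It is not the Fortran implementation: the fixed-width, space-padded buffer is simulated with `str.ljust`, and `store_pattern` and the local `len_trim` are hypothetical stand-ins named only for this example.

```python
NUL = "\x00"

def len_trim(buf: str) -> int:
    """Mimic Fortran's len_trim: length ignoring trailing blanks."""
    return len(buf.rstrip(" "))

def store_pattern(pattern: str, width: int = 16) -> str:
    """Store a pattern in a fixed-width, space-padded buffer with a NUL terminator.

    Hypothetical helper: the width of 16 is an arbitrary choice for the demo.
    """
    if len(pattern) + 1 > width:
        raise ValueError("pattern does not fit in buffer")
    return (pattern + NUL).ljust(width)

def pattern_len(buf: str) -> int:
    """Length up to the first NUL, or the full buffer length if none is present."""
    end = buf.find(NUL)
    return end if end >= 0 else len(buf)
```

With a two-space pattern, `len_trim` on the padded buffer sees only blanks, while `pattern_len` on the terminated buffer recovers the stored length.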
  9. mfwolffe committed
  10. mfwolffe committed
  11. mfwolffe committed
  12. mfwolffe committed
  13. mfwolffe committed
  14. mfwolffe committed
  15. mfwolffe committed
  16. mfwolffe committed
  17. Add character equivalence classes for DFA compilation speedup
    Optimization #13: Groups characters with identical NFA behavior into
    equivalence classes, reducing DFA compilation from O(256 * states) to
    O(classes * states).
    
    Implementation:
    - compute_equiv_classes: builds signatures based on NFA transitions
    - Uses FNV-1a hashing to identify characters with same behavior
    - Alphabetic chars get unique signatures for case-folding support
    - DFA compilation computes transitions per class, fills 256-entry table
    
    Performance: Case-insensitive matching 7.6x faster than grep
    espadonne committed
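
A rough Python sketch of byte equivalence classes, under simplifying assumptions: the NFA's behavior is reduced to a list of character sets, and the signature is a plain tuple of membership flags rather than the FNV-1a hash the commit mentions. The function names and data layout here are hypothetical.

```python
def compute_equiv_classes(char_sets):
    """Map each byte 0..255 to a class id.

    Two bytes share a class when they belong to exactly the same
    character sets, so every transition treats them identically.
    """
    class_of = [0] * 256
    ids = {}
    for b in range(256):
        signature = tuple(b in s for s in char_sets)
        class_of[b] = ids.setdefault(signature, len(ids))
    return class_of, len(ids)

def expand_row(class_targets, class_of):
    """Fill a 256-entry transition row from one target per class."""
    return [class_targets[class_of[b]] for b in range(256)]

# Example: an NFA whose transitions test [a-z] and [0-9].
letters = set(range(ord("a"), ord("z") + 1))
digits = set(range(ord("0"), ord("9") + 1))
class_of, n_classes = compute_equiv_classes([letters, digits])
```

In this example only three classes exist (letters, digits, everything else), so a DFA builder would compute three transitions per state and expand them to 256 entries afterwards.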

Commits on December 19, 2025

  1. Add bitwise char class and pre-computed epsilon closures (4x speedup)
    Two major optimizations:
    
    1. Bitwise character class representation:
       - Replace 256-element boolean array with 4×64-bit integers
       - O(1) bit test via btest() instead of array lookup
       - [a-z]+ now 2x faster than grep (was 0.5x grep's speed)
       - [a-zA-Z0-9]+ now 1.4x faster (was 0.37x grep's speed)
    
    2. Pre-computed epsilon closures:
       - Pre-compute epsilon closure for every NFA state
       - Merge closures via bitwise OR instead of graph traversal
       - Optional quantifier colou?r now 1.1x faster (was 0.24x)
    
    Overall: ferp wins 38/39 benchmarks, average speedup 4.0x vs grep
    
    Also adds benchmark.sh for automated performance testing.
    espadonne committed
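
Both optimizations can be sketched in Python as follows. This is an illustration only: `CharClass`, `precompute_closures`, and `closure_of_set` are hypothetical names, Python integers stand in for the 64-bit words and bit-vector state sets, and the shift-and-mask test plays the role of Fortran's `btest()`.

```python
class CharClass:
    """A byte set stored as four 64-bit words instead of 256 booleans."""

    def __init__(self):
        self.words = [0, 0, 0, 0]

    def add_range(self, lo: str, hi: str) -> None:
        for b in range(ord(lo), ord(hi) + 1):
            self.words[b >> 6] |= 1 << (b & 63)

    def contains(self, byte: int) -> bool:
        # Word index is byte // 64, bit index is byte % 64.
        return (self.words[byte >> 6] >> (byte & 63)) & 1 == 1

def precompute_closures(eps, n_states):
    """Return one bitmask per state: its epsilon closure, itself included.

    eps maps a state to the list of states reachable by one epsilon edge.
    """
    closures = []
    for s in range(n_states):
        seen = 1 << s
        stack = [s]
        while stack:
            u = stack.pop()
            for v in eps.get(u, ()):
                if not (seen >> v) & 1:
                    seen |= 1 << v
                    stack.append(v)
        closures.append(seen)
    return closures

def closure_of_set(state_bits, closures):
    """Closure of a whole state set: OR together the per-state closures."""
    out, s = 0, 0
    while state_bits:
        if state_bits & 1:
            out |= closures[s]
        state_bits >>= 1
        s += 1
    return out
```

The graph traversal happens once per state at compile time; at match time, merging closures is a handful of OR operations.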
  2. Add DFA state minimization using Hopcroft's algorithm
    - Partition refinement to identify and merge equivalent states
    - O(n log n) algorithm complexity (n = number of states)
    - Reduces DFA size for patterns with redundant states
    - Better cache utilization from smaller transition tables
    - Complex character classes now 2.6x faster than grep
    
    The minimization runs after DFA construction and before use,
    transparently improving patterns like [a-zA-Z0-9]+ that
    previously created many equivalent states.
    espadonne committed
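
For readers unfamiliar with partition refinement, here is a compact Python sketch of Hopcroft-style minimization. It assumes a complete DFA given as a `delta[(state, symbol)]` dictionary; that representation and the function name are assumptions for this example, not the project's data structures.

```python
def minimize_dfa(n_states, alphabet, delta, accepting):
    """Return a dict mapping each state to the id of its equivalence block."""
    # Inverse transitions: (target, symbol) -> set of source states.
    inv = {}
    for (s, a), t in delta.items():
        inv.setdefault((t, a), set()).add(s)

    acc = frozenset(accepting)
    non = frozenset(range(n_states)) - acc
    partition = {blk for blk in (acc, non) if blk}
    worklist = set(partition)

    while worklist:
        splitter = worklist.pop()
        for a in alphabet:
            # States that reach the splitter on symbol a.
            pre = set()
            for t in splitter:
                pre |= inv.get((t, a), set())
            for block in list(partition):
                inside = block & pre
                outside = block - pre
                if not inside or not outside:
                    continue  # block is not split by this (splitter, a)
                partition.remove(block)
                partition.update((inside, outside))
                if block in worklist:
                    worklist.remove(block)
                    worklist.update((inside, outside))
                else:
                    # Only the smaller half needs to be processed later.
                    smaller = inside if len(inside) <= len(outside) else outside
                    worklist.add(smaller)

    blocks = sorted(partition, key=min)
    return {s: i for i, blk in enumerate(blocks) for s in blk}
```

States that end up in the same block can be merged into one DFA state, which is what shrinks the transition table.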
  3. Add line-oriented batch processing (2-3x faster)
    - Process 256 lines per batch instead of one at a time
    - Zero-copy line extraction from mmap'd memory via line_info_t
    - Batch matching reduces function call overhead
    - Improved cache locality for line data
    - Auto-detects when batch mode can be used (no context/-v/-o/-L/-z)
    - Falls back to line-by-line for complex output modes
    
    Benchmarks (50MB file, 800K lines):
    - BRE patterns: 2.1x faster than grep
    - Fixed strings: 3.1x faster than grep
    - ERE patterns: 2.5x faster than grep
    espadonne committed
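
The batching scheme might look roughly like this in Python. The `(start, length)` pairs play the role of `line_info_t`, and the buffer can be a `bytes` object or an `mmap` (both support `find` with start and end bounds); the function names and the simple substring matcher are assumptions made for the sketch.

```python
def line_batches(buf, batch_size=256):
    """Yield lists of (start, length) pairs describing lines in buf.

    No line data is copied; each pair only points into the buffer.
    """
    batch = []
    pos, n = 0, len(buf)
    while pos < n:
        nl = buf.find(b"\n", pos)
        end = n if nl < 0 else nl
        batch.append((pos, end - pos))
        pos = end + 1
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def count_matching_lines(buf, needle, batch_size=256):
    """Count lines containing needle, processing one batch at a time."""
    total = 0
    for batch in line_batches(buf, batch_size):
        total += sum(
            1 for start, length in batch
            if buf.find(needle, start, start + length) >= 0
        )
    return total
```

Bounding each search to `start + length` keeps a match from straddling a newline, so the per-line semantics are preserved even though lines are never extracted.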

Commits on December 18, 2025

  1. Add case-insensitive DFA matching (5-6x faster than grep)
    Build DFA with case-folded transitions so 'A' and 'a' go to the
    same state. This enables O(n) DFA matching for -i flag instead
    of falling back to slower NFA simulation.
    espadonne committed
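
A small Python sketch of case-folded transitions, using a deliberately simple DFA that matches one ASCII literal from start to end. The table layout and function names are hypothetical; the point is only that both cases of a letter are wired to the same next state at build time, so matching stays a single table lookup per byte.

```python
DEAD = -1

def build_literal_dfa(word: str, ignore_case: bool):
    """Build a transition table for an exact ASCII literal.

    State i means "the first i bytes matched"; the last state accepts.
    """
    table = [[DEAD] * 256 for _ in range(len(word) + 1)]
    for i, ch in enumerate(word):
        variants = {ch}
        if ignore_case:
            variants |= {ch.lower(), ch.upper()}
        for v in variants:
            table[i][ord(v)] = i + 1  # 'A' and 'a' share a target state
    return table

def dfa_fullmatch(table, data: bytes) -> bool:
    """Run the DFA over data: one table lookup per input byte."""
    state = 0
    for b in data:
        state = table[state][b]
        if state == DEAD:
            return False
    return state == len(table) - 1
```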
  2. Add SIMD-accelerated newline scanning (ARM NEON)
    Use ARM NEON SIMD to scan for newlines 16 bytes at a time instead
    of byte-by-byte. Combined with mmap and Boyer-Moore, fixed string
    search is now 2x faster than grep on ARM64.
    espadonne committed
  3. Add full DFA compilation for O(n) regex matching
    Implement subset construction to compile NFA to DFA for patterns
    without anchors. DFA gives O(n) matching vs NFA's O(nm), where n is
    the input length and m the number of NFA states. On some patterns,
    now 1.8x faster than grep.
    espadonne committed
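
Subset construction can be sketched in Python as below. The NFA encoding (a dict from state to a list of `(symbol, target)` edges, with `None` marking an epsilon edge) and all names are assumptions for this example; the sample NFA encodes `colou?r` by hand and matches whole strings only.

```python
def epsilon_closure(states, nfa):
    """All states reachable from `states` using only epsilon edges."""
    seen = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for sym, t in nfa.get(s, ()):
            if sym is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction(nfa, start, accept, alphabet):
    """Compile an NFA into DFA transitions; DFA state 0 is the start."""
    start_set = epsilon_closure({start}, nfa)
    ids = {start_set: 0}
    delta, accepting = {}, set()
    work = [start_set]
    while work:
        cur = work.pop()
        if accept in cur:
            accepting.add(ids[cur])
        for a in alphabet:
            moved = {t for s in cur for sym, t in nfa.get(s, ()) if sym == a}
            if not moved:
                continue
            nxt = epsilon_closure(moved, nfa)
            if nxt not in ids:
                ids[nxt] = len(ids)
                work.append(nxt)
            delta[(ids[cur], a)] = ids[nxt]
    return delta, accepting

def dfa_match(delta, accepting, text):
    """One dictionary lookup per character: O(n) in the input length."""
    state = 0
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting

# Hand-built NFA for colou?r: state 4 can skip the 'u' via an epsilon edge.
nfa = {0: [("c", 1)], 1: [("o", 2)], 2: [("l", 3)], 3: [("o", 4)],
       4: [("u", 5), (None, 5)], 5: [("r", 6)]}
delta, accepting = subset_construction(nfa, 0, 6, "colur")
```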
  4. Add lazy DFA caching for 1.5x regex speedup
    - Cache (state_set_hash, char, case_flag) -> next_states transitions
    - Add FNV-1a hash function for state sets
    - 256-entry direct-mapped cache avoids recomputing transitions
    - BRE: 5.67s -> 3.76s (1.5x faster)
    - ERE: 5.51s -> 3.67s (1.5x faster)
    - Now ~3x slower than grep (was ~5x)
    espadonne committed
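
A hedged Python sketch of the caching scheme: 32-bit FNV-1a over the state set picks one of 256 slots, and a slot is simply overwritten on collision (direct-mapped). The class name, the `compute` callback, and the choice to store the full key in each slot for verification are assumptions of this example; it also assumes state ids below 256 so they can be packed into bytes.

```python
FNV_OFFSET = 0x811C9DC5  # 32-bit FNV offset basis
FNV_PRIME = 0x01000193   # 32-bit FNV prime

def fnv1a(data: bytes) -> int:
    """32-bit FNV-1a: XOR in each byte, then multiply by the prime."""
    h = FNV_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & 0xFFFFFFFF
    return h

class TransitionCache:
    """Direct-mapped cache of (state_set, byte, case_flag) -> next states."""

    def __init__(self, size: int = 256):
        self.slots = [None] * size
        self.hits = 0
        self.misses = 0

    def lookup(self, state_set: frozenset, byte: int, ignore_case: bool, compute):
        key = (state_set, byte, ignore_case)
        packed = bytes(sorted(state_set)) + bytes([byte, int(ignore_case)])
        idx = fnv1a(packed) % len(self.slots)
        slot = self.slots[idx]
        if slot is not None and slot[0] == key:
            self.hits += 1
            return slot[1]
        # Miss (empty slot or collision): recompute and overwrite the slot.
        self.misses += 1
        result = compute(state_set, byte, ignore_case)
        self.slots[idx] = (key, result)
        return result
```

A direct-mapped table trades hit rate for a very cheap lookup: there is no probing and no eviction policy beyond overwriting whatever occupied the slot.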
  5. Integrate NFA optimizer for 2x regex speedup
    - Use optimized_search instead of nfa_search in regex_api
    - Store optimized_nfa_t alongside raw NFA in regex_t
    - Update compiled_patterns_t to intent(inout) for DFA cache
    - Fix empty pattern optimization (was missing optimize_nfa call)
    
    Benchmarks on 145MB file (2M lines):
    - BRE regex: 11.8s -> 5.67s (2.1x faster)
    - ERE regex: 11.8s -> 5.51s (2.1x faster)
    - Fixed string: 0.24s (unchanged, already optimized)
    espadonne committed
  6. Add NFA optimizer module with bit vector state sets
    - Implement state_set_t with O(1) bit vector operations
    - Extract literal prefixes for Boyer-Moore position skipping
    - Add anchored pattern fast path (^ only tests position 1)
    - Pre-compute start state epsilon closure
    - Add DFA cache infrastructure for future lazy DFA
    espadonne committed
  7. Add memory-mapped I/O and Boyer-Moore search for 36x speedup
    Performance improvements:
    - Add ferp_mmap module for memory-mapped file I/O via POSIX mmap
    - Add ferp_search module with Boyer-Moore-Horspool string search
    - Use mmap for all file reads (falls back to standard I/O for stdin)
    - Use Boyer-Moore for fixed string matching (-F mode)
    - Compile patterns for all modes (including fixed strings)
    
    Results on 134MB file (2M lines):
    - Fixed string (-F): 8.2s → 0.24s (36x faster, matches GNU grep)
    - BRE regex: 55.5s → 11.1s (5x faster, NFA still bottleneck)
    - PCRE (-P): ~0.3s (very fast via libpcre2)
    espadonne committed
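
For reference, Boyer-Moore-Horspool fits in a few lines. This Python sketch shows the general algorithm, not the `ferp_search` module itself; the function name is chosen for the example, and it works on `bytes` or any buffer that supports slicing and indexing, such as an `mmap`.

```python
def horspool_search(haystack: bytes, needle: bytes) -> int:
    """Return the index of the first occurrence of needle, or -1."""
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # Bad-character table: how far the window may slide when the byte
    # under its last position is b. Bytes absent from the needle allow
    # a full jump of m; the needle's final byte is excluded on purpose.
    shift = [m] * 256
    for i in range(m - 1):
        shift[needle[i]] = m - 1 - i
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        pos += shift[haystack[pos + m - 1]]
    return -1
```

Because most bytes in typical text do not occur in the needle, the window usually advances by the full needle length, which is why fixed-string search touches only a fraction of the input.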
  8. Fix directory traversal limits for large directory trees
    - Increase directory queue (MAX_DEPTH) from 100 to 10000
    - Increase file collection buffer from 10K to 100K files
    - Fixes missing files when a single directory contains >100 subdirectories
    espadonne committed