Commits

trunk
Switch branches/tags
All users
All time
December 2025
Su Mo Tu We Th Fr Sa
30 1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31 1 2 3
4 5 6 7 8 9 10

Commits on December 20, 2025

  1. Fix -F newline splitting and flag interactions
    - Split -F patterns on newlines (each line is an OR alternative)
    - Add count_subpatterns() and split_pattern_on_newlines() helpers
    - Use compiled%count for split pattern count in match_line
    - Fix -c with -l: only print filename when count > 0
    - Fix -o with -c: count mode takes priority over only-matching
    mfwolffe committed
  2. Implement backreference support and Latin-1 case folding
    Backreferences:
    - Add has_backrefs flag to detect patterns with \1-\9
    - Implement backtracking matcher for patterns with backreferences
    - Track captured group start/end positions during matching
    - Support case-insensitive backreference matching
    
    Case folding:
    - Extend to_lower_char() to handle Latin-1 characters (192-222)
    - Add UTF-8 continuation byte handling for Latin Extended-A
    - Enables case-insensitive matching for accented characters (é, ñ, etc.)
    mfwolffe committed
  3. Fix -f pattern file reading
    - Change from formatted to unformatted stream access for byte-by-byte reading
    - Fixes issue where only first pattern was read from pattern files
    - Properly handle newlines across different platforms
    mfwolffe committed
  4. Fix symlink handling for -r option
    - Add is_symlink() function using lstat to detect symbolic links
    - Skip symlinks in collect_files when follow_links is false (-r)
    - Symlinks are followed only with -R (dereference-recursive)
    mfwolffe committed
  5. Preserve CRLF line endings like grep
    - Remove CR stripping from mmap line reading
    - Remove CR stripping from regular file I/O
    - Grep preserves carriage returns in output; ferp now matches
    mfwolffe committed
  6. Fix error exit codes and directory handling
    - Add has_error flag to track errors during file processing
    - Return exit code 2 when errors occur (matching grep behavior)
    - Print error message for directories when -d read is used
    - Add has_error to OpenMP reduction clause for thread safety
    mfwolffe committed
  7. Fix -T tab alignment and line number width
    - Only print tab when there's a prefix (filename, line number, or byte offset)
    - Add line_number_width field to grep_options for dynamic width calculation
    - Pad line numbers based on file size for proper -T alignment
    mfwolffe committed
  8. mfwolffe committed
  9. Enable whitespace pattern test
    The "edge: spaces" test case now passes after fixing whitespace
    pattern handling throughout the codebase.
    mfwolffe committed
  10. Fix alternation pattern parsing with null terminators
    Add null terminators when storing parsed alternatives in the
    Aho-Corasick optimization path. Without this, alternatives like
    "hello" stored in fixed-length arrays would have incorrect lengths.
    mfwolffe committed
  11. Update matcher to use pattern_len for -w/-x options
    Use explicit pattern lengths when applying word boundary (-w) and
    line regexp (-x) transformations. This prevents null terminators
    from being included in the transformed patterns.
    mfwolffe committed
  12. Update regex modules to use pattern_len
    Use pattern_len() instead of len_trim() across regex modules to properly
    handle whitespace patterns. Update Makefile dependencies to ensure
    ferp_kinds is compiled before regex modules that use pattern_len.
    mfwolffe committed
  13. Preserve trailing whitespace in output
    Remove trim() calls when printing matched lines to preserve trailing
    whitespace. This fixes cases where whitespace patterns would not
    display correctly in output.
    mfwolffe committed
  14. Store patterns with null terminators for exact length tracking
    Append null terminators when storing patterns to preserve their exact
    length. This enables correct handling of whitespace-only patterns that
    would otherwise be space-padded in Fortran's fixed-length strings.
    mfwolffe committed
  15. Add pattern_len() function for null-terminated pattern handling
    Fortran's len_trim() returns 0 for whitespace-only strings, which breaks
    patterns like "  " (two spaces). This function respects null terminators
    to track exact pattern length, falling back to full string length when
    no terminator is present.
    mfwolffe committed
  16. mfwolffe committed
  17. mfwolffe committed
  18. mfwolffe committed
  19. mfwolffe committed
  20. mfwolffe committed
  21. mfwolffe committed
  22. mfwolffe committed
  23. mfwolffe committed
  24. Add character equivalence classes for DFA compilation speedup
    Optimization #13: Groups characters with identical NFA behavior into
    equivalence classes, reducing DFA compilation from O(256 * states) to
    O(classes * states).
    
    Implementation:
    - compute_equiv_classes: builds signatures based on NFA transitions
    - Uses FNV-1a hashing to identify characters with same behavior
    - Alphabetic chars get unique signatures for case-folding support
    - DFA compilation computes transitions per class, fills 256-entry table
    
    Performance: Case-insensitive matching 7.6x faster than grep
    espadonne committed

Commits on December 19, 2025

  1. Add bitwise char class and pre-computed epsilon closures (4x speedup)
    Two major optimizations:
    
    1. Bitwise character class representation:
       - Replace 256-element boolean array with 4×64-bit integers
       - O(1) bit test via btest() instead of array lookup
       - [a-z]+ now 2x faster than grep (was 0.5x slower)
       - [a-zA-Z0-9]+ now 1.4x faster (was 0.37x slower)
    
    2. Pre-computed epsilon closures:
       - Pre-compute epsilon closure for every NFA state
       - Merge closures via bitwise OR instead of graph traversal
       - Optional quantifier colou?r now 1.1x faster (was 0.24x)
    
    Overall: ferp wins 38/39 benchmarks, average speedup 4.0x vs grep
    
    Also adds benchmark.sh for automated performance testing.
    espadonne committed
  2. Add DFA state minimization using Hopcroft's algorithm
    - Partition refinement to identify and merge equivalent states
    - O(n log n) algorithm complexity (n = number of states)
    - Reduces DFA size for patterns with redundant states
    - Better cache utilization from smaller transition tables
    - Complex character classes now 2.6x faster than grep
    
    The minimization runs after DFA construction and before use,
    transparently improving patterns like [a-zA-Z0-9]+ that
    previously created many equivalent states.
    espadonne committed
  3. Add line-oriented batch processing (2-3x faster)
    - Process 256 lines per batch instead of one at a time
    - Zero-copy line extraction from mmap'd memory via line_info_t
    - Batch matching reduces function call overhead
    - Improved cache locality for line data
    - Auto-detects when batch mode can be used (no context/-v/-o/-L/-z)
    - Falls back to line-by-line for complex output modes
    
    Benchmarks (50MB file, 800K lines):
    - BRE patterns: 2.1x faster than grep
    - Fixed strings: 3.1x faster than grep
    - ERE patterns: 2.5x faster than grep
    espadonne committed

Commits on December 18, 2025

  1. Add case-insensitive DFA matching (5-6x faster than grep)
    Build DFA with case-folded transitions so 'A' and 'a' go to the
    same state. This enables O(n) DFA matching for -i flag instead
    of falling back to slower NFA simulation.
    espadonne committed