Symbolic derivatives and the rust rewrite of RE
a day ago
- #rust
- #regex
- #performance
- RE# has been rewritten in Rust, making it open source and available as a crate on crates.io.
- The Rust version uses symbolic derivatives instead of Brzozowski derivatives, improving efficiency and allowing for more complex regex patterns.
- Performance benchmarks show RE# outperforming other regex engines in certain scenarios, especially with case-insensitive matching and lookarounds.
- Symbolic derivatives allow the engine to compute transitions for all characters at once, improving state machine construction.
- The Rust version supports unrestricted lookaheads, enabling more flexible regex patterns compared to the .NET version.
- Character sets are represented as 256-bit vectors, enabling fast bitwise operations for set operations.
- Skip acceleration optimizes performance by skipping over input bytes that don't affect the state transition.
- Future work includes adding SIMD prefilters, bidirectional scanning, and other optimizations to further improve performance.