Hasty Briefsbeta

Bilingual

Symbolic derivatives and the rust rewrite of RE

a day ago
  • #rust
  • #regex
  • #performance
  • RE# has been rewritten in Rust, making it open source and available as a crate on crates.io.
  • The Rust version uses symbolic derivatives instead of Brzozowski derivatives, improving efficiency and allowing for more complex regex patterns.
  • Performance benchmarks show RE# outperforming other regex engines in certain scenarios, especially with case-insensitive matching and lookarounds.
  • Symbolic derivatives allow the engine to compute transitions for all characters at once, improving state machine construction.
  • The Rust version supports unrestricted lookaheads, enabling more flexible regex patterns compared to the .NET version.
  • Character sets are represented as 256-bit vectors, enabling fast bitwise operations for set operations.
  • Skip acceleration optimizes performance by skipping over input bytes that don't affect the state transition.
  • Future work includes adding SIMD prefilters, bidirectional scanning, and other optimizations to further improve performance.