Adding lookbehinds to rust-lang/regex
10 months ago
- #rust
- #regex
- #lookbehind
- The post describes implementing unbounded, captureless lookbehinds in Rust's regex engine (rust-lang/regex).
- Lookbehinds allow regexes to make assertions about preceding text without including it in the match.
- Negative lookbehinds are also supported; they assert that a pattern does not match immediately before the current position.
- The regex engine is split into 'regex-syntax' for parsing and 'regex-automata' for matching.
- The PikeVM engine was modified to support lookbehinds with new NFA states: 'WriteLookAround' and 'CheckLookAround'.
- Performance optimizations were implemented to avoid unnecessary scanning to the end of the haystack.
- Bounded lookbehind optimization improved performance by up to 150x in benchmarks.
- A backtracking engine with memoization was also extended to support lookbehinds.
- Benchmarks showed the implementation is 2-5x slower than Python's 're' but maintains linear time complexity.
- The work lays the foundation for future extensions like lookaheads and benefits the Rust ecosystem.
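
To make the lookbehind semantics in the points above concrete, here is a minimal sketch using Python's 're', the library the benchmarks compare against. It is not the Rust implementation. The sample string and variable names are invented for illustration. The last part shows that Python's 're' rejects a variable-width lookbehind, which is the kind of pattern that unbounded lookbehind support is meant to accept.

```python
import re

# Hypothetical sample input, chosen only for illustration.
text = "cost: $45, qty 12, fee $7"

# Positive lookbehind: digits that directly follow a '$'.
# The '$' is asserted but is not part of the match.
priced = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: runs of digits NOT directly preceded by '$'.
# Excluding a preceding digit as well keeps the match from starting mid-number.
unpriced = re.findall(r"(?<![\$\d])\d+", text)

# Python's 're' only accepts fixed-width lookbehinds, so a lookbehind
# containing '\s*' (unbounded width) fails at compile time.
try:
    re.compile(r"(?<=\$\s*)\d+")
    unbounded_ok = True
except re.error:
    unbounded_ok = False

print(priced)        # ['45', '7']
print(unpriced)      # ['12']
print(unbounded_ok)  # False
```

In both successful patterns the matched text contains only the digits; the lookbehind constrains where a match may start without consuming anything.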