RE#: how we built the fastest regex engine in F#
4 days ago
- #F#
- #regex-engine
- #performance
- RE# is a regex engine written in F# that outperforms other industrial regex engines. It supports full boolean operators and context-aware lookarounds while keeping search time O(n) in the length of the input.
- The engine is based on Brzozowski derivatives, allowing for efficient handling of intersection (&) and complement (~) operators, which are not commonly supported in other regex engines.
- RE# uses minterm compression: it partitions the character space into equivalence classes of characters that the pattern cannot tell apart, so transitions are tracked per class rather than per character. This significantly reduces memory usage and improves speed.
- The engine runs a DFA matching loop with lazy DFA construction. It skips the NFA stage entirely and builds DFA states directly from the regex by taking derivatives.
- RE# supports lookarounds in the form (?<=R1)R2(?=R3), encoding context information directly in the state, enabling linear-time matching with small constants.
- The engine follows POSIX semantics and returns the leftmost-longest match, so results are deterministic and avoid the pitfalls of backtracking engines.
- RE# is open-source and available as a NuGet package, with a web app for interactive exploration of regex patterns and their combinations.
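
To make the derivative-based points above concrete, here is a minimal sketch of Brzozowski derivatives with intersection and complement, driven by a lazily built DFA. It is not RE#'s implementation: it is written in Python rather than F#, every name in it (`Re`, `deriv`, `LazyDfa`, `contains`, and so on) is hypothetical, and it leaves out minterm compression, lookarounds, and leftmost-longest search. It only checks whether a whole string matches.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Re:
    """Immutable, hashable regex node (hypothetical representation)."""
    kind: str          # empty | eps | any | chr | cat | or | and | not | star
    args: tuple = ()


EMPTY = Re("empty")    # matches no string
EPS = Re("eps")        # matches only the empty string
ANY = Re("any")        # matches any single character


def char(c): return Re("chr", (c,))

# Smart constructors apply sound simplifications so equal languages
# tend to share one node, which keeps the set of DFA states small.
def cat(a, b):
    if EMPTY in (a, b): return EMPTY
    if a == EPS: return b
    if b == EPS: return a
    return Re("cat", (a, b))

def alt(a, b):
    if a == EMPTY or a == b: return b
    if b == EMPTY: return a
    return Re("or", tuple(sorted((a, b), key=repr)))

def both(a, b):                      # intersection, written & in the summary
    if EMPTY in (a, b): return EMPTY
    if a == b: return a
    return Re("and", tuple(sorted((a, b), key=repr)))

def neg(a):                          # complement, written ~ in the summary
    return a.args[0] if a.kind == "not" else Re("not", (a,))

def star(a):
    if a.kind == "star": return a
    return EPS if a in (EMPTY, EPS) else Re("star", (a,))


def nullable(r):
    """True if r matches the empty string."""
    k, a = r.kind, r.args
    if k in ("eps", "star"): return True
    if k in ("empty", "any", "chr"): return False
    if k == "not": return not nullable(a[0])
    if k == "or": return nullable(a[0]) or nullable(a[1])
    return nullable(a[0]) and nullable(a[1])   # cat, and


def deriv(r, c):
    """Regex for what may follow after r has consumed character c."""
    k, a = r.kind, r.args
    if k in ("empty", "eps"): return EMPTY
    if k == "any": return EPS
    if k == "chr": return EPS if a[0] == c else EMPTY
    if k == "or": return alt(deriv(a[0], c), deriv(a[1], c))
    if k == "and": return both(deriv(a[0], c), deriv(a[1], c))
    if k == "not": return neg(deriv(a[0], c))
    if k == "star": return cat(deriv(a[0], c), r)
    # cat: consume from the left part, or skip it if it is nullable
    rest = deriv(a[1], c) if nullable(a[0]) else EMPTY
    return alt(cat(deriv(a[0], c), a[1]), rest)


class LazyDfa:
    """DFA whose states are regexes; transitions are computed on demand."""
    def __init__(self, start):
        self.start = start
        self.cache = {}              # (state, char) -> next state

    def matches(self, text):
        state = self.start
        for c in text:
            key = (state, c)
            if key not in self.cache:
                self.cache[key] = deriv(state, c)
            state = self.cache[key]
        return nullable(state)


def lit(s): return reduce(cat, map(char, s), EPS)
def contains(s): return cat(star(ANY), cat(lit(s), star(ANY)))

# Strings that contain "ab" and do not contain "ba".
dfa = LazyDfa(both(contains("ab"), neg(contains("ba"))))
print(dfa.matches("aab"))    # True
print(dfa.matches("abba"))   # False
```

Each DFA state here is simply the regex that remains to be matched, so no NFA is ever built, and a transition is computed only the first time the input reaches it. In this sketch complement costs a single negation of the nullability check, which gives some intuition for why derivatives handle `&` and `~` so naturally.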