RE#: how we built the fastest regex engine in F#

2 months ago

RE# is a regex engine written in F# that its authors report outperforms other industrial regex engines. It supports the full set of boolean operators and context-aware lookarounds while keeping search time linear in the input length, O(n).
The engine is based on Brzozowski derivatives, allowing for efficient handling of intersection (&) and complement (~) operators, which are not commonly supported in other regex engines.
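To make the derivative idea concrete, here is a minimal Python sketch of a derivative-based matcher. It is not RE#'s code: the tuple encoding and every function name are illustrative assumptions, and it omits the term simplification a real engine needs, so it is only suitable for short inputs. The point it shows is that intersection and complement need no special machinery, because the derivative simply distributes over them.

```python
# Regex terms as plain tuples (an illustrative encoding, not RE#'s own types).
EMPTY = ("empty",)   # matches no string
EPS = ("eps",)       # matches only the empty string
ANY = ("any",)       # matches any single character

def char(c): return ("chr", c)
def cat(a, b): return ("cat", a, b)
def alt(a, b): return ("alt", a, b)
def and_(a, b): return ("and", a, b)   # intersection, written & in the text
def not_(a): return ("not", a)         # complement, written ~ in the text
def star(a): return ("star", a)

def nullable(r):
    """True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr", "any"):
        return False
    if tag in ("cat", "and"):
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(tag)

def deriv(r, c):
    """Brzozowski derivative: the regex for what may follow after reading c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "any":
        return EPS
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "and":
        return and_(deriv(r[1], c), deriv(r[2], c))
    if tag == "not":
        return not_(deriv(r[1], c))
    if tag == "star":
        return cat(deriv(r[1], c), r)
    if tag == "cat":
        left = cat(deriv(r[1], c), r[2])
        return alt(left, deriv(r[2], c)) if nullable(r[1]) else left
    raise ValueError(tag)

def matches(r, s):
    """Whole-string match: take one derivative per character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)

def contains(c):
    return cat(star(ANY), cat(char(c), star(ANY)))

# "contains an a" & ~"contains a b"
has_a_but_no_b = and_(contains("a"), not_(contains("b")))
```

With these definitions, `matches(has_a_but_no_b, "xay")` is true, while `"xab"` and `"xyz"` are both rejected.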
RE# uses minterm compression: it partitions the character space into equivalence classes of characters that the pattern cannot tell apart, which significantly reduces memory usage and improves speed.
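The partitioning step can be illustrated with a small Python sketch. This is an assumption-laden toy, not RE#'s implementation: it uses a 128-character ASCII alphabet and three hand-written predicates standing in for the character classes of a pattern. Characters that answer every predicate the same way fall into the same class, so the automaton only needs one transition per class rather than one per character.

```python
from collections import defaultdict

def minterms(alphabet, predicates):
    """Group characters by the exact set of predicates they satisfy."""
    classes = defaultdict(list)
    for ch in alphabet:
        signature = tuple(p(ch) for p in predicates)
        classes[signature].append(ch)
    return list(classes.values())

# Hypothetical predicates, as if taken from a pattern using [0-9], [a-z] and a.
predicates = [
    lambda ch: "0" <= ch <= "9",
    lambda ch: "a" <= ch <= "z",
    lambda ch: ch == "a",
]

ascii_alphabet = [chr(i) for i in range(128)]
classes = minterms(ascii_alphabet, predicates)

# Lookup table from character to class index, consulted once per input character.
class_of = {ch: i for i, cls in enumerate(classes) for ch in cls}
```

Here the 128 characters collapse into four classes: the digits, the letter `a`, the remaining lowercase letters, and everything else.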
The matching loop runs over a DFA that is constructed lazily: the engine skips the NFA stage entirely and derives DFA states directly from the regex using derivatives, building each state only when the input first reaches it.
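A rough Python sketch of lazy DFA construction follows. It assumes a deliberately tiny regex language (single characters, concatenation, alternation, star) and hypothetical names throughout; RE#'s actual state representation is not described here. Each regex term doubles as a DFA state, and a transition is computed by taking a derivative the first time it is needed, then cached.

```python
EMPTY, EPS = ("empty",), ("eps",)

def char(c): return ("chr", c)
def star(a): return ("star", a)

def alt(a, b):
    # Flatten alternations into a set so that equal states compare equal.
    items = set()
    for r in (a, b):
        items |= r[1] if r[0] == "alt" else {r}
    items.discard(EMPTY)
    if not items:
        return EMPTY
    if len(items) == 1:
        return next(iter(items))
    return ("alt", frozenset(items))

def cat(a, b):
    # Simplify so the set of reachable states stays small.
    if EMPTY in (a, b):
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("cat", a, b)

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "alt":
        return any(nullable(x) for x in r[1])
    return nullable(r[1]) and nullable(r[2])   # cat

def deriv(r, c):
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "star":
        return cat(deriv(r[1], c), r)
    if tag == "alt":
        out = EMPTY
        for x in r[1]:
            out = alt(out, deriv(x, c))
        return out
    left = cat(deriv(r[1], c), r[2])           # cat
    return alt(left, deriv(r[2], c)) if nullable(r[1]) else left

class LazyDfa:
    def __init__(self, regex):
        self.start = regex
        self.transitions = {}   # (state, char) -> state, filled on demand

    def step(self, state, c):
        key = (state, c)
        if key not in self.transitions:
            # First visit: derive the target state and remember it.
            self.transitions[key] = deriv(state, c)
        return self.transitions[key]

    def matches(self, s):
        state = self.start
        for c in s:
            state = self.step(state, c)
        return nullable(state)

dfa = LazyDfa(star(cat(char("a"), char("b"))))   # (ab)*
```

After `dfa.matches("abab")`, the cache holds only the two transitions the input actually exercised, and matching further strings made of `ab` repetitions adds nothing to it.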
RE# supports lookarounds in the form (?<=R1)R2(?=R3), encoding context information directly in the state, enabling linear-time matching with small constants.
The engine follows POSIX semantics and returns leftmost-longest matches, which are deterministic and avoid the pitfalls of backtracking engines.
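The difference between the two match semantics is easy to see in Python, whose standard `re` module is a backtracking engine with leftmost-first behaviour. The `leftmost_longest` helper below is a hypothetical brute-force stand-in written only for this comparison; it is quadratic in the number of candidate spans and says nothing about how RE# finds its matches.

```python
import re

def leftmost_longest(pattern, text):
    """Brute-force POSIX-style search: earliest start first, then the longest end."""
    compiled = re.compile(pattern)
    for start in range(len(text) + 1):
        for end in range(len(text), start - 1, -1):
            if compiled.fullmatch(text, start, end):
                return text[start:end]
    return None

# Leftmost-first: the result depends on the order of the alternatives.
first_a = re.search(r"a|ab", "xab").group()
first_ab = re.search(r"ab|a", "xab").group()

# Leftmost-longest: the result is the same for either order.
longest_a = leftmost_longest(r"a|ab", "xab")
longest_ab = leftmost_longest(r"ab|a", "xab")
```

The backtracking engine returns `a` for `a|ab` and `ab` for `ab|a`, while the leftmost-longest search returns `ab` in both cases.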
RE# is open-source and available as a NuGet package, with a web app for interactive exploration of regex patterns and their combinations.
