Hasty Briefsbeta

Bilingual

Yes, all longest regex matches in linear time is possible

9 hours ago
  • #algorithms
  • #regex
  • #performance
  • Regex engines like RE2, Go, and Rust advertise linear-time matching for a single match, but finding all matches can be quadratic.
  • The problem arises from overlapping match candidates and the need to restart the search from each position.
  • Earliest match semantics can avoid quadratic time but changes the results, providing shorter matches than leftmost-longest semantics.
  • RE# introduces a hardened mode that guarantees linear time for all matches without altering semantics, using a two-pass algorithm.
  • Hardened mode is opt-in due to a performance trade-off, as most real-world patterns don't trigger the quadratic case.
  • RE# includes CPU optimizations like AVX2 and NEON for skip acceleration, improving performance on literal-heavy patterns.
  • Streaming semantics are limited by leftmost-longest matching, which may require unbounded buffering, but chunked approaches can help.
  • RE# does not support capture groups or lazy quantifiers, focusing on leftmost-longest (POSIX) semantics and linear-time guarantees.