Hasty Briefsbeta

A Ruby Regular Expression Engine

11 days ago
  • #Regular Expressions
  • #Ruby
  • #Unicode
  • The exreg gem is a pure-Ruby implementation of a Unicode regular expression engine.
  • It uses a Thompson-style NFA virtual machine, making it immune to ReDoS caused by catastrophic backtracking.
  • Traditional backtracking engines can take exponential time on certain inputs, while Thompson-style engines run in linear time.
  • Exreg consists of a parser, compiler, and virtual machine, with support for Unicode properties and case folding.
  • It uses a custom binary format for efficient storage and loading of Unicode data.
  • USet objects efficiently represent sets of Unicode codepoints and support various set operations.
  • ByteSet objects are used in bytecode for fast membership testing using bitwise operations.
  • Exreg only supports Unicode encodings, assuming UTF-8 by default, and treats strings as byte arrays.
  • Future improvements could include adding a backtracking engine for features like backreferences and lookaround assertions.
  • A JIT compiler could be introduced to speed up the currently interpreted VM.