Regular expressions that work "everywhere"
3 days ago
- #regular-expressions
- #programming-tools
- #compatibility
- Regular expression implementations vary, leading to frustration when features are missing or syntax differs.
- Someone who learned regular expressions in Perl, a maximalist environment, may be frustrated when other tools lack features they have come to expect.
- On restricted computers, you need a subset of regex features that works "everywhere"; the strictest core is literals, character classes, and the special characters . * ^ $.
- For personal use, targeting sed, awk, grep, and Emacs defines a broader "everywhere": the features these tools have in common.
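- GNU sed, gawk, and GNU grep with the -E option share many features. The main difference is word boundaries: gawk writes the word-boundary anchor as \y because \b means backspace in awk, while \B, \<, and \> work in all three.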
- Emacs requires a backslash before ( ) { } | to give them the meaning they have in awk (\( \), \{ \}, \|), while + and ? work unescaped as in awk. Emacs writes the whitespace and non-whitespace classes as \s- and \S-.
- Features that work across all the targeted tools, allowing for the spelling differences above, include ., ^, $, [...], [^...], *, \w, \W, \s, \S, \1 through \9 (backreferences), \b, \B, ?, +, | (alternation), {n,m} (counting), and (...) (capturing).
- The exception is backreferences: gawk supports them in replacement strings (through its gensub function) but not within regex patterns.
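To make the strictest core concrete, here is a minimal Python sketch of a hypothetical checker, `uses_only_core` (not part of any tool), that accepts a pattern only if it is built from literals, bracketed character classes, and . * ^ $. It is a simplification under stated assumptions: any backslash outside a class disqualifies the pattern, and bracket contents are copied without validation beyond finding the closing bracket.

```python
def uses_only_core(pattern: str) -> bool:
    """Hypothetical, simplified check: True if `pattern` uses only literals,
    [...] / [^...] classes, and the special characters . * ^ $."""
    # Characters that are special in some flavors but outside the strict core.
    non_core = set("+?(){}|\\")
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "[":
            # Skip over a bracket expression. A ']' right after '[' or '[^'
            # is a literal member of the class, not the closing bracket.
            j = i + 1
            if j < len(pattern) and pattern[j] == "^":
                j += 1
            if j < len(pattern) and pattern[j] == "]":
                j += 1
            j = pattern.find("]", j)
            if j == -1:
                return False  # unterminated class
            i = j + 1
            continue
        if c in non_core:
            return False
        i += 1
    return True
```

For example, `^ab*c$` and `[^0-9]*x.$` pass, while `a+b` and `(ab)` are rejected because + and grouping fall outside the core.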
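The Emacs differences can be sketched as a mechanical rewrite. The Python function below, `ere_to_emacs`, is a hypothetical helper written for illustration. It adds a backslash before ( ) { } | outside bracket expressions, turns an escaped literal like \( into a plain ( (which is literal in Emacs), and rewrites \s and \S as \s- and \S-. Since + and ? are already operators in Emacs, they are copied unchanged. The sketch assumes the input uses only the shared features and ignores constructs such as POSIX classes like [[:alpha:]].

```python
def ere_to_emacs(pattern: str) -> str:
    """Sketch: rewrite an awk/ERE-style pattern into Emacs regex syntax."""
    grouping = set("(){}|")
    out = []
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "[":
            # Copy a bracket expression verbatim; ( ) { } | are literal inside it.
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1
            end = pattern.find("]", j)
            if end == -1:
                raise ValueError("unterminated bracket expression")
            out.append(pattern[i:end + 1])
            i = end + 1
        elif c == "\\" and i + 1 < n:
            nxt = pattern[i + 1]
            if nxt in grouping:
                out.append(nxt)               # escaped literal in ERE, plain in Emacs
            elif nxt in "sS":
                out.append("\\" + nxt + "-")  # Emacs whitespace syntax class
            else:
                out.append("\\" + nxt)        # \w, \b, \1, \. etc. pass through
            i += 2
        elif c in grouping:
            out.append("\\" + c)              # ERE operator needs a backslash in Emacs
            i += 1
        else:
            out.append(c)                     # literals, . * + ? ^ $ unchanged
            i += 1
    return "".join(out)
```

With this sketch, `(foo|bar)+` becomes `\(foo\|bar\)+` and `a{2,3}` becomes `a\{2,3\}`.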
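The backreference caveat can be illustrated with Python's `re` module. Python's syntax is Perl-style, so it stands in for the tools here only where the syntax overlaps. The first pattern uses only shared features (a character class, +, a literal, and two capturing groups), and its backreferences appear only in the replacement string, the usage that carries over to gawk's gensub. The second pattern puts a backreference inside the pattern itself, which works in Python but, per the list above, not in gawk. The function name `swap_names` is hypothetical.

```python
import re

# "Last, First" -> "First Last" using only features from the shared list.
pattern = r"([A-Za-z]+), ([A-Za-z]+)"

def swap_names(text: str) -> str:
    # \2 and \1 appear only in the replacement string, not in the pattern.
    return re.sub(pattern, r"\2 \1", text)

# A backreference inside the pattern (a doubled word): fine in Python,
# but not portable to gawk.
doubled = re.compile(r"\b([a-z]+) \1\b")

print(swap_names("Cook, John"))               # John Cook
print(bool(doubled.search("it is is true")))  # True
```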