Losing 1½ Million Lines of Go
2 months ago
- #Automata
- #Go Programming
- #Unicode
- The article discusses implementing Unicode character property matching in Quamina, a pattern-matching library.
- The author initially generated 775K lines of Go code to encode the automata, but abandoned that approach as inefficient.
- Because Go's `unicode` package lags behind the latest Unicode release, the author parses UnicodeData.txt directly to get current character properties.
- The fix is to cache the automata for Unicode properties, which raised performance from 135 to 4,330 regexps per second.
- The author reflects on the potential use of GenAI (like Claude) for routine programming tasks but hasn't adopted it yet.
- Quamina is working toward full regexp support; numeric quantifiers (such as `{m,n}`) are the next milestone.
- The author remains skeptical of the GenAI hype but acknowledges its usefulness for code-related tasks.
- The article ends with a note on the challenges of maintaining open-source projects and the loss of community contributors.
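The caching bullet describes a memoization pattern: build the automaton for a Unicode property once, then reuse it for every later regexp that mentions the same property. A minimal Go sketch follows; `automaton`, `propertyCache`, and the `build` callback are hypothetical placeholders and say nothing about how Quamina actually represents or stores its automata. It assumes an automaton is immutable once built, so a single instance can be shared safely.

```go
package main

import "sync"

// automaton is a hypothetical stand-in for a compiled finite automaton.
type automaton struct {
	property string
}

// propertyCache memoizes automata keyed by Unicode property name (e.g. "Lu").
type propertyCache struct {
	mu     sync.Mutex
	built  map[string]*automaton
	build  func(property string) *automaton
	builds int // number of cache misses, kept for illustration
}

func newPropertyCache(build func(string) *automaton) *propertyCache {
	return &propertyCache{built: make(map[string]*automaton), build: build}
}

// get returns the cached automaton for property, building it on first use.
// The lock is held during the build, so concurrent misses are serialized but
// each property is built at most once.
func (c *propertyCache) get(property string) *automaton {
	c.mu.Lock()
	defer c.mu.Unlock()
	if a, ok := c.built[property]; ok {
		return a
	}
	a := c.build(property)
	c.built[property] = a
	c.builds++
	return a
}
```

Holding the mutex across the build is the simplest correct choice; if builds were slow and contention high, a per-key `sync.Once` would let different properties build in parallel.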
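On the numeric-quantifier milestone: one common way for an automaton-based engine to support a bounded quantifier `x{m,n}` is to rewrite it into constructs it already handles, namely m mandatory copies of `x` followed by n-m nested optional copies. The sketch below shows that generic rewrite; it is not necessarily how Quamina will implement the feature, and `expandBounded` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// expandBounded rewrites atom{m,n} using only concatenation, grouping, and "?".
// atom must already be a single regexp atom (a character, a class such as [0-9],
// or a parenthesized group); this sketch does not parse or validate it.
func expandBounded(atom string, m, n int) (string, error) {
	if m < 0 || n < m {
		return "", fmt.Errorf("invalid bounds {%d,%d}", m, n)
	}
	var b strings.Builder
	// The first m copies are mandatory.
	for i := 0; i < m; i++ {
		b.WriteString(atom)
	}
	// The remaining n-m copies are optional and nested, so a later copy can only
	// match if the earlier one did: x{0,2} becomes (x(x)?)?.
	opt := n - m
	for i := 0; i < opt; i++ {
		b.WriteString("(")
		b.WriteString(atom)
	}
	for i := 0; i < opt; i++ {
		b.WriteString(")?")
	}
	return b.String(), nil
}
```

For example, `expandBounded("a", 2, 4)` yields `aa(a(a)?)?`. The expansion grows linearly with n, so an engine taking this route may want to cap the bounds it accepts.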