Beating the fastest lexer generator in Rust
a year ago
- #performance
- #compiler
- #lexer
- Lexer generators aim to simplify lexer creation and improve performance over hand-written implementations.
- Performance comparisons between logos and a naive lexer implementation show logos is faster on Apple M1 but slower on x86_64.
- Speculative execution differences between architectures affect lexer performance.
- Perfect hash functions make keyword matching efficient, exploiting the fact that keywords fit into a single 64-bit register.
- Optimizations for ASCII text can significantly improve lexer performance, as most source code is ASCII.
- Vectorization with SIMD instructions can speed up lexing further, especially for predictable patterns such as runs of whitespace.
- Benchmarking with realistic data shows the naive implementation can outperform logos by 20-30% in some scenarios.
- Keyword frequency and identifier patterns in real-world code affect the relative performance of lexer implementations.
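
The keyword-matching and ASCII points above can be sketched together in Rust. This is a minimal illustration under stated assumptions, not the article's code: the names `Kind`, `IDENT_CONT`, `pack`, `classify`, and `lex_ident` are hypothetical, and the keyword set is an arbitrary three-word sample. The packed keyword is compared directly against constants rather than run through a perfect hash function; a perfect hash would replace the `match` with a table lookup keyed on the same 64-bit value.

```rust
/// Token kinds for this sketch (hypothetical, not the article's enum).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Ident,
    Fn,
    Let,
    Return,
}

/// 256-entry lookup table: `true` for bytes that may continue an ASCII
/// identifier. Built at compile time, so the hot loop is one indexed load.
const IDENT_CONT: [bool; 256] = {
    let mut table = [false; 256];
    let mut i = 0;
    while i < 256 {
        let b = i as u8;
        table[i] = b.is_ascii_alphanumeric() || b == b'_';
        i += 1;
    }
    table
};

/// Pack up to 8 bytes into a u64 (little-endian, zero-padded).
const fn pack(bytes: &[u8]) -> u64 {
    let mut value = 0u64;
    let mut i = 0;
    while i < bytes.len() && i < 8 {
        value |= (bytes[i] as u64) << (8 * i);
        i += 1;
    }
    value
}

// Sample keywords, packed once at compile time.
const KW_FN: u64 = pack(b"fn");
const KW_LET: u64 = pack(b"let");
const KW_RETURN: u64 = pack(b"return");

/// Classify an identifier with a single integer comparison per keyword.
/// Anything longer than 8 bytes cannot be one of the sample keywords.
fn classify(ident: &[u8]) -> Kind {
    if ident.len() > 8 {
        return Kind::Ident;
    }
    match pack(ident) {
        KW_FN => Kind::Fn,
        KW_LET => Kind::Let,
        KW_RETURN => Kind::Return,
        _ => Kind::Ident,
    }
}

/// Scan an ASCII identifier starting at `start` and return its kind plus
/// the index one past its last byte. Assumes the caller has already
/// checked that `src[start]` is a valid identifier start.
fn lex_ident(src: &[u8], start: usize) -> (Kind, usize) {
    let mut end = start;
    while end < src.len() && IDENT_CONT[src[end] as usize] {
        end += 1;
    }
    (classify(&src[start..end]), end)
}
```

Under these assumptions, `lex_ident(b"let x", 0)` yields `(Kind::Let, 3)`, while `lex_ident(b"letter", 0)` falls through to `(Kind::Ident, 6)`. Zero-padding is unambiguous here because identifier bytes are never NUL.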