Ohm's PEG-to-Wasm Compiler
2 days ago
- #WebAssembly
- #parsing
- #performance
- Ohm is a parsing toolkit for JavaScript and TypeScript, useful for parsing custom file formats or building language tools.
- Version 18 is a complete rewrite that compiles grammars to WebAssembly. It parses more than 50x faster than previous versions and uses about 10% of their memory.
- Previous versions used an AST interpreter: each grammar was represented as a tree of PExpr (parsing expression) nodes, which were evaluated via methods like 'eval'.
- The new engine compiles each grammar to WebAssembly, which avoids the interpretation overhead and lets the code for each expression be emitted inline.
- CST (concrete syntax tree) nodes are allocated with a bump allocator in Wasm linear memory. Region-based management releases them in bulk rather than one node at a time, which reduces overhead.
- Terminal nodes are represented as tagged 32-bit values rather than allocated objects, which avoids per-character allocations.
- Bindings are stored in fixed-size chunks, which eliminates array resizes and makes backtracking cheap.
- Memoization uses a block-sparse table for efficient storage of parsing results, with entries packed into i32 values.
- Parameterized rules are handled via static specialization, generating unique rule bodies for each parameter combination.
- Optimized space skipping avoids creating CST nodes for whitespace until needed, improving performance in many grammars.
- Additional optimizations include single-use rule inlining and preallocated nodes for fixed-structure elements.
- The release is available as a beta via npm, with acknowledgments to funders and contributors like Alex Warth.
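
To make the bump-allocator and tagged-value bullets above concrete, here is a minimal TypeScript sketch. It simulates linear memory with an `Int32Array`; the class name `BumpArena`, the helper names, and the tag layout (low bit set for a terminal, upper 31 bits for the payload) are assumptions made for illustration, not Ohm's actual layout.

```typescript
// Hypothetical sketch: a bump allocator over simulated linear memory.
class BumpArena {
  private readonly words: Int32Array;
  private top = 0; // index of the next free 32-bit word

  constructor(capacityWords: number) {
    this.words = new Int32Array(capacityWords);
  }

  // Allocate `n` words by bumping the pointer; returns the word offset.
  alloc(n: number): number {
    if (this.top + n > this.words.length) throw new Error("arena exhausted");
    const ptr = this.top;
    this.top += n;
    return ptr;
  }

  // Region-style management: remember a position, release back to it in O(1).
  mark(): number {
    return this.top;
  }
  resetTo(mark: number): void {
    this.top = mark;
  }

  store(ptr: number, value: number): void {
    this.words[ptr] = value;
  }
  load(ptr: number): number {
    return this.words[ptr];
  }
}

// Assumed tag layout: low bit 1 = terminal (payload is the match length),
// low bit 0 = reference to an allocated node (payload is its word offset).
const makeTerminal = (matchLength: number): number => (matchLength << 1) | 1;
const makeNodeRef = (ptr: number): number => ptr << 1;
const isTerminal = (value: number): boolean => (value & 1) === 1;
const payload = (value: number): number => value >>> 1;
```

In this sketch a terminal child costs one 32-bit slot in its parent and no allocation of its own, and discarding a whole parse is a single `resetTo` call.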
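
The chunked-bindings bullet above can be illustrated with a small sketch. The names `ChunkedBindings` and `CHUNK_SIZE` are hypothetical, and the chunk size is kept tiny for demonstration. In this sketch, existing values are never copied when the structure grows, and backtracking only resets a counter.

```typescript
// Hypothetical sketch of bindings stored in fixed-size chunks.
const CHUNK_SIZE = 4; // assumed value, small for demonstration

class ChunkedBindings {
  private readonly chunks: Int32Array[] = [];
  private count = 0;

  get length(): number {
    return this.count;
  }

  get chunkCount(): number {
    return this.chunks.length;
  }

  // Append a value. When the last chunk is full, a new chunk is added;
  // values already stored are never moved or copied.
  push(value: number): void {
    const chunkIndex = Math.floor(this.count / CHUNK_SIZE);
    let chunk = this.chunks[chunkIndex];
    if (chunk === undefined) {
      chunk = new Int32Array(CHUNK_SIZE);
      this.chunks.push(chunk);
    }
    chunk[this.count % CHUNK_SIZE] = value;
    this.count++;
  }

  get(index: number): number {
    if (index < 0 || index >= this.count) throw new RangeError("index out of range");
    const chunk = this.chunks[Math.floor(index / CHUNK_SIZE)];
    if (chunk === undefined) throw new RangeError("index out of range");
    return chunk[index % CHUNK_SIZE];
  }

  // Backtracking: drop everything pushed after `savedLength`.
  // Chunks stay allocated so later pushes can reuse them.
  truncate(savedLength: number): void {
    this.count = savedLength;
  }
}
```

A parser using this structure would record `length` before trying an alternative and call `truncate` with that value if the alternative fails.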
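
For the memoization bullet above, the following sketch shows one way a block-sparse table with packed i32 entries could look. Everything here is an assumption for illustration: the block size, the `BlockSparseMemo` class, and the entry encoding (low two bits for the state, upper bits for the match length) are not taken from Ohm's source.

```typescript
// Hypothetical sketch of a block-sparse memo table.
const BLOCK_POSITIONS = 16; // input positions covered by one block (assumed)

// Assumed entry encoding: low 2 bits are the state, upper bits the match length.
const EMPTY = 0;
const FAILED = 1;
const packSuccess = (length: number): number => (length << 2) | 2;
const isSuccess = (entry: number): boolean => (entry & 3) === 2;
const matchLength = (entry: number): number => entry >>> 2;

class BlockSparseMemo {
  // Blocks are created lazily, so untouched input regions cost nothing.
  private readonly blocks = new Map<number, Int32Array>();
  private readonly ruleCount: number;

  constructor(ruleCount: number) {
    this.ruleCount = ruleCount;
  }

  get allocatedBlocks(): number {
    return this.blocks.size;
  }

  set(pos: number, ruleId: number, entry: number): void {
    const blockIndex = Math.floor(pos / BLOCK_POSITIONS);
    let block = this.blocks.get(blockIndex);
    if (block === undefined) {
      block = new Int32Array(BLOCK_POSITIONS * this.ruleCount);
      this.blocks.set(blockIndex, block);
    }
    block[(pos % BLOCK_POSITIONS) * this.ruleCount + ruleId] = entry;
  }

  get(pos: number, ruleId: number): number {
    const block = this.blocks.get(Math.floor(pos / BLOCK_POSITIONS));
    if (block === undefined) return EMPTY;
    return block[(pos % BLOCK_POSITIONS) * this.ruleCount + ruleId];
  }
}
```

Because a fresh `Int32Array` is zero-filled, an entry that was never written reads as `EMPTY` without any extra bookkeeping.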
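
The static specialization of parameterized rules mentioned above can be sketched as a cache keyed by rule name and argument list. The `RuleSpecializer` class is hypothetical, and rule bodies are plain strings standing in for generated code; `ListOf` is used only as a familiar example of a parameterized rule, with a simplified body.

```typescript
// Hypothetical sketch: one generated body per distinct parameter combination.
type RuleBody = string; // stands in for generated code
type RuleTemplate = (args: readonly string[]) => RuleBody;

class RuleSpecializer {
  private readonly bodies = new Map<string, RuleBody>();

  get size(): number {
    return this.bodies.size;
  }

  // Returns the name of the specialized rule, generating its body
  // only the first time this argument combination is seen.
  specialize(name: string, args: readonly string[], template: RuleTemplate): string {
    const key = `${name}<${args.join(",")}>`;
    if (!this.bodies.has(key)) {
      this.bodies.set(key, template(args));
    }
    return key;
  }

  bodyOf(key: string): RuleBody | undefined {
    return this.bodies.get(key);
  }
}

// Simplified template for a separated list: elem (sep elem)*
const listOf: RuleTemplate = ([elem, sep]) => `${elem} (${sep} ${elem})*`;
```

Each call site then refers to a concrete, parameter-free rule, so no arguments need to be passed around at parse time.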