It is surprising that Earley can efficiently parse C, ambiguities and all
a day ago
- #computer science
- #parsing
- #Earley algorithm
- Earley parsing algorithm overview and optimizations, including Leo's right-recursion fixes.
- Parsing challenges in computer science, especially with grammars requiring localized ambiguity or arbitrary recursion.
- Definitions: recognizer, scannerless parser, parse tree, AST, parse forest, and complexity notations (O(n), O(n^2), etc.).
- Earley parsers handle arbitrary context-free grammars with O(n^3) worst-case complexity, optimized to O(n) for sane grammars.
- Elizabeth Scott's contribution: generating parse forests in the same complexity as Earley recognition, addressing ambiguity.
- Challenges with left-to-right disambiguation rules in right-to-left parse forests, especially in C's dangling else problem.
- Proposed fixes: reversing grammar and input token stream, with trade-offs in implementation complexity and practicality.
- Special considerations for parsing C, involving typedefs and the need for multiple recognition and parsing phases.
- Comparison with other parsing algorithms: CYK, GLR, GLL, Parsing With Derivatives, and Parsing With Zippers.
- Personal implementation experience confirming the effectiveness of the described approach for C parsing.