Grammars of Formal Uncertainty
- #Uncertainty Quantification
- #LLMs
- #Formal Verification
- LLMs show promise for automated reasoning but face a tension between their probabilistic nature and the deterministic requirements of formal verification.
- The paper investigates failure modes and uncertainty quantification in LLM-generated formal artifacts.
- Evaluation of five frontier LLMs shows that accuracy is strongly domain-dependent: SMT-based autoformalization shifts performance anywhere from +34.8% to -44.5%.
- Existing uncertainty quantification (UQ) techniques, such as token-probability entropy, fail to reliably identify errors in LLM outputs.
- A probabilistic context-free grammar (PCFG) framework is introduced to model LLM outputs and refine uncertainty taxonomy.
- Uncertainty signals are task-dependent; e.g., grammar entropy achieves AUROC > 0.93 on logic tasks.
- A lightweight fusion of uncertainty signals enables selective verification, reducing errors by 14-100% with minimal abstention.
- This approach transforms LLM-driven formalization into a reliable engineering discipline.
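The grammar-entropy signal mentioned above can be illustrated with a minimal sketch: given a small acyclic PCFG fitted over model outputs, the derivation entropy at a nonterminal is the entropy of its rule choice plus the probability-weighted entropies of the nonterminals on each right-hand side. The toy grammar below (SMT-LIB-flavored symbols, illustrative probabilities) is hypothetical, not taken from the paper.

```python
import math

# Hypothetical toy PCFG over SMT-LIB-like fragments. Each nonterminal maps
# to (probability, rhs) pairs; rhs symbols that are keys of the dict are
# nonterminals, everything else is a terminal.
pcfg = {
    "Formula": [(0.6, ["(assert", "Expr", ")"]),
                (0.4, ["(assert", "Expr", "Expr", ")"])],
    "Expr":    [(0.5, ["(> x 0)"]),
                (0.3, ["(= x y)"]),
                (0.2, ["(< y 1)"])],
}

def grammar_entropy(nt, g, memo=None):
    """Derivation entropy (in bits) of the sub-grammar rooted at `nt`.

    H(A) = H(rule choice at A) + sum over rules of p(rule) times the
    entropy of each nonterminal on its right-hand side.
    Valid for acyclic grammars, where the recursion terminates.
    """
    if memo is None:
        memo = {}
    if nt in memo:
        return memo[nt]
    h = 0.0
    for p, rhs in g[nt]:
        h += -p * math.log2(p)            # entropy of choosing this rule
        for sym in rhs:
            if sym in g:                  # nonterminal: recurse
                h += p * grammar_entropy(sym, g, memo)
    memo[nt] = h
    return h

print(round(grammar_entropy("Formula", pcfg), 3))  # → 3.051
```

A flat, near-uniform rule distribution yields high entropy (the model is unsure which construct to emit), while a peaked distribution yields low entropy; that scalar is what gets thresholded as an uncertainty signal.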
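The selective-verification idea can likewise be sketched: fuse several uncertainty signals into one score and abstain (defer to a human or a prover) whenever the score crosses a threshold. The fixed weights, field names, and toy batch below are all illustrative stand-ins for the paper's learned lightweight fusion.

```python
# Hypothetical sketch of selective verification via fused uncertainty.
# Weights and signal names are assumptions for illustration only.

def fused_score(grammar_entropy, token_entropy, w=(0.7, 0.3)):
    """Weighted combination of two normalized uncertainty signals."""
    return w[0] * grammar_entropy + w[1] * token_entropy

def selective_verify(outputs, threshold=0.5):
    """Split outputs into accepted and abstained by fused uncertainty."""
    accepted, abstained = [], []
    for out in outputs:
        score = fused_score(out["grammar_entropy"], out["token_entropy"])
        (abstained if score > threshold else accepted).append(out)
    return accepted, abstained

# Toy batch: normalized signals plus a ground-truth correctness flag.
batch = [
    {"id": 1, "grammar_entropy": 0.1, "token_entropy": 0.2, "correct": True},
    {"id": 2, "grammar_entropy": 0.9, "token_entropy": 0.8, "correct": False},
    {"id": 3, "grammar_entropy": 0.2, "token_entropy": 0.1, "correct": True},
    {"id": 4, "grammar_entropy": 0.8, "token_entropy": 0.6, "correct": False},
]

accepted, abstained = selective_verify(batch)
errors_before = sum(not o["correct"] for o in batch)
errors_after = sum(not o["correct"] for o in accepted)
print(f"abstention rate: {len(abstained)/len(batch):.0%}, "
      f"errors: {errors_before} -> {errors_after}")
# → abstention rate: 50%, errors: 2 -> 0
```

On this toy batch the threshold removes both erroneous outputs at a 50% abstention cost; in practice the threshold trades error reduction against abstention rate, which is the 14-100% range the paper reports.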