Grammars of Formal Uncertainty
- #Uncertainty Quantification
- #LLMs
- #Formal Verification
- LLMs show promise for automated reasoning but face a tension between their probabilistic nature and the deterministic requirements of formal verification.
- The paper investigates failure modes and uncertainty quantification in LLM-generated formal artifacts.
- Evaluation of five frontier LLMs shows that accuracy is strongly domain-dependent: SMT-based autoformalization shifts performance anywhere from +34.8% to -44.5%.
- Existing uncertainty quantification (UQ) techniques, such as token-probability entropy, fail to reliably identify errors in LLM outputs.
- A probabilistic context-free grammar (PCFG) framework is introduced to model LLM outputs and refine uncertainty taxonomy.
- Uncertainty signals are task-dependent; e.g., grammar entropy achieves AUROC > 0.93 on logic tasks.
- A lightweight fusion of uncertainty signals enables selective verification, reducing errors by 14-100% with minimal abstention.
- This approach transforms LLM-driven formalization into a reliable engineering discipline.
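The grammar-entropy signal mentioned above can be illustrated with a minimal sketch: given a small acyclic PCFG fitted over model outputs, the derivation entropy at a nonterminal is the entropy of its rule choice plus the probability-weighted entropies of the nonterminals on each right-hand side. The toy grammar below (SMT-LIB-flavored symbols, illustrative probabilities) is hypothetical, not taken from the paper.

```python
import math

# Hypothetical toy PCFG over SMT-LIB-like fragments. Each nonterminal maps
# to (probability, rhs) pairs; rhs symbols that are keys of the dict are
# nonterminals, everything else is a terminal.
pcfg = {
    "Formula": [(0.6, ["(assert", "Expr", ")"]),
                (0.4, ["(assert", "Expr", "Expr", ")"])],
    "Expr":    [(0.5, ["(> x 0)"]),
                (0.3, ["(= x y)"]),
                (0.2, ["(< y 1)"])],
}

def grammar_entropy(nt, g, memo=None):
    """Derivation entropy (in bits) of the sub-grammar rooted at `nt`.

    H(A) = H(rule choice at A) + sum over rules of p(rule) times the
    entropy of each nonterminal on its right-hand side.
    Valid for acyclic grammars, where the recursion terminates.
    """
    if memo is None:
        memo = {}
    if nt in memo:
        return memo[nt]
    h = 0.0
    for p, rhs in g[nt]:
        h += -p * math.log2(p)            # entropy of choosing this rule
        for sym in rhs:
            if sym in g:                  # nonterminal: recurse
                h += p * grammar_entropy(sym, g, memo)
    memo[nt] = h
    return h

print(round(grammar_entropy("Formula", pcfg), 3))  # → 3.051
```

A flat, near-uniform rule distribution yields high entropy (the model is unsure which construct to emit), while a peaked distribution yields low entropy; that scalar is what gets thresholded as an uncertainty signal.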
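The selective-verification idea can likewise be sketched: fuse several uncertainty signals into one score and abstain (defer to a human or a prover) whenever the score crosses a threshold. The fixed weights, field names, and toy batch below are all illustrative stand-ins for the paper's learned lightweight fusion.

```python
# Hypothetical sketch of selective verification via fused uncertainty.
# Weights and signal names are assumptions for illustration only.

def fused_score(grammar_entropy, token_entropy, w=(0.7, 0.3)):
    """Weighted combination of two normalized uncertainty signals."""
    return w[0] * grammar_entropy + w[1] * token_entropy

def selective_verify(outputs, threshold=0.5):
    """Split outputs into accepted and abstained by fused uncertainty."""
    accepted, abstained = [], []
    for out in outputs:
        score = fused_score(out["grammar_entropy"], out["token_entropy"])
        (abstained if score > threshold else accepted).append(out)
    return accepted, abstained

# Toy batch: normalized signals plus a ground-truth correctness flag.
batch = [
    {"id": 1, "grammar_entropy": 0.1, "token_entropy": 0.2, "correct": True},
    {"id": 2, "grammar_entropy": 0.9, "token_entropy": 0.8, "correct": False},
    {"id": 3, "grammar_entropy": 0.2, "token_entropy": 0.1, "correct": True},
    {"id": 4, "grammar_entropy": 0.8, "token_entropy": 0.6, "correct": False},
]

accepted, abstained = selective_verify(batch)
errors_before = sum(not o["correct"] for o in batch)
errors_after = sum(not o["correct"] for o in accepted)
print(f"abstention rate: {len(abstained)/len(batch):.0%}, "
      f"errors: {errors_before} -> {errors_after}")
# → abstention rate: 50%, errors: 2 -> 0
```

On this toy batch the threshold removes both erroneous outputs at a 50% abstention cost; in practice the threshold trades error reduction against abstention rate, which is the 14-100% range the paper reports.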