Hasty Briefs

Grammars of Formal Uncertainty

a year ago
  • #Uncertainty Quantification
  • #LLMs
  • #Formal Verification
  • LLMs show promise for automated reasoning, but their probabilistic nature is in tension with formal verification's demand for determinism.
  • The paper investigates failure modes and uncertainty quantification in LLM-generated formal artifacts.
  • Evaluation of five frontier LLMs shows the impact on accuracy is domain-specific: SMT-based autoformalization ranges from a +34.8% gain to a -44.5% drop.
  • Existing UQ techniques, such as entropy over token probabilities, fail to reliably identify errors in LLM outputs.
  • A probabilistic context-free grammar (PCFG) framework is introduced to model distributions over LLM outputs and refine the uncertainty taxonomy.
  • Uncertainty signals are task-dependent; e.g., grammar entropy achieves AUROC > 0.93 on logic tasks.
  • A lightweight fusion of uncertainty signals enables selective verification, cutting errors by 14-100% with minimal abstention.
  • This approach transforms LLM-driven formalization into a reliable engineering discipline.
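The token-entropy baseline mentioned above can be sketched in a few lines: average the Shannon entropy of each token's predictive distribution over a generated sequence. This is a minimal illustration of the generic baseline, not code from the paper; the function name and input format are assumptions.

```python
import math

def mean_token_entropy(token_dists):
    """Mean per-token Shannon entropy (in bits) over a generated sequence.

    token_dists: one probability distribution (list of floats summing to 1)
    per generated token. Zero-probability entries are skipped.
    """
    ents = [-sum(p * math.log2(p) for p in dist if p > 0.0)
            for dist in token_dists]
    return sum(ents) / len(ents)
```

A uniform two-way split at every token yields 1 bit; a fully peaked distribution yields 0, so a confidently wrong output scores the same as a confidently right one, which is one intuition for why this signal fails to flag errors.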
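The grammar-entropy signal can be illustrated with a small sketch: induce a PCFG from the productions observed across parsed samples of a model's output, then take the usage-weighted entropy of each nonterminal's production distribution. This is a simplified reading of the idea, assuming the paper's grammars are estimated from sample counts; the helper below is hypothetical, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict

def grammar_entropy(productions):
    """Usage-weighted Shannon entropy (bits) of an induced PCFG.

    productions: list of (lhs, rhs) pairs observed across parsed samples
    of a model's formal outputs (assumed input format).
    """
    counts = defaultdict(Counter)
    for lhs, rhs in productions:
        counts[lhs][rhs] += 1
    total = sum(sum(c.values()) for c in counts.values())
    entropy = 0.0
    for c in counts.values():
        n = sum(c.values())
        # Entropy of this nonterminal's production choices
        h = -sum((k / n) * math.log2(k / n) for k in c.values())
        # Weight by how often the nonterminal fires
        entropy += (n / total) * h
    return entropy
```

A model that always emits the same structure induces a deterministic grammar (entropy 0); structural indecision across samples raises the entropy, which is the property the AUROC > 0.93 result on logic tasks exploits.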
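The selective-verification step can be sketched as a linear fusion of uncertainty signals with an abstention threshold: accept candidates whose fused score is low, abstain on the rest. The signal names, weights, and threshold below are hypothetical placeholders; in practice they would be fit on validation data.

```python
def selective_verify(candidates, weights, threshold):
    """Split candidates into (accepted, abstained) by fused uncertainty.

    candidates: list of dicts mapping signal name -> value.
    weights: assumed learned weights for the linear fusion.
    threshold: abstain when the fused score exceeds this value.
    """
    accepted, abstained = [], []
    for cand in candidates:
        fused = sum(weights[name] * cand[name] for name in weights)
        (abstained if fused > threshold else accepted).append(cand)
    return accepted, abstained

# Illustrative call with made-up signal values
accepted, abstained = selective_verify(
    [{"grammar_entropy": 0.1, "token_entropy": 0.2},
     {"grammar_entropy": 2.0, "token_entropy": 1.5}],
    weights={"grammar_entropy": 0.7, "token_entropy": 0.3},
    threshold=0.5,
)
```

Only the accepted outputs proceed unchecked; abstentions are routed to a human or a heavier verifier, which is how errors can be cut sharply while abstaining on few candidates.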