
Deep Think with Confidence

17 days ago
  • #Confidence Metrics
  • #LLM Reasoning
  • #Efficiency Optimization
  • DeepConf enhances LLM reasoning by turning the model's own token log-probabilities into localized confidence scores over segments of each reasoning trace (see the first sketch after this list).
  • Operates in two modes: offline (filters completed traces) and online (dynamic termination of low-confidence traces).
  • Achieves state-of-the-art accuracy (e.g., 99.9% on AIME 2025) while reducing token generation by up to 84.7%.
  • Introduces fine-grained confidence metrics like Group Confidence, Bottom 10% Group Confidence, and Tail Confidence.
  • Offline mode weights majority voting by each trace's confidence; online mode prunes traces mid-generation once their confidence falls below a threshold (see the second sketch after this list).
  • Experimental results show significant accuracy gains alongside large reductions in compute across multiple reasoning benchmarks.
  • Identifies the 'confidently wrong' problem (high-confidence but incorrect traces) as a key limitation to address in future work.
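
For illustration, here is a minimal Python sketch of how such localized confidence scores could be derived from per-token top-k log-probabilities. The function names, the 2048-token window, and the 10% / tail fractions are assumptions made for this sketch, not necessarily the paper's exact settings.

```python
import numpy as np

def token_confidence(topk_logprobs: np.ndarray) -> np.ndarray:
    """Per-token confidence: negative mean log-probability of the top-k
    candidate tokens at each generation step (shape: [num_tokens, k])."""
    return -topk_logprobs.mean(axis=1)

def group_confidences(token_conf: np.ndarray, window: int = 2048) -> np.ndarray:
    """Group confidence: mean token confidence over a sliding window."""
    w = min(window, len(token_conf))
    cumsum = np.concatenate([[0.0], np.cumsum(token_conf)])
    return (cumsum[w:] - cumsum[:-w]) / w

def bottom_percent_confidence(group_conf: np.ndarray, frac: float = 0.10) -> float:
    """Mean of the lowest `frac` fraction of group confidences
    (the 'Bottom 10% Group Confidence' signal)."""
    k = max(1, int(len(group_conf) * frac))
    return float(np.sort(group_conf)[:k].mean())

def tail_confidence(token_conf: np.ndarray, tail: int = 2048) -> float:
    """Mean token confidence over the final `tail` tokens of the trace."""
    return float(token_conf[-tail:].mean())
```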
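The two operating modes can be sketched in a similarly hedged way: `weighted_majority_vote` below implements offline confidence-weighted voting over completed traces, and `should_terminate` is a hypothetical online check that stops a trace once its latest group confidence drops below a threshold (the summary does not say how the threshold is set; calibrating it on a few warm-up traces is one plausible approach).

```python
from collections import defaultdict
from typing import Iterable, Sequence, Tuple

def weighted_majority_vote(traces: Iterable[Tuple[str, float]]) -> str:
    """Offline mode: each completed trace votes for its final answer,
    weighted by a confidence score (e.g. bottom-10% group confidence)."""
    votes = defaultdict(float)
    for answer, confidence in traces:
        votes[answer] += confidence
    return max(votes, key=votes.get)

def should_terminate(recent_group_confidences: Sequence[float], threshold: float) -> bool:
    """Online mode: prune a trace mid-generation as soon as its most
    recent group confidence falls below the threshold."""
    return bool(recent_group_confidences) and recent_group_confidences[-1] < threshold

# Example: two traces answer "42" with moderate confidence, one answers "41"
# with higher confidence; the combined weight still favors "42".
traces = [("42", 1.8), ("42", 1.5), ("41", 2.1)]
print(weighted_majority_vote(traces))  # -> 42
```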