Deep Think with Confidence
17 days ago
- #Confidence Metrics
- #LLM Reasoning
- #Efficiency Optimization
- DeepConf enhances LLM reasoning at test time by deriving localized confidence scores from the model's internal token-level log-probabilities.
- Operates in two modes: offline (filters completed traces by confidence before voting) and online (terminates low-confidence traces during generation).
- Achieves state-of-the-art accuracy (e.g., 99.9% on AIME 2025) while reducing token generation by up to 84.7%.
- Introduces fine-grained confidence metrics like Group Confidence, Bottom 10% Group Confidence, and Tail Confidence.
- Offline mode aggregates answers via confidence-weighted majority voting; online mode stops a trace as soon as its sliding-window confidence drops below a calibrated threshold.
- Experimental results show significant accuracy improvements and computational efficiency across multiple benchmarks.
- Identifies the "confidently wrong" problem (traces that are highly confident yet reach an incorrect answer) as a key limitation for future work.
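The metrics above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes token confidence is the negative mean log-probability of the top-k candidate tokens at each decoding step, and the window size, percentile, and function names are illustrative.

```python
from collections import defaultdict

def token_confidence(topk_logprobs):
    # Token confidence: negative mean log-probability of the top-k
    # candidate tokens at one decoding step (higher = more confident).
    return -sum(topk_logprobs) / len(topk_logprobs)

def group_confidences(token_confs, window=2048):
    # Group confidence: sliding-window mean over per-token confidences,
    # giving a localized score along the reasoning trace.
    w = min(window, len(token_confs))
    return [sum(token_confs[i:i + w]) / w
            for i in range(len(token_confs) - w + 1)]

def bottom_percent_confidence(groups, percent=0.1):
    # Bottom 10% group confidence: mean of the lowest `percent`
    # fraction of group scores, highlighting the weakest segments.
    k = max(1, int(len(groups) * percent))
    return sum(sorted(groups)[:k]) / k

def tail_confidence(token_confs, tail=2048):
    # Tail confidence: mean confidence over the final `tail` tokens,
    # where the answer is typically produced.
    tail_slice = token_confs[-tail:]
    return sum(tail_slice) / len(tail_slice)

def confidence_weighted_vote(answers, confidences):
    # Offline mode: each completed trace votes for its final answer,
    # weighted by its confidence score.
    scores = defaultdict(float)
    for ans, conf in zip(answers, confidences):
        scores[ans] += conf
    return max(scores, key=scores.get)
```

In online mode, the same sliding-window score would be checked during generation and the trace aborted once it falls below a threshold calibrated on a few warm-up traces.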