How Confident Are AI Classifiers About Their Own Confidence?
6 hours ago
- #AI Classification
- #LLM Applications
- #Confidence Calibration
- LLMs are widely used for text classification tasks, often replacing older models like BERT for NLP applications.
- Obtaining classification probabilities from LLMs is challenging; common methods include prompting the model for a confidence score and extracting token-level probabilities.
- In an experiment on NEISS (National Electronic Injury Surveillance System) data, an LLM classified the primary injury described in each medical narrative and reached 86% accuracy.
- Both the LLM's self-reported confidence scores and its token probabilities were evaluated; both were poorly calibrated, and the token probabilities in particular were overconfident.
- Calibration techniques like isotonic regression can adjust probabilities to better reflect observed accuracy, improving reliability for practical use.
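
The calibration step in the last point can be sketched in a few lines. The example below is a minimal, dependency-free illustration and not the experiment's actual code: it fits isotonic regression with the pool-adjacent-violators algorithm and measures miscalibration with expected calibration error (ECE). The function names (`fit_isotonic`, `apply_isotonic`, `expected_calibration_error`) and the toy numbers are assumptions made up for illustration; the fitted map is a step function, whereas library implementations such as scikit-learn's `IsotonicRegression` interpolate between points.

```python
from bisect import bisect_left

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(confidences, correct):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            avg_conf = sum(p for p, _ in members) / len(members)
            accuracy = sum(y for _, y in members) / len(members)
            ece += len(members) / len(confidences) * abs(avg_conf - accuracy)
    return ece

def fit_isotonic(confidences, correct):
    """Pool adjacent violators: fit a non-decreasing step function."""
    # Aggregate items that share the same raw confidence value.
    grouped = {}
    for x, y in zip(confidences, correct):
        s, c = grouped.get(x, (0.0, 0))
        grouped[x] = (s + y, c + 1)
    blocks = []  # each block: [sum of labels, item count, largest raw confidence]
    for x in sorted(grouped):
        s, c = grouped[x]
        blocks.append([s, c, x])
        # Merge backwards while the previous block's mean exceeds this one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values

def apply_isotonic(model, p):
    """Map a raw confidence to the calibrated value of its block."""
    uppers, values = model
    return values[min(bisect_left(uppers, p), len(values) - 1)]

# Toy data, invented for illustration: (raw confidence, items, correct items).
groups = [(0.70, 10, 5), (0.80, 10, 7), (0.90, 10, 6), (0.95, 10, 8), (0.99, 10, 9)]
confidences, correct = [], []
for conf, n, k in groups:
    confidences += [conf] * n
    correct += [1] * k + [0] * (n - k)

model = fit_isotonic(confidences, correct)
calibrated = [apply_isotonic(model, p) for p in confidences]

# In-sample comparison only; a real evaluation would use a separate held-out set.
ece_before = expected_calibration_error(confidences, correct)
ece_after = expected_calibration_error(calibrated, correct)
print(f"ECE before: {ece_before:.3f}, after: {ece_after:.3f}")
```

In this toy data the 0.80 and 0.90 groups are out of order (70% vs. 60% accurate), so the fit pools them into one block. For real use, the map would be fitted on a labeled calibration set and checked on data it has not seen, since in-sample ECE flatters the result.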