LLMs Encode How Difficult Problems Are
16 days ago
- #Problem Difficulty
- #Reinforcement Learning
- #Large Language Models
- Large language models (LLMs) solve complex problems but often fail on simpler ones.
- The study investigates whether LLMs internally encode problem difficulty in a way that aligns with human judgment.
- Human-labeled difficulty is strongly decodable from the models' internal representations, and the signal scales with model size; LLM-derived difficulty estimates show neither property (a probe sketch follows this list).
- Steering model activations toward 'easier' representations reduces hallucination and improves accuracy (see the steering sketch below).
- The human-difficulty probe strengthens over RL training and correlates with test accuracy, whereas the LLM-difficulty probe does not.
- Results suggest human annotations provide a stable difficulty signal that reinforcement learning amplifies.
- Probe code and evaluation scripts are released for replication.
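A minimal sketch of how such a difficulty probe could be fit: a linear model regressing human difficulty ratings onto hidden states. The layer choice, synthetic activations, and label scale here are placeholders, not the paper's released setup.

```python
# Hypothetical sketch: fit a linear probe mapping a model's hidden states
# to human difficulty ratings. Shapes and labels are illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Stand-in for hidden states of the final prompt token at one layer
# (n_problems, hidden_dim) and human difficulty labels (e.g. a 1-10 scale).
n_problems, hidden_dim = 500, 768
hidden_states = rng.normal(size=(n_problems, hidden_dim))
human_difficulty = rng.uniform(1, 10, size=n_problems)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, human_difficulty, test_size=0.2, random_state=0
)

probe = Ridge(alpha=1.0).fit(X_train, y_train)       # linear probe
rho, _ = spearmanr(probe.predict(X_test), y_test)     # rank correlation on held-out problems
print(f"probe Spearman rho on held-out problems: {rho:.3f}")
```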
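The steering result could be reproduced in spirit by shifting the residual stream along the probe's direction with a forward hook. The hook point, scale `alpha`, sign, and layer index below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical activation-steering sketch: add a vector along the (negated)
# probe direction so the problem "looks easier" in the model's hidden states.
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float = 4.0):
    """Return a forward hook that adds alpha * unit(direction) to a layer's output."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * unit.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

# Usage (assumed HuggingFace-style decoder; layer index is a placeholder):
# probe_direction = torch.tensor(probe.coef_, dtype=torch.float32)
# handle = model.model.layers[20].register_forward_hook(
#     make_steering_hook(-probe_direction)
# )
# outputs = model.generate(**inputs)
# handle.remove()
```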