LLMs Encode How Difficult Problems Are
16 days ago
- #Problem Difficulty
- #Reinforcement Learning
- #Large Language Models
- Large language models (LLMs) solve complex problems but often fail on simpler ones.
- The study investigates whether LLMs internally encode problem difficulty in a way that aligns with human judgment.
- Human-labeled difficulty is strongly decodable from the models' internal representations, and the signal scales with model size; LLM-derived difficulty estimates show neither property (a probe sketch follows this list).
- Steering model activations toward 'easier' representations reduces hallucination and improves accuracy (see the steering sketch below).
- The human-difficulty probe strengthens over RL training and correlates with test accuracy, whereas the LLM-difficulty probe does not.
- Results suggest human annotations provide a stable difficulty signal that reinforcement learning amplifies.
- Probe code and evaluation scripts are released for replication.
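A minimal sketch of how such a difficulty probe could be fit: a linear model regressing human difficulty ratings onto hidden states. The layer choice, synthetic activations, and label scale here are placeholders, not the paper's released setup.

```python
# Hypothetical sketch: fit a linear probe mapping a model's hidden states
# to human difficulty ratings. Shapes and labels are illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Stand-in for hidden states of the final prompt token at one layer
# (n_problems, hidden_dim) and human difficulty labels (e.g. a 1-10 scale).
n_problems, hidden_dim = 500, 768
hidden_states = rng.normal(size=(n_problems, hidden_dim))
human_difficulty = rng.uniform(1, 10, size=n_problems)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, human_difficulty, test_size=0.2, random_state=0
)

probe = Ridge(alpha=1.0).fit(X_train, y_train)       # linear probe
rho, _ = spearmanr(probe.predict(X_test), y_test)     # rank correlation on held-out problems
print(f"probe Spearman rho on held-out problems: {rho:.3f}")
```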
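The steering result could be reproduced in spirit by shifting the residual stream along the probe's direction with a forward hook. The hook point, scale `alpha`, sign, and layer index below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical activation-steering sketch: add a vector along the (negated)
# probe direction so the problem "looks easier" in the model's hidden states.
import torch

def make_steering_hook(direction: torch.Tensor, alpha: float = 4.0):
    """Return a forward hook that adds alpha * unit(direction) to a layer's output."""
    unit = direction / direction.norm()

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * unit.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden

    return hook

# Usage (assumed HuggingFace-style decoder; layer index is a placeholder):
# probe_direction = torch.tensor(probe.coef_, dtype=torch.float32)
# handle = model.model.layers[20].register_forward_hook(
#     make_steering_hook(-probe_direction)
# )
# outputs = model.generate(**inputs)
# handle.remove()
```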