Experts Have World Models. LLMs Have Word Models
- #AI
- #Adversarial Reasoning
- #World Models
- Experts possess world models, while LLMs (Large Language Models) have word models: they are trained to predict the next token rather than the next state of the world.
- Three types of world models are discussed: 3D video world models, Meta's JEPA and related models, and multi-agent world models for adversarial reasoning.
- The essay contrasts evaluating text in isolation with simulating how it will be received in a real-world context where other agents react.
- Examples illustrate how domain experts anticipate adversarial reactions and hidden incentives, which LLMs currently fail to model effectively.
- Perfect-information games like chess differ from imperfect-information games like poker, where hidden state and adversarial adaptation are crucial (see the first sketch after this list).
- LLMs are optimized to produce coherent outputs, but they lack the ability to simulate multi-agent environments in which other parties adapt and counter.
- The core issue is not raw intelligence but the training loop: LLMs need to be graded on outcomes in adversarial settings rather than on static outputs (see the second sketch after this list).
- Experts judge artifacts by their robustness under pressure, while outsiders focus on surface-level qualities like coherence and professionalism.
- The poker vs. chess analogy underscores the challenge of hidden state and adversarial dynamics in real-world applications.
- Future solutions may require multi-agent training environments where LLMs learn from outcomes and adapt to being modeled by others.
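To make the hidden-state point concrete, here is a minimal Python sketch, not from the essay; all class and field names are illustrative. It contrasts a chess-like state, where both players observe everything, with a poker-like state (loosely modeled on Kuhn poker), where each player sees only their own card plus the public action history.

```python
# A minimal sketch, not from the essay: all class and field names are illustrative.
import random
from dataclasses import dataclass, field

@dataclass
class PerfectInfoState:
    """Chess-like: the entire state is public to both players."""
    board: list

    def observation(self, player: int) -> list:
        return self.board  # every piece is visible to everyone

@dataclass
class ImperfectInfoState:
    """Poker-like: each player sees only their own card plus public actions."""
    hands: dict = field(default_factory=dict)
    public_actions: list = field(default_factory=list)

    def deal(self) -> None:
        deck = ["J", "Q", "K"]
        random.shuffle(deck)
        self.hands = {0: deck[0], 1: deck[1]}

    def observation(self, player: int) -> dict:
        # The opponent's card is hidden, so any policy has to reason over a
        # distribution of possible hidden states, not one known state.
        return {"my_card": self.hands[player],
                "history": list(self.public_actions)}

chess = PerfectInfoState(board=["r", "n", "b", "q", "k", "..."])
print(chess.observation(0))   # both players see the same full board

poker = ImperfectInfoState()
poker.deal()
print(poker.observation(0))   # player 0 never sees player 1's card
```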
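And a second minimal sketch of what "grading on outcomes" could look like. This is an assumption about how such a loop might be wired, not the essay's implementation; `draft_policy`, `adversary`, `static_grade`, and `outcome_score` are hypothetical stand-ins.

```python
# A minimal sketch of outcome-based grading in an adversarial setting.
# Hypothetical stand-ins throughout; not the essay's implementation.
import random

def draft_policy(prompt: str) -> str:
    # Stand-in for an LLM drafting a negotiation offer, contract clause, etc.
    return f"offer based on: {prompt}"

def adversary(artifact: str) -> str:
    # Stand-in for the other party probing the artifact and reacting to it.
    return random.choice(["accept", "exploit loophole", "counter aggressively"])

def static_grade(artifact: str) -> float:
    # Grades the text in isolation: coherence, fluency, professionalism.
    return 1.0 if artifact else 0.0

def outcome_score(artifact: str, response: str) -> float:
    # Grades what actually happened once the other side adapted.
    return {"accept": 1.0,
            "counter aggressively": 0.3,
            "exploit loophole": 0.0}[response]

def training_step(prompt: str) -> float:
    artifact = draft_policy(prompt)
    response = adversary(artifact)            # the environment pushes back
    reward = outcome_score(artifact, response)
    # In a real setup this reward, not static_grade(artifact), would drive
    # the policy update (e.g. a policy-gradient step).
    return reward

print(training_step("sell 1,000 units by Q3"))
```

The only point of the toy loop is that the learning signal flows from `outcome_score`, which depends on the adversary's adaptation, rather than from `static_grade`, which rewards surface polish.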