The sample efficiency black hole


Sample efficiency measures how much data a learner needs to operate fluently in a domain, and AI progress may not have improved it significantly.
AI improvement mainly comes from adding more and better data, scaling compute, and using RL as a form of synthetic data generation.
Human expert data is highly task-specific yet abundant: specialized roles generate vast amounts of domain-specific content.
AI models require vastly more data than humans (trillions of tokens versus millions), which points to a massive sample efficiency gap.
Scaling laws suggest that even scaling parameter count to infinity would only modestly reduce data needs, indicating that humans are on a different curve altogether.
Human-level sample efficiency may not be necessary for automating common white-collar tasks, as AI can amortize training across many sessions.
The future of AI research automation depends on whether AI can solve complex problems despite its lower sample efficiency.
Evolutionary arguments and comparisons with human sensory input suggest that AI's data demands exceed what human learning mechanisms require.
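The scaling-law point can be made concrete with a small sketch. It assumes a Chinchilla-style parametric loss, L(N, D) = E + A/N^alpha + B/D^beta, and borrows the published Hoffmann et al. (2022) fitted constants purely for illustration; the original post may rely on a different law or different fits. The reference model size and token count, the helper names `loss` and `tokens_needed`, and the `HUMAN_TOKENS` budget (an order-of-magnitude stand-in for "millions") are all hypothetical choices made for this example. Letting N go to infinity drops the parameter term, which shows how much data that limit saves at a fixed target loss.

```python
import math

# Assumed Chinchilla-style fit: L(N, D) = E + A / N**ALPHA + B / D**BETA.
# Constants are the published Hoffmann et al. (2022) values, used only as an illustration.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def tokens_needed(target_loss: float, n_params: float = math.inf) -> float:
    """Tokens required to reach target_loss at a given model size (hypothetical helper).

    n_params=math.inf removes the parameter term entirely, i.e. the
    'infinite parameters' limit, leaving only the irreducible and data terms.
    """
    param_term = 0.0 if math.isinf(n_params) else A / n_params**ALPHA
    data_budget = target_loss - E - param_term
    if data_budget <= 0:
        raise ValueError("target loss is unreachable at this model size")
    return (B / data_budget) ** (1 / BETA)

# Hypothetical reference point: 70B parameters trained on 1.4T tokens.
N_REF, D_REF = 70e9, 1.4e12
target = loss(N_REF, D_REF)

d_finite = tokens_needed(target, N_REF)  # recovers D_REF by construction
d_infinite = tokens_needed(target)       # same loss with unlimited parameters

# Hypothetical human budget on the order of "millions" of tokens.
HUMAN_TOKENS = 1e7

print(f"data saved by infinite parameters: {d_finite / d_infinite:.1f}x")
print(f"remaining gap to the human budget: {d_infinite / HUMAN_TOKENS:.1e}x")
```

Under these assumed constants, removing the parameter term cuts the data requirement only a few-fold, while the remaining gap to a millions-of-tokens budget stays around four orders of magnitude. That contrast is the shape of the argument above: scaling along this curve does not reach the human one.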
