The sample efficiency black hole
5 hours ago
- #Sample Efficiency
- #AI Training
- #Data Scaling
- Sample efficiency measures how much data is needed for fluent operation in a domain, and progress in AI may not have significantly improved it.
- AI improvement mainly comes from adding more and better data, scaling compute, and using RL as a form of synthetic data generation.
- Human expert data is highly task-specific and abundant, with specialized roles generating vast amounts of domain-specific content.
- AI models require vastly more data than humans (trillions of tokens versus millions), a sample efficiency gap of roughly six orders of magnitude.
- Scaling laws suggest that even an infinitely large model would need only modestly less data to reach the same performance, indicating humans are on a different curve.
- Human-level sample efficiency may not be necessary for automating common white-collar tasks, as AI can amortize training across many sessions.
- The future of AI research automation depends on whether AI can solve complex problems despite its lower sample efficiency.
- Evolutionary arguments and sensory-data comparisons suggest that AI's data demands exceed what human learning mechanisms require.
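
The gap and scaling-law points above can be made concrete with a rough calculation. The sketch below is illustrative only: the token counts are placeholders read off the words "trillions" and "millions", the 70B-parameter / 1.4T-token reference model is hypothetical, and the loss formula with its constants is an assumption borrowed from the parametric fit published by Hoffmann et al. (2022), which this summary does not name. It asks how much less data an infinitely large model would need to match the reference model's loss.

```python
import math

# Placeholder magnitudes taken from the wording "trillions" vs "millions".
AI_TOKENS = 1e12
HUMAN_TOKENS = 1e6

# Assumed constants for L(N, D) = E + A / N**alpha + B / D**beta.
# These come from the Hoffmann et al. (2022) fit, not from this post.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def gap_orders_of_magnitude(ai_tokens: float, human_tokens: float) -> float:
    """Size of the data gap, in powers of ten."""
    return math.log10(ai_tokens / human_tokens)


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_needed(target_loss: float, n_params: float = math.inf) -> float:
    """Tokens required to reach target_loss at a given model size.

    With n_params = inf the parameter term vanishes, which gives the
    best case that scaling model size alone could achieve.
    """
    residual = target_loss - E - A / n_params**ALPHA
    if residual <= 0:
        raise ValueError("target loss is unreachable at this model size")
    return (B / residual) ** (1 / BETA)


# Hypothetical reference model: 70B parameters, 1.4T training tokens.
ref_params, ref_tokens = 70e9, 1.4e12
target = loss(ref_params, ref_tokens)

# Data saving from replacing the reference model with an infinite one.
saving = ref_tokens / tokens_needed(target)

print(f"stated gap: ~10^{gap_orders_of_magnitude(AI_TOKENS, HUMAN_TOKENS):.0f}x")
print(f"data saving from infinite parameters: ~{saving:.1f}x")
```

Under these assumed constants the saving from unlimited parameters is a single-digit factor, while the stated gap is about a million-fold. That mismatch is one way to read the claim that humans sit on a different curve rather than further along the same one.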