Hasty Briefs (beta)

The sample efficiency black hole

5 hours ago
  • #Sample Efficiency
  • #AI Training
  • #Data Scaling
  • Sample efficiency measures how much data is needed for fluent operation in a domain, and progress in AI may not have significantly improved it.
  • AI improvement mainly comes from adding more and better data, scaling compute, and using RL as synthetic data generation.
  • Human expert data is highly task-specific and abundant, with specialized roles generating vast amounts of domain-specific content.
  • AI models require vastly more data than humans (trillions of training tokens versus millions), a massive sample efficiency gap.
  • Scaling laws suggest that even an infinitely large model would need only modestly less data to reach the same performance, indicating humans are on a different curve.
  • Human-level sample efficiency may not be necessary for automating common white-collar tasks, as AI can amortize training across many sessions.
  • The future of AI research automation depends on whether AI can solve complex problems despite its lower sample efficiency.
  • Evolutionary arguments and sensory data comparisons suggest AI's data demands exceed what human learning requires.
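
The scaling-law point can be made concrete with a rough calculation. The summary does not say which scaling law is meant, so the sketch below assumes the Chinchilla-style parametric loss L(N, D) = E + A/N^alpha + B/D^beta with the constants fitted by Hoffmann et al. (2022). The 70B-parameter, 1.4T-token reference run and the `HUMAN_TOKENS` figure (a stand-in for "millions") are illustrative assumptions, not numbers taken from the article.

```python
import math

# Assumed: Chinchilla-style fit from Hoffmann et al. (2022).
# L(N, D) = E + A / N**ALPHA + B / D**BETA
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def tokens_needed(target_loss: float, n_params: float = math.inf) -> float:
    """Tokens required to reach target_loss at a given model size.

    With n_params = math.inf the parameter term vanishes, which gives the
    smallest data requirement the fitted curve allows.
    """
    budget = target_loss - E - A / n_params**ALPHA
    if budget <= 0:
        raise ValueError("target loss is unreachable at this model size")
    return (B / budget) ** (1 / BETA)


# Hypothetical reference run: 70B parameters, 1.4T tokens.
REF_PARAMS, REF_TOKENS = 70e9, 1.4e12
# Hypothetical order of magnitude for a human's domain data ("millions").
HUMAN_TOKENS = 1e7

target = loss(REF_PARAMS, REF_TOKENS)
tokens_at_infinite_size = tokens_needed(target)
reduction = REF_TOKENS / tokens_at_infinite_size
remaining_gap = tokens_at_infinite_size / HUMAN_TOKENS

print(f"tokens at infinite size: {tokens_at_infinite_size:.2e}")
print(f"data reduction from infinite parameters: {reduction:.1f}x")
print(f"remaining gap to human-scale data: {remaining_gap:.0e}x")
```

Under these assumptions, removing the parameter term entirely cuts the data requirement by a single-digit factor, while the distance to human-scale data stays at several orders of magnitude. That is the sense in which humans appear to sit on a different curve.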