LLMs Are Complicated Now

9 hours ago

Meta had two distinct machine learning branches: LLMs like Llama and complex recommendation systems, with the latter now simplified.
Modern LLMs combine diverse attention mechanisms, mixture-of-experts (MoE) layers, integrated vision and audio encoders, and multi-GPU inference, all of which add complexity.
Recommendation systems evolved from simple two-tower networks to complex models balancing capability and efficiency.
Agents may not solve the optimization problem on their own; verifying the kernels they generate still requires a reliable baseline to compare against.
Performance is critical: because the gap between what optimization delivers and what is necessary is small, research needs partially optimized variants.
FlexAttention in PyTorch exemplifies composable design, enabling exploration with minimal performance loss.
Andrej Karpathy emphasizes composability and architectural simplification as key to advancing AI research loops.
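
To make the FlexAttention point concrete, here is a minimal pure-Python sketch of the underlying idea: one attention routine with a pluggable score-modification hook, checked against a hand-written baseline. This is not the PyTorch API. The names `attention`, `causal`, `causal_baseline`, and `allclose` are hypothetical, and the three-argument `score_mod(score, q_idx, kv_idx)` signature is a simplification (the real FlexAttention hook also receives batch and head indices and operates on tensors).

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
# Hypothetical simplified hook: (score, q_idx, kv_idx) -> modified score.
ScoreMod = Callable[[float, int, int], float]


def identity(score: float, q_idx: int, kv_idx: int) -> float:
    """Leave the score unchanged (plain full attention)."""
    return score


def causal(score: float, q_idx: int, kv_idx: int) -> float:
    """Mask out keys that come after the query position."""
    return score if kv_idx <= q_idx else -math.inf


def attention(q: Matrix, k: Matrix, v: Matrix,
              score_mod: ScoreMod = identity) -> Matrix:
    """Scaled dot-product attention with a pluggable score modification.

    Assumes every query row keeps at least one unmasked key.
    """
    scale = math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        scores = [
            score_mod(sum(a * b for a, b in zip(qi, kj)) / scale, i, j)
            for j, kj in enumerate(k)
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def causal_baseline(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Hand-specialized reference: each query only sees keys 0..i."""
    return [attention([q[i]], k[: i + 1], v[: i + 1])[0]
            for i in range(len(q))]


def allclose(a: Matrix, b: Matrix, tol: float = 1e-9) -> bool:
    """Element-wise comparison of two matrices within an absolute tolerance."""
    return all(
        math.isclose(x, y, abs_tol=tol)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
v = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]

# The composed variant should agree with the hand-written baseline.
print(allclose(attention(q, k, v, causal), causal_baseline(q, k, v)))
```

The design choice being illustrated is that a new attention variant becomes a small function passed to one shared routine rather than a separate kernel, and that a simple reference implementation gives something trustworthy to compare each variant against.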
