LLMs Are Complicated Now
9 hours ago
- #LLMs
- #model architecture
- #machine learning
- Meta had two distinct machine learning branches: LLMs like Llama and complex recommendation systems, with the latter now simplified.
- Modern LLMs feature diverse attention mechanisms, MoE, integrated vision/audio encoders, and multi-GPU inference, increasing complexity.
- Recommendation systems evolved from simple two-tower networks to complex models balancing capability and efficiency.
- Agents may not solve optimization issues; a reliable baseline is needed for verifying generated kernels.
- Performance improvements are critical; small gaps between optimization and necessity require partially optimized variants for research.
- FlexAttention in PyTorch exemplifies composable design, enabling exploration with minimal performance loss.
- Andrej Karpathy emphasizes composability and architectural simplification as key to advancing AI research loops.