Distillation Makes AI Models Smaller and Cheaper
- #Distillation
- #AI
- #Machine Learning
- DeepSeek's R1 chatbot gained attention for rivaling top AI models while using less computing power at lower cost, triggering stock drops for Western tech companies.
- Accusations suggested DeepSeek had used distillation to extract knowledge from OpenAI's proprietary model, though distillation is in fact a standard, widely used AI technique.
- Distillation, or knowledge distillation, was introduced in a 2015 Google paper by Geoffrey Hinton and colleagues as a way to compress the knowledge of an ensemble of models into a single, simpler model.
- The technique transfers the 'dark knowledge' in a large 'teacher' model's soft predictions, the probabilities it assigns to wrong answers as well as the right one, to train a smaller 'student' model more efficiently (see the loss-function sketch after this list).
- Distillation became crucial as AI models grew larger and more expensive, leading to widespread adoption by companies like Google and OpenAI.
- Recent applications include training chain-of-thought reasoning models by fine-tuning a student on a teacher's reasoning traces; NovaSky's Sky-T1 used this approach to achieve high performance at low cost (see the trace-collection sketch below).
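
To make the teacher-student idea concrete, here is a minimal sketch of a Hinton-style distillation loss in PyTorch. The temperature and mixing weight are illustrative choices, not values from the article, and the toy logits stand in for real model outputs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend the usual hard-label loss with a soft-label loss.

    The soft targets (teacher probabilities at temperature T) carry the
    'dark knowledge': how the teacher spreads probability over wrong classes.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=-1)      # softened teacher distribution
    soft_student = F.log_softmax(student_logits / T, dim=-1)  # softened student log-probs
    # KL divergence between teacher and student soft distributions,
    # scaled by T^2 to keep gradient magnitudes comparable (as in Hinton et al., 2015).
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)              # standard hard-label loss
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 8 examples over 10 classes with random logits.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```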
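
Chain-of-thought distillation, as used for models like Sky-T1, typically works by collecting step-by-step reasoning traces from a stronger teacher and then fine-tuning the student on them. The sketch below shows only the data-collection pattern; `query_teacher` and the placeholder trace are hypothetical stand-ins, not NovaSky's actual code or data.

```python
def query_teacher(prompt: str) -> str:
    """Placeholder for a call to a strong reasoning model (e.g., via its API)."""
    return "<step-by-step reasoning>... Final answer: 42"

def build_distillation_set(prompts):
    # Each training example pairs a problem with the teacher's full reasoning trace,
    # so the student learns to imitate the chain of thought, not just the final answer.
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

dataset = build_distillation_set(["What is 6 * 7?"])
print(dataset[0])
# The resulting dataset would then be used for ordinary supervised
# fine-tuning of the smaller student model.
```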