Distillation Makes AI Models Smaller and Cheaper
- #Distillation
- #AI
- #Machine Learning
- DeepSeek's R1 chatbot gained attention for rivaling top AI models while using less computing power at lower cost, triggering stock drops for Western tech companies.
- Accusations suggested DeepSeek had used distillation to extract knowledge from OpenAI's proprietary model, though distillation is in fact a standard, widely used AI technique.
- Distillation, or knowledge distillation, was introduced in a 2015 Google paper by Geoffrey Hinton and colleagues as a way to compress the knowledge of an ensemble of models into a single, simpler model.
- The technique transfers the 'dark knowledge' in a large 'teacher' model's soft predictions, the probabilities it assigns to wrong answers as well as the right one, to train a smaller 'student' model more efficiently (see the loss-function sketch after this list).
- Distillation became crucial as AI models grew larger and more expensive, leading to widespread adoption by companies like Google and OpenAI.
- Recent applications include training chain-of-thought reasoning models by fine-tuning a student on a teacher's reasoning traces; NovaSky's Sky-T1 used this approach to achieve high performance at low cost (see the trace-collection sketch below).
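
To make the teacher-student idea concrete, here is a minimal sketch of a Hinton-style distillation loss in PyTorch. The temperature and mixing weight are illustrative choices, not values from the article, and the toy logits stand in for real model outputs.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Blend the usual hard-label loss with a soft-label loss.

    The soft targets (teacher probabilities at temperature T) carry the
    'dark knowledge': how the teacher spreads probability over wrong classes.
    """
    soft_targets = F.softmax(teacher_logits / T, dim=-1)      # softened teacher distribution
    soft_student = F.log_softmax(student_logits / T, dim=-1)  # softened student log-probs
    # KL divergence between teacher and student soft distributions,
    # scaled by T^2 to keep gradient magnitudes comparable (as in Hinton et al., 2015).
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)              # standard hard-label loss
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 8 examples over 10 classes with random logits.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```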
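
Chain-of-thought distillation, as used for models like Sky-T1, typically works by collecting step-by-step reasoning traces from a stronger teacher and then fine-tuning the student on them. The sketch below shows only the data-collection pattern; `query_teacher` and the placeholder trace are hypothetical stand-ins, not NovaSky's actual code or data.

```python
def query_teacher(prompt: str) -> str:
    """Placeholder for a call to a strong reasoning model (e.g., via its API)."""
    return "<step-by-step reasoning>... Final answer: 42"

def build_distillation_set(prompts):
    # Each training example pairs a problem with the teacher's full reasoning trace,
    # so the student learns to imitate the chain of thought, not just the final answer.
    return [{"prompt": p, "completion": query_teacher(p)} for p in prompts]

dataset = build_distillation_set(["What is 6 * 7?"])
print(dataset[0])
# The resulting dataset would then be used for ordinary supervised
# fine-tuning of the smaller student model.
```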