Distillation Makes AI Models Smaller and Cheaper

9 months ago
  • #Distillation
  • #AI
  • #Machine Learning
  • DeepSeek's R1 chatbot drew attention for rivaling top AI models while using far less computing power and money, triggering stock drops at Western tech companies.
  • Accusations suggested DeepSeek had used distillation to extract knowledge from OpenAI's proprietary model, even though distillation is a standard, widely used technique in AI.
  • Distillation, or knowledge distillation, was introduced in a 2015 Google paper by Geoffrey Hinton and colleagues as a way to compress the knowledge of an ensemble of models into a single, smaller model.
  • The technique transfers the 'dark knowledge' encoded in a large 'teacher' model's softened output probabilities to a smaller 'student' model, letting the student train more efficiently (see the sketch after this list).
  • Distillation became crucial as AI models grew larger and more expensive, leading to widespread adoption by companies like Google and OpenAI.
  • Recent applications include training chain-of-thought reasoning models, such as NovaSky's Sky-T1, which achieved high performance at low cost.
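
A minimal sketch of the classic soft-target distillation loss described in the Hinton et al. paper, assuming PyTorch; the temperature, mixing weight `alpha`, and the `distillation_loss` function name are illustrative choices, not the exact recipe used by any of the companies mentioned above.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-target loss against the teacher with the usual hard-label loss."""
    # Soften both distributions with the temperature to expose the teacher's
    # "dark knowledge": the relative probabilities it assigns to wrong classes.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the softened teacher and student distributions,
    # rescaled by T^2 (as in the original paper) so gradient magnitudes stay comparable.
    kd_loss = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature**2
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1.0 - alpha) * ce_loss

# Illustrative usage with random tensors standing in for a real batch.
student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
```

Raising the temperature above 1 flattens the teacher's distribution, so the small but informative probabilities it assigns to wrong classes contribute more to the student's gradient than the hard labels alone would.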