Build a DeepSeek Model from Scratch

5 days ago


DeepSeek combined techniques such as Mixture-of-Experts, Multi-Head Latent Attention, and Multi-Token Prediction to achieve high performance at low cost.
The course 'Build a DeepSeek Model (From Scratch)' teaches how to implement DeepSeek's core innovations such as Multi-Head Latent Attention and Mixture-of-Experts layers.
Participants will learn to build a production-ready training pipeline with Multi-Token Prediction and FP8 quantization for efficiency.
The course covers parallelism strategies such as DualPipe to maximize hardware utilization.
Post-training methods such as supervised fine-tuning and reinforcement learning are included to enhance reasoning capabilities.
Techniques for compressing and distilling large models into smaller, deployable versions are also taught.
The course starts with a review of LLM fundamentals, highlighting how DeepSeek's innovations address common limitations.
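
To make the Mixture-of-Experts idea mentioned above concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not taken from the course and is not DeepSeek's exact router: it assumes a softmax gate and leaves out shared experts and load balancing. The names `route_top_k`, `moe_layer`, and the toy `experts` are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, router_logits, experts, k=2):
    """Run only the selected experts on x and sum their weighted outputs."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Hypothetical toy experts: each one scales its input by a different constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]

# Router logits would normally come from a learned linear layer; they are
# hard-coded here so the example stays self-contained.
result = moe_layer([1.0, -1.0], router_logits=[0.1, 2.0, 0.3, 2.0], experts=experts, k=2)
print(result)
```

Only `k` of the four experts run for this token, which is where the compute savings of a sparse layer come from: the parameter count grows with the number of experts, but the per-token cost grows only with `k`.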
