DeepSeek V4 Flash
- #mixture-of-experts
- #large-language-models
- #long-context
- DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated) are new MoE models with a 1-million-token context window.
- Key architectural improvements include Hybrid Attention (CSA+HCA) for efficiency, mHC for stability, and the Muon optimizer.
- Both models were pretrained on >32T tokens and post-trained via two-stage SFT, RL, and distillation.
- DeepSeek-V4-Pro-Max is positioned as the best open-source model, excelling in coding, reasoning, and agentic tasks.
- Models feature three reasoning modes: Non-think (fast), Think High (slower, analytical), and Think Max (full reasoning).
- Benchmarks show strong performance in knowledge, reasoning, coding, math, long-context, and agentic evaluations.
- The release includes base and instruct models, downloadable in mixed FP4/FP8 precision under the MIT license.
- Local deployment guidance is provided, and chat-prompt encoding is handled by custom Python scripts rather than a Jinja template.
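The script-based chat encoding is worth a sketch. Below is a minimal, hypothetical illustration of what such an encoder might look like; the special-token strings, the `encode_chat` name, and the `thinking` parameter are all invented placeholders for illustration, not DeepSeek's actual format.

```python
# Hypothetical sketch of script-based chat encoding, in the spirit of the
# release's custom Python scripts. All special-token strings below are
# invented placeholders, NOT DeepSeek's actual tokens.
def encode_chat(messages, thinking="non-think"):
    """Flatten a list of {'role', 'content'} dicts into a single prompt string."""
    parts = ["<|bos|>"]
    for msg in messages:
        parts.append(f"<|{msg['role']}|>{msg['content']}<|eot|>")
    parts.append("<|assistant|>")
    if thinking != "non-think":
        parts.append("<think>")  # hypothetical marker opening a reasoning span
    return "".join(parts)


prompt = encode_chat([{"role": "user", "content": "Hi"}], thinking="think-high")
print(prompt)  # <|bos|><|user|>Hi<|eot|><|assistant|><think>
```

A plain function like this can express conditional logic (e.g. the three reasoning modes) more directly than a Jinja template, which is one plausible motivation for the switch.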
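The parameter figures quoted above imply heavy MoE sparsity; a quick back-of-the-envelope check of the activated fraction, using only the numbers from the summary:

```python
# Activated-parameter fraction per token, from the figures in the summary.
models = {
    "DeepSeek-V4-Pro": (1.6e12, 49e9),   # (total params, activated params)
    "DeepSeek-V4-Flash": (284e9, 13e9),
}
for name, (total, activated) in models.items():
    print(f"{name}: {activated / total:.1%} of parameters active per token")
```

Both models activate only a few percent of their weights per token (roughly 3.1% for Pro and 4.6% for Flash), which is how a 1.6T-parameter model keeps inference cost closer to that of a ~49B dense model.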