Hasty Briefs (beta)

DeepSeek V4 Flash

6 hours ago
  • #mixture-of-experts
  • #large-language-models
  • #long-context
  • DeepSeek-V4-Pro (1.6T parameters, 49B activated) and DeepSeek-V4-Flash (284B parameters, 13B activated) are new MoE models with 1 million token context.
  • Key architectural improvements include Hybrid Attention (CSA+HCA) for efficiency, mHC for stability, and the Muon optimizer.
  • Both models were pretrained on >32T tokens and post-trained via two-stage SFT, RL, and distillation.
  • DeepSeek-V4-Pro-Max is positioned as the best open-source model, excelling in coding, reasoning, and agentic tasks.
  • Models feature three reasoning modes: Non-think (fast), Think High (slower, analytical), and Think Max (full reasoning).
  • Benchmarks show strong performance in knowledge, reasoning, coding, math, long-context, and agentic evaluations.
  • The release includes both base and instruct models, downloadable in mixed FP4/FP8 precision under the MIT license.
  • Local deployment guidance is provided, and chat encoding is handled by custom Python scripts rather than a Jinja chat template.
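To illustrate the last point: a Jinja chat template renders a message list into a prompt string declaratively, whereas a Python encoder does the same imperatively. The sketch below is purely illustrative — the role markers and function name are assumptions, not DeepSeek's actual special tokens or script.

```python
# Hypothetical sketch of Python-based chat encoding (in place of a Jinja
# template). The <|role|> / <|end|> markers are made up for illustration;
# the real models define their own special tokens.
def encode_chat(messages, add_generation_prompt=True):
    """Render a list of {'role', 'content'} dicts into a single prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|{msg['role']}|>{msg['content']}<|end|>")
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|assistant|>")
    return "".join(parts)

prompt = encode_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
```

A plain-Python encoder like this is easier to debug and unit-test than a template string, which is one common reason projects move away from Jinja for chat formatting.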