Hasty Briefs (beta)

DeepSeek-V4: a million-token context that agents can use

4 hours ago
  • #DeepSeek-V4
  • #Long-Context Models
  • #Agentic AI
  • DeepSeek-V4 introduces a million-token context window specifically designed for long-running agentic workloads.
  • It addresses failure modes seen in earlier agents, such as the model halting mid-task, KV cache overflow, and degraded performance over long tool-call round trips.
  • The architecture cuts KV cache memory to roughly 2% of earlier versions such as V3.2, with a substantial FLOPs reduction as well.
  • Hybrid attention combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficiency.
  • Agents benefit from interleaved thinking across tool calls, preserving reasoning history across user turns.
  • V4 uses a new XML-based tool-call format with dedicated tokens to reduce parsing errors.
  • DeepSeek Elastic Compute (DSec) provides a sandbox for RL rollouts, enabling fast and safe agent training.
  • V4-Pro-Max shows strong performance on agent benchmarks such as Terminal Bench 2.0, SWE-bench Verified, and Toolathlon.
  • The model maintains high retrieval accuracy (MRCR 8-needle) up to 1M tokens.
  • Four checkpoints are available: V4-Pro and V4-Flash, each in instruct and base versions.
  • Instruct models support multiple reasoning modes, including Non-think, Think High, and Think Max.
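The claimed ~2% KV cache footprint can be put in perspective with a back-of-envelope calculation. The model dimensions below (layer count, KV head count, head dimension, dtype) are illustrative assumptions, not published DeepSeek-V4 specs; only the 2% figure comes from the article.

```python
# Back-of-envelope KV cache sizing. All model dimensions here are
# illustrative assumptions, not published DeepSeek-V4 specs.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem=2):
    """Memory for keys + values across all layers (bf16/fp16 by default)."""
    return tokens * layers * kv_heads * head_dim * 2 * bytes_per_elem

# Hypothetical dense baseline: 64 layers, 8 KV heads of dim 128, bf16.
baseline = kv_cache_bytes(tokens=1_000_000, layers=64, kv_heads=8, head_dim=128)
compressed = baseline * 0.02  # the article's "roughly 2%" figure

print(f"dense 1M-token KV cache: {baseline / 2**30:.1f} GiB")
print(f"at 2% of that:           {compressed / 2**30:.2f} GiB")
```

Under these assumed dimensions, a dense 1M-token cache would be on the order of hundreds of GiB, while 2% of it fits on a single accelerator, which is what makes million-token agent sessions practical at all.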
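To see why an XML-based tool-call format with dedicated tokens can reduce parsing errors, here is a minimal sketch. The tag names (`tool_call`, `name`, `arg`) are hypothetical since the article does not publish V4's actual schema; the point is that a strict parser either yields a complete call or fails loudly, instead of silently mis-extracting a call from free-form text.

```python
# Minimal sketch of parsing an XML-style tool call. The tag names are
# hypothetical; the article does not publish DeepSeek-V4's schema.
import xml.etree.ElementTree as ET

def parse_tool_call(text):
    """Return (tool_name, args); raises ET.ParseError on malformed output."""
    root = ET.fromstring(text)
    name = root.find("name").text
    args = {a.get("key"): a.text for a in root.findall("arg")}
    return name, args

sample = """<tool_call>
  <name>read_file</name>
  <arg key="path">/etc/hosts</arg>
</tool_call>"""

print(parse_tool_call(sample))
```

A truncated or malformed call (e.g. an unclosed tag) raises `ParseError` immediately, so the agent runtime can retry rather than execute a half-parsed command.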