DeepSeek-V4: a million-token context that agents can use
4 hours ago
- #DeepSeek-V4
- #Long-Context Models
- #Agentic AI
- DeepSeek-V4 introduces a million-token context window specifically designed for long-running agentic workloads.
- It addresses failure modes seen in earlier agentic deployments, such as the model stopping mid-task, KV cache overflow, and performance degrading over many tool-call round trips.
- The architecture reduces KV cache memory to roughly 2% of that of previous versions such as V3.2, with a corresponding significant drop in attention FLOPs.
- Hybrid attention combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficiency.
- Agents benefit from interleaved thinking across tool calls, preserving reasoning history across user turns.
- V4 uses a new XML-based tool-call format with dedicated tokens to reduce parsing errors.
- DeepSeek Elastic Compute (DSec) provides a sandbox for RL rollouts, enabling fast and safe agent training.
- V4-Pro-Max shows strong performance on agent benchmarks such as Terminal Bench 2.0, SWE-bench Verified, and Toolathlon.
- The model maintains high retrieval accuracy (MRCR 8-needle) up to 1M tokens.
- Four checkpoints are available: V4-Pro and V4-Flash, each in instruct and base versions.
- Instruct models support multiple reasoning modes, including Non-think, Think High, and Think Max.
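The KV-cache claim above can be sanity-checked with back-of-envelope arithmetic. The sketch below is not based on published DeepSeek-V4 specs: the layer count, KV-head count, head dimension, and dtype size are all illustrative assumptions chosen to show the scale of the problem at 1M tokens.

```python
# Back-of-envelope KV-cache sizing for a 1M-token context.
# All model dimensions below are illustrative assumptions,
# NOT published DeepSeek-V4 specifications.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    # Factor of 2 accounts for the separate K and V tensors per layer.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

full = kv_cache_bytes(
    tokens=1_000_000,
    layers=61,         # assumed layer count
    kv_heads=128,      # assumed KV heads, no grouping or compression
    head_dim=128,      # assumed head dimension
    bytes_per_elem=2,  # fp16 / bf16
)
compressed = full * 0.02  # the claimed ~2% footprint

print(f"uncompressed: {full / 2**30:,.1f} GiB")
print(f"at 2%:        {compressed / 2**30:,.1f} GiB")
```

Under these assumptions an uncompressed cache runs to several terabytes, which is why aggressive KV compression is a prerequisite for million-token agent sessions rather than an optimization.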
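To illustrate why an XML tool-call format with dedicated tokens can reduce parsing errors, here is a minimal harness-side sketch. The tag names and argument schema are hypothetical, not DeepSeek-V4's actual format; the point is that an XML parser fails loudly on malformed output instead of silently mis-scraping it.

```python
# Hypothetical XML tool-call format; the tag names below are
# illustrative assumptions, not DeepSeek-V4's actual schema.
import xml.etree.ElementTree as ET

raw = """
<tool_call>
  <name>read_file</name>
  <arguments>
    <path>src/main.py</path>
    <max_lines>200</max_lines>
  </arguments>
</tool_call>
"""

def parse_tool_call(text):
    # ET.fromstring raises ParseError on malformed markup, which is
    # easier to catch than regex-scraping JSON from free-form text.
    root = ET.fromstring(text.strip())
    name = root.findtext("name")
    args = {child.tag: child.text for child in root.find("arguments")}
    return name, args

name, args = parse_tool_call(raw)
print(name, args)  # read_file {'path': 'src/main.py', 'max_lines': '200'}
```

Dedicated tokens for the wrapping tags would additionally guarantee the model cannot emit a tool call that tokenizes ambiguously with ordinary text.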
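The hybrid-attention bullet can be made concrete with one generic sparse-attention pattern: top-k key selection per query. This is a sketch of why sparsity cuts cost, not an implementation of DeepSeek's CSA or HCA, whose mechanisms are not detailed here; real variants would also compress the selection step, which remains dense in this toy version.

```python
import numpy as np

# Generic top-k sparse attention for a single query vector.
# This is NOT DeepSeek-V4's CSA/HCA; it only illustrates the idea of
# attending over a small selected subset of a long key/value cache.

def topk_sparse_attention(q, K, V, k):
    # q: (d,), K and V: (n, d). Score all keys, then run softmax and
    # value aggregation over only the k highest-scoring positions.
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argpartition(scores, -k)[-k:]  # top-k key indices
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                            # softmax over top-k only
    return w @ V[idx]

rng = np.random.default_rng(0)
n, d = 1024, 64
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
q = rng.normal(size=d)
out = topk_sparse_attention(q, K, V, k=32)
print(out.shape)  # (64,)
```

With k fixed, the softmax and value aggregation stay constant-cost as the context grows, which is the property a million-token window needs.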
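"Interleaved thinking" can be pictured as a transcript policy: reasoning blocks stay in the running history between tool calls instead of being stripped after each turn. The message schema below is an assumption for illustration only.

```python
# Sketch of an interleaved-thinking transcript. The role/kind schema
# is a hypothetical illustration, not DeepSeek-V4's actual format.

history = []

def add(role, content, kind="text"):
    history.append({"role": role, "kind": kind, "content": content})

add("user", "Find the failing test and fix it.")
add("assistant", "The stack trace points at test_auth; read it first.",
    kind="thinking")                          # reasoning kept in history
add("assistant", "<tool_call>read_file tests/test_auth.py</tool_call>",
    kind="tool_call")
add("tool", "def test_login(): ...")          # tool result
add("assistant", "Earlier I suspected test_auth; confirmed.",
    kind="thinking")                          # builds on prior thinking

# A conventional harness would drop kind == "thinking" entries here;
# preserving them is what lets reasoning span tool calls and turns.
thinking_turns = [m for m in history if m["kind"] == "thinking"]
print(len(thinking_turns))  # 2
```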