DeepSeek-V4: a million-token context that agents can use
4 hours ago
- #DeepSeek-V4
- #Long-Context Models
- #Agentic AI
- DeepSeek-V4 introduces a million-token context window specifically designed for long-running agentic workloads.
- It addresses failure modes seen in earlier agentic deployments, such as the model stopping mid-task, KV cache overflow, and performance degrading over many tool-call round trips.
- The architecture reduces KV cache memory to roughly 2% of that of previous versions such as V3.2, with a corresponding significant drop in attention FLOPs.
- Hybrid attention combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for efficiency.
- Agents benefit from interleaved thinking across tool calls, preserving reasoning history across user turns.
- V4 uses a new XML-based tool-call format with dedicated tokens to reduce parsing errors.
- DeepSeek Elastic Compute (DSec) provides a sandbox for RL rollouts, enabling fast and safe agent training.
- V4-Pro-Max shows strong performance on agent benchmarks such as Terminal Bench 2.0, SWE-bench Verified, and Toolathlon.
- The model maintains high retrieval accuracy (MRCR 8-needle) up to 1M tokens.
- Four checkpoints are available: V4-Pro and V4-Flash, each in instruct and base versions.
- Instruct models support multiple reasoning modes, including Non-think, Think High, and Think Max.
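The KV-cache claim above can be sanity-checked with back-of-envelope arithmetic. The sketch below is not based on published DeepSeek-V4 specs: the layer count, KV-head count, head dimension, and dtype size are all illustrative assumptions chosen to show the scale of the problem at 1M tokens.

```python
# Back-of-envelope KV-cache sizing for a 1M-token context.
# All model dimensions below are illustrative assumptions,
# NOT published DeepSeek-V4 specifications.

def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_elem):
    # Factor of 2 accounts for the separate K and V tensors per layer.
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem

full = kv_cache_bytes(
    tokens=1_000_000,
    layers=61,         # assumed layer count
    kv_heads=128,      # assumed KV heads, no grouping or compression
    head_dim=128,      # assumed head dimension
    bytes_per_elem=2,  # fp16 / bf16
)
compressed = full * 0.02  # the claimed ~2% footprint

print(f"uncompressed: {full / 2**30:,.1f} GiB")
print(f"at 2%:        {compressed / 2**30:,.1f} GiB")
```

Under these assumptions an uncompressed cache runs to several terabytes, which is why aggressive KV compression is a prerequisite for million-token agent sessions rather than an optimization.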
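To illustrate why an XML tool-call format with dedicated tokens can reduce parsing errors, here is a minimal harness-side sketch. The tag names and argument schema are hypothetical, not DeepSeek-V4's actual format; the point is that an XML parser fails loudly on malformed output instead of silently mis-scraping it.

```python
# Hypothetical XML tool-call format; the tag names below are
# illustrative assumptions, not DeepSeek-V4's actual schema.
import xml.etree.ElementTree as ET

raw = """
<tool_call>
  <name>read_file</name>
  <arguments>
    <path>src/main.py</path>
    <max_lines>200</max_lines>
  </arguments>
</tool_call>
"""

def parse_tool_call(text):
    # ET.fromstring raises ParseError on malformed markup, which is
    # easier to catch than regex-scraping JSON from free-form text.
    root = ET.fromstring(text.strip())
    name = root.findtext("name")
    args = {child.tag: child.text for child in root.find("arguments")}
    return name, args

name, args = parse_tool_call(raw)
print(name, args)  # read_file {'path': 'src/main.py', 'max_lines': '200'}
```

Dedicated tokens for the wrapping tags would additionally guarantee the model cannot emit a tool call that tokenizes ambiguously with ordinary text.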
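The hybrid-attention bullet can be made concrete with one generic sparse-attention pattern: top-k key selection per query. This is a sketch of why sparsity cuts cost, not an implementation of DeepSeek's CSA or HCA, whose mechanisms are not detailed here; real variants would also compress the selection step, which remains dense in this toy version.

```python
import numpy as np

# Generic top-k sparse attention for a single query vector.
# This is NOT DeepSeek-V4's CSA/HCA; it only illustrates the idea of
# attending over a small selected subset of a long key/value cache.

def topk_sparse_attention(q, K, V, k):
    # q: (d,), K and V: (n, d). Score all keys, then run softmax and
    # value aggregation over only the k highest-scoring positions.
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argpartition(scores, -k)[-k:]  # top-k key indices
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                            # softmax over top-k only
    return w @ V[idx]

rng = np.random.default_rng(0)
n, d = 1024, 64
K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d))
q = rng.normal(size=d)
out = topk_sparse_attention(q, K, V, k=32)
print(out.shape)  # (64,)
```

With k fixed, the softmax and value aggregation stay constant-cost as the context grows, which is the property a million-token window needs.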
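"Interleaved thinking" can be pictured as a transcript policy: reasoning blocks stay in the running history between tool calls instead of being stripped after each turn. The message schema below is an assumption for illustration only.

```python
# Sketch of an interleaved-thinking transcript. The role/kind schema
# is a hypothetical illustration, not DeepSeek-V4's actual format.

history = []

def add(role, content, kind="text"):
    history.append({"role": role, "kind": kind, "content": content})

add("user", "Find the failing test and fix it.")
add("assistant", "The stack trace points at test_auth; read it first.",
    kind="thinking")                          # reasoning kept in history
add("assistant", "<tool_call>read_file tests/test_auth.py</tool_call>",
    kind="tool_call")
add("tool", "def test_login(): ...")          # tool result
add("assistant", "Earlier I suspected test_auth; confirmed.",
    kind="thinking")                          # builds on prior thinking

# A conventional harness would drop kind == "thinking" entries here;
# preserving them is what lets reasoning span tool calls and turns.
thinking_turns = [m for m in history if m["kind"] == "thinking"]
print(len(thinking_turns))  # 2
```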