DeepSeek V4: almost on the frontier, a fraction of the price
8 hours ago
- #DeepSeek
- #Machine Learning
- #AI Models
- DeepSeek released two new models, DeepSeek-V4-Pro and DeepSeek-V4-Flash, both with 1 million tokens of context and a Mixture of Experts (MoE) architecture.
- DeepSeek-V4-Pro is the largest open-weights model to date, at 1.6 trillion total parameters with 49 billion active; Flash has 284 billion total and 13 billion active parameters.
- Pricing is very low: Flash costs $0.14 per million input tokens and $0.28 per million output tokens; Pro costs $1.74 input and $3.48 output, making them among the cheapest models anywhere near the frontier.
- The models show significant efficiency gains: at 1M-token context, Pro uses only 27% of the FLOPs and 10% of the KV cache of V3.2, with Flash even more efficient.
- Performance is competitive but trails state-of-the-art models like GPT-5.4 and Gemini-3.1-Pro by about 3 to 6 months, as DeepSeek's own paper notes.
- The models are available under the MIT license, with potential for local deployment on hardware like a 128GB MacBook Pro, especially once quantized versions arrive.
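To put the per-million-token prices above in concrete terms, here is a small back-of-envelope sketch. The rates are the ones quoted in this post; the workload size (a hypothetical 200K-token input, 4K-token output request) is an assumption purely for illustration, not anything from DeepSeek's documentation.

```python
# Rough cost estimate from the quoted per-million-token prices.
# Workload size below is a hypothetical example, not a published benchmark.

PRICES = {  # USD per million tokens: (input, output)
    "flash": (0.14, 0.28),
    "pro": (1.74, 3.48),
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD for one request at the listed per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Hypothetical long-context request: 200K tokens in, 4K tokens out.
for model in PRICES:
    cost = request_cost(model, 200_000, 4_000)
    print(f"{model}: ${cost:.4f}")
# → flash: $0.0291
# → pro: $0.3619
```

Even a heavy 200K-token request comes in under 3 cents on Flash and under 40 cents on Pro, which is what makes the "fraction of the price" framing plausible for long-context workloads.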