DeepSeek releases 'sparse attention' model that cuts API costs in half
10 hours ago
- #DeepSeek
- #AI
- #InferenceCosts
- DeepSeek released a new experimental model, V3.2-exp, designed to lower inference costs in long-context operations.
- The model features DeepSeek Sparse Attention, which uses a 'lightning indexer' and a 'fine-grained token selection system' so that attention is spent only on the most relevant portions of the context window, reducing server load (see the sketch after this list).
- Preliminary testing shows API call costs could be reduced by up to 50% in long-context scenarios.
- The model is open-weight and available on Hugging Face, allowing third-party verification.
- The release is part of DeepSeek's broader effort to make the transformer architecture more efficient at inference time.
- DeepSeek, based in China, previously gained attention with its R1 model but has since receded from the spotlight.
- The new sparse attention approach may help U.S. providers lower inference costs.
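The article doesn't spell out the mechanism's internals, but the general pattern it describes is: a cheap scoring pass ("indexer") picks a small subset of keys per query, and full attention runs only over that subset instead of the whole context. The PyTorch sketch below is a minimal illustration under that assumption; the function and parameter names (`sparse_attention`, `indexer_q`, `indexer_k`, `top_k`) are hypothetical and not DeepSeek's actual implementation.

```python
import torch
import torch.nn.functional as F

def sparse_attention(q, k, v, indexer_q, indexer_k, top_k):
    """Illustrative top-k sparse attention for a single head.

    q, k, v:              (seq_len, d_model) query/key/value vectors
    indexer_q, indexer_k: (seq_len, d_index) cheap low-dimensional
                          projections used only to score key relevance
    top_k:                number of keys each query actually attends to
    """
    seq_len, d_model = q.shape

    # Cheap pass: score every key for every query in the small index space.
    index_scores = indexer_q @ indexer_k.T              # (seq_len, seq_len)

    # Each query keeps only its top_k highest-scoring keys.
    top_idx = index_scores.topk(top_k, dim=-1).indices  # (seq_len, top_k)

    # Expensive pass: standard scaled-dot-product attention, but only
    # over the selected keys and values.
    k_sel = k[top_idx]                                  # (seq_len, top_k, d_model)
    v_sel = v[top_idx]                                  # (seq_len, top_k, d_model)
    scores = torch.einsum("qd,qkd->qk", q, k_sel) / d_model ** 0.5
    weights = F.softmax(scores, dim=-1)
    return torch.einsum("qk,qkd->qd", weights, v_sel)

# Toy usage: with a 4096-token context and top_k=64, the expensive
# attention step touches only ~1.6% of the full score matrix.
seq, d, d_idx = 4096, 64, 16
q, k, v = (torch.randn(seq, d) for _ in range(3))
iq, ik = torch.randn(seq, d_idx), torch.randn(seq, d_idx)
out = sparse_attention(q, k, v, iq, ik, top_k=64)
print(out.shape)  # torch.Size([4096, 64])
```

Under this reading, the quadratic attention cost drops to roughly O(seq_len × top_k) for the expensive step, which is where the long-context savings would come from; the open question such a design must answer is whether the cheap indexer scores reliably agree with full attention about which tokens matter.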