Hasty Briefs (beta)

DeepSeek releases 'sparse attention' model that cuts API costs in half

8 hours ago
  • #DeepSeek
  • #AI
  • #InferenceCosts
  • DeepSeek released a new experimental model called V3.2-exp with lower inference costs for long-context operations.
  • The model features DeepSeek Sparse Attention, which combines a 'lightning indexer' with a 'fine-grained token selection system' to reduce server load when processing long contexts.
  • Preliminary testing shows API call costs could be reduced by up to 50% in long-context scenarios.
  • The model is open-weight and available on Hugging Face, allowing third-party verification.
  • DeepSeek aims to improve transformer architecture efficiency to reduce inference costs.
  • DeepSeek, based in China, previously gained attention with its R1 model but has since receded from the spotlight.
  • The new sparse attention approach may help U.S. providers lower inference costs.
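The article does not publish DeepSeek's implementation, but the general idea behind sparse attention with a cheap indexer can be sketched roughly as follows: a lightweight scoring step picks a small subset of tokens, and full attention runs only over that subset. Everything here (the `Wi` indexer projection, the top-k selection, the function names) is a hypothetical illustration, not DeepSeek's actual code.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_attention(q, K, V, Wi, k=4):
    """Attend from one query to only the top-k keys chosen by a cheap indexer.

    q:  (d,) query vector
    K:  (n, d) key matrix; V: (n, d) value matrix
    Wi: (d, d) projection for the hypothetical lightweight indexer
    """
    # Cheap indexing pass: score every position with a single low-cost
    # projection rather than the full attention computation.
    index_scores = K @ (Wi @ q)           # (n,)
    top = np.argsort(index_scores)[-k:]   # keep only the k highest-scoring tokens
    # Full attention restricted to the selected tokens:
    # cost scales with k instead of the full context length n.
    attn = softmax(K[top] @ q / np.sqrt(K.shape[1]))
    return attn @ V[top]

rng = np.random.default_rng(0)
d, n = 8, 16
q = rng.normal(size=d)
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
out = sparse_attention(q, K, V, np.eye(d), k=4)
print(out.shape)  # (8,)
```

When `k` equals the context length, the sketch reduces to ordinary dense attention; the savings come from choosing `k` much smaller than `n`, which is where the reported long-context cost reduction would originate.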