The DeepSeek v3.2 Breakthrough Simplified
- #DeepSeek
- #AI Efficiency
- #Sparse Attention
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention (DSA) to speed up attention, particularly over long contexts.
- DSA consists of two submodules: the Lightning Indexer and Multi-head Latent Attention (MLA).
- The Lightning Indexer builds an attention mask by cheaply scoring every query-key pair, using far fewer heads and much smaller head dimensions than the main attention, and keeping only each query's top-k interactions (see the sketch after this list).
- MLA then performs sparse attention, computing only the top-k interactions kept by the mask, which cuts the attention cost from O(n²) to O(kn).
- DSA saves work by learning which interactions matter and skipping the rest; this differs from techniques like YOCO and Multi-Query Attention, which instead reduce cost by sharing or caching keys and values.
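
To make the two-step pipeline concrete, here is a minimal PyTorch sketch. It is illustrative only: it swaps MLA for plain single-head dot-product attention, omits batching and the indexer's learned weights, and every name and dimension (`lightning_indexer_scores`, `sparse_attention`, `h_idx`, `top_k`, ...) is an assumption rather than DeepSeek's actual code.

```python
import torch
import torch.nn.functional as F

def lightning_indexer_scores(q_idx, k_idx):
    """Cheap importance scores using few heads and a small head dimension.

    q_idx: (n, h_idx, d_idx) indexer queries; k_idx: (n, d_idx) indexer keys.
    Returns an (n, n) score matrix, summed over the indexer heads.
    """
    scores = torch.einsum("qhd,kd->qhk", q_idx, k_idx)  # (n, h_idx, n)
    return F.relu(scores).sum(dim=1)                    # (n, n)

def sparse_attention(q, k, v, scores, top_k):
    """Full-dimension attention restricted to each query's top-k keys."""
    n, d = q.shape
    # Causal constraint: a query may only select keys at or before its position.
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    # Top-k selection: the expensive step now touches k keys per query,
    # i.e. O(kn) total instead of O(n^2).
    top_scores, idx = scores.topk(top_k, dim=-1)        # (n, top_k)
    k_sel, v_sel = k[idx], v[idx]                       # (n, top_k, d)
    logits = torch.einsum("qd,qkd->qk", q, k_sel) / d**0.5
    # Drop padded selections (queries near the start have < top_k valid keys).
    logits = logits.masked_fill(torch.isinf(top_scores), float("-inf"))
    return torch.einsum("qk,qkd->qd", logits.softmax(dim=-1), v_sel)

# Toy usage: 16 tokens, 64-dim attention, a 4-head/32-dim indexer, top_k = 8.
n, d, h_idx, d_idx, top_k = 16, 64, 4, 32, 8
q, k, v = (torch.randn(n, d) for _ in range(3))
scores = lightning_indexer_scores(torch.randn(n, h_idx, d_idx),
                                  torch.randn(n, d_idx))
out = sparse_attention(q, k, v, scores, top_k)  # (n, d)
```

Note that the indexer still scores all n² pairs, but with so few heads and such a small dimension (and a ReLU instead of softmax) that this pass is cheap; the real savings come from the expensive full-dimension attention touching only k keys per query.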