The DeepSeek v3.2 Breakthrough Simplified
- #DeepSeek
- #AI Efficiency
- #Sparse Attention
- DeepSeek-V3.2-Exp introduces DeepSeek Sparse Attention (DSA) to speed up attention, particularly over long contexts.
- DSA consists of two submodules: the Lightning Indexer and Multi-head Latent Attention (MLA).
- The Lightning Indexer builds an attention mask by cheaply scoring every query-key pair, using far fewer heads and much smaller head dimensions than the main attention, and keeping only each query's top-k interactions (see the sketch after this list).
- MLA then performs sparse attention, computing only the top-k interactions kept by the mask, which cuts the attention cost from O(n²) to O(kn).
- DSA saves work by learning which interactions matter and skipping the rest; this differs from techniques like YOCO and Multi-Query Attention, which instead reduce cost by sharing or caching keys and values.
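
To make the two-step pipeline concrete, here is a minimal PyTorch sketch. It is illustrative only: it swaps MLA for plain single-head dot-product attention, omits batching and the indexer's learned weights, and every name and dimension (`lightning_indexer_scores`, `sparse_attention`, `h_idx`, `top_k`, ...) is an assumption rather than DeepSeek's actual code.

```python
import torch
import torch.nn.functional as F

def lightning_indexer_scores(q_idx, k_idx):
    """Cheap importance scores using few heads and a small head dimension.

    q_idx: (n, h_idx, d_idx) indexer queries; k_idx: (n, d_idx) indexer keys.
    Returns an (n, n) score matrix, summed over the indexer heads.
    """
    scores = torch.einsum("qhd,kd->qhk", q_idx, k_idx)  # (n, h_idx, n)
    return F.relu(scores).sum(dim=1)                    # (n, n)

def sparse_attention(q, k, v, scores, top_k):
    """Full-dimension attention restricted to each query's top-k keys."""
    n, d = q.shape
    # Causal constraint: a query may only select keys at or before its position.
    causal = torch.tril(torch.ones(n, n, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    # Top-k selection: the expensive step now touches k keys per query,
    # i.e. O(kn) total instead of O(n^2).
    top_scores, idx = scores.topk(top_k, dim=-1)        # (n, top_k)
    k_sel, v_sel = k[idx], v[idx]                       # (n, top_k, d)
    logits = torch.einsum("qd,qkd->qk", q, k_sel) / d**0.5
    # Drop padded selections (queries near the start have < top_k valid keys).
    logits = logits.masked_fill(torch.isinf(top_scores), float("-inf"))
    return torch.einsum("qk,qkd->qd", logits.softmax(dim=-1), v_sel)

# Toy usage: 16 tokens, 64-dim attention, a 4-head/32-dim indexer, top_k = 8.
n, d, h_idx, d_idx, top_k = 16, 64, 4, 32, 8
q, k, v = (torch.randn(n, d) for _ in range(3))
scores = lightning_indexer_scores(torch.randn(n, h_idx, d_idx),
                                  torch.randn(n, d_idx))
out = sparse_attention(q, k, v, scores, top_k)  # (n, d)
```

Note that the indexer still scores all n² pairs, but with so few heads and such a small dimension (and a ReLU instead of softmax) that this pass is cheap; the real savings come from the expensive full-dimension attention touching only k keys per query.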