The context window has been shattered: Subquadratic debuts a 12M-token window
- #Startup Innovation
- #Context Window
- #AI Models
- Frontier models already ship with context windows measured in millions of tokens but struggle to use all of that information effectively: even the current leader on the MRCR v2 long-context benchmark, GPT-5.5, scores only 74.0%.
- Subquadratic, a Miami-based startup, introduces a model with a 12-million-token context window, claiming linear scaling in compute and memory via its Subquadratic Selective Attention (SSA) architecture, which avoids the quadratic cost of traditional attention mechanisms.
- SSA reportedly runs 52 times faster than dense attention at one million tokens, achieves 92.1% on needle-in-a-haystack retrieval at 12 million tokens, and scores 83% on MRCR v2, nine points ahead of GPT-5.5.
- Benchmarks show Subquadratic edging out competitors: 82.4% on SWE-bench (vs. Anthropic Opus 4.6's 81.42% and Google Gemini 3.1 Pro's 80.6%), though testing was limited by high inference costs, and the model is smaller than those from the major labs.
- The company offers an API with the full 12-million-token window, a coding agent (SubQ Code), and a deep research tool (SubQ Search), with a 50-million-token model planned for Q4; it is not open-sourcing the weights, instead providing training tools for enterprises.
- Subquadratic's approach differs from previous attempts (e.g., fixed-pattern sparse attention, state-space models like Mamba, hybrid architectures) in that its selection is content-dependent yet avoids quadratic scaling, aiming for a scaling-law advantage rather than a mere constant-factor speedup; a minimal sketch of the general idea follows this list.
- The startup has raised $29 million at a $500 million valuation from investors including former SoftBank and Tinder co-founders, having pivoted from speech models; the long-context field has seen hype before (e.g., Magic.dev's claims) without widespread adoption.
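
Subquadratic has not published how SSA works, so what follows is only a minimal sketch of the general technique the article describes: content-dependent selection, where each query scores cheap block-level key summaries and attends to just its top-scoring blocks, so the block choice varies per query, unlike fixed-pattern sparse attention. The mean-pooled block summaries and the `block_size` / `top_blocks` parameters here are illustrative assumptions, not SSA details.

```python
# Hypothetical sketch of content-dependent selective attention.
# NOT Subquadratic's SSA; an illustration of the general idea:
# each query attends to a small, content-chosen subset of keys,
# so the attention step costs O(n * top_blocks * block_size)
# instead of O(n^2).
import torch
import torch.nn.functional as F

def selective_attention(q, k, v, block_size=64, top_blocks=8):
    """q, k, v: (seq_len, dim) tensors; returns (seq_len, dim)."""
    n, d = k.shape
    n_blocks = n // block_size
    # Cheap content summaries: mean-pool keys within each block.
    # (Keys past the last full block are never selected in this toy version.)
    k_sum = k[: n_blocks * block_size].view(n_blocks, block_size, d).mean(dim=1)
    # Every query scores every block summary, then keeps only its
    # top-scoring blocks: selection depends on content, not position.
    top = (q @ k_sum.T).topk(min(top_blocks, n_blocks), dim=-1).indices
    out = torch.empty_like(q)
    offsets = torch.arange(block_size)
    for i in range(n):  # per-query loop kept for clarity; real kernels batch this
        idx = (top[i, :, None] * block_size + offsets).flatten()
        att = F.softmax((q[i] @ k[idx].T) / d ** 0.5, dim=-1)
        out[i] = att @ v[idx]
    return out

if __name__ == "__main__":
    q, k, v = (torch.randn(4096, 64) for _ in range(3))
    out = selective_attention(q, k, v)  # each query reads 8 * 64 = 512 keys, not all 4096
    print(out.shape)                    # torch.Size([4096, 64])
```

Note that even this sketch's coarse scoring pass still costs O(n²/block_size); schemes that are genuinely subquadratic end to end typically add hierarchical summaries or grow the block size with sequence length, and whatever SSA actually does to reach linear compute and memory is not public.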