Hasty Briefs

The context window has been shattered: Subquadratic debuts a 12M token window

4 hours ago
  • #Startup Innovation
  • #Context Window
  • #AI Models
  • Major frontier models often advertise large context windows (millions of tokens) but struggle to use all of that information effectively, as seen in benchmarks like MRCR v2, where GPT-5.5 leads with only 74.0%.
  • Subquadratic, a Miami-based startup, introduces a model with a 12-million-token context window, claiming linear scaling in compute and memory via its Subquadratic Selective Attention (SSA) architecture, which avoids the quadratic cost of traditional attention mechanisms.
  • SSA reportedly runs 52 times faster than dense attention at a million tokens, achieves 92.1% on needle-in-a-haystack retrieval at 12 million tokens, and scores 83 on MRCR v2, outperforming OpenAI by nine points.
  • Benchmarks show Subquadratic edging out competitors: 82.4% on SWE-bench (vs. Anthropic Opus 4.6's 81.42% and Google Gemini 3.1 Pro's 80.6%), though tests were limited due to high inference costs and the model is smaller than those from major labs.
  • The company offers an API with a 12-million-token window, a coding agent (SubQ Code), and a deep research tool (SubQ Search), with a 50-million-token model planned for Q4; it is not open-sourcing the model weights, instead providing training tools for enterprises.
  • Subquadratic's approach differs from previous attempts (e.g., fixed-pattern sparse attention, state-space models like Mamba, hybrid architectures) by using content-dependent selection without quadratic scaling, aiming for a scaling-law advantage rather than just a constant-factor speedup.
  • The startup has raised $29 million at a $500 million valuation, with investors including former SoftBank and Tinder co-founders, and pivoted from speech models, though the field has seen hype (e.g., Magic.dev's claims) without widespread adoption yet.
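Subquadratic has not published the details of its SSA architecture, so the mechanism can only be illustrated generically. The sketch below shows the basic idea behind content-dependent selective attention: each query attends to only its top-k highest-scoring keys rather than all n keys. For clarity it materializes the full score matrix, which is itself O(n²); a genuinely subquadratic method would select candidate keys without scoring every pair. The function name and parameters are hypothetical, not Subquadratic's API.

```python
import numpy as np

def topk_selective_attention(q, k, v, top_k=4):
    """Content-dependent sparse attention: each query attends only to
    its top_k highest-scoring keys instead of all n keys.

    NOTE: the dense (n, n) score matrix below is for illustration only;
    a real subquadratic method avoids materializing it.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n), shown densely for clarity

    # Keep only the top_k scores per query; mask everything else to -inf
    # so it contributes zero weight after the softmax.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)

    # Numerically stable softmax over the surviving entries.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 16
q, k, v = rng.normal(size=(3, n, d))
out = topk_selective_attention(q, k, v, top_k=4)
print(out.shape)  # (8, 16)
```

The key property is that the selection depends on the query and key contents rather than a fixed pattern, which is what the article says distinguishes SSA from fixed-pattern sparse attention.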