Hasty Briefs

The context window has been shattered: Subquadratic debuts a 12M token window

4 hours ago
  • #Startup Innovation
  • #Context Window
  • #AI Models
  • Major frontier models often advertise large context windows (millions of tokens) but struggle to use all of that information effectively, as seen in benchmarks like MRCR v2, where GPT-5.5 leads with only 74.0%.
  • Subquadratic, a Miami-based startup, introduces a model with a 12-million-token context window, claiming linear scaling in compute and memory via its Subquadratic Selective Attention (SSA) architecture, which avoids the quadratic cost of traditional attention mechanisms.
  • SSA reportedly runs 52 times faster than dense attention at a million tokens, achieves 92.1% on needle-in-a-haystack retrieval at 12 million tokens, and scores 83 on MRCR v2, outperforming OpenAI by nine points.
  • Benchmarks show Subquadratic edging out competitors: 82.4% on SWE-bench (vs. Anthropic Opus 4.6's 81.42% and Google Gemini 3.1 Pro's 80.6%), though tests were limited due to high inference costs and the model is smaller than those from major labs.
  • The company offers an API with a 12-million-token window, a coding agent (SubQ Code), and a deep research tool (SubQ Search), with a 50-million-token model planned for Q4; it is not open-sourcing the model weights, instead providing training tools for enterprises.
  • Subquadratic's approach differs from previous attempts (e.g., fixed-pattern sparse attention, state-space models like Mamba, hybrid architectures) by using content-dependent selection without quadratic scaling, aiming for a scaling-law advantage rather than just a constant-factor speedup.
  • The startup has raised $29 million at a $500 million valuation, with investors including former SoftBank and Tinder co-founders, and pivoted from speech models, though the field has seen hype (e.g., Magic.dev's claims) without widespread adoption yet.
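Subquadratic has not published the details of its SSA architecture, so the mechanism can only be illustrated generically. The sketch below shows the basic idea behind content-dependent selective attention: each query attends to only its top-k highest-scoring keys rather than all n keys. For clarity it materializes the full score matrix, which is itself O(n²); a genuinely subquadratic method would select candidate keys without scoring every pair. The function name and parameters are hypothetical, not Subquadratic's API.

```python
import numpy as np

def topk_selective_attention(q, k, v, top_k=4):
    """Content-dependent sparse attention: each query attends only to
    its top_k highest-scoring keys instead of all n keys.

    NOTE: the dense (n, n) score matrix below is for illustration only;
    a real subquadratic method avoids materializing it.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)  # (n, n), shown densely for clarity

    # Keep only the top_k scores per query; mask everything else to -inf
    # so it contributes zero weight after the softmax.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)

    # Numerically stable softmax over the surviving entries.
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 16
q, k, v = rng.normal(size=(3, n, d))
out = topk_selective_attention(q, k, v, top_k=4)
print(out.shape)  # (8, 16)
```

The key property is that the selection depends on the query and key contents rather than a fixed pattern, which is what the article says distinguishes SSA from fixed-pattern sparse attention.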