SubQ: a sub-quadratic LLM with 12M-token context
9 hours ago
- #subquadratic scaling
- #long-context LLM
- #AI architecture
- Transformers power modern AI, but their compute scales quadratically with context length, making long contexts expensive and impractical.
- SubQ introduces the first fully subquadratic LLM architecture, in which compute grows linearly with context length, enabling context windows of millions of tokens.
- SubQ 1M-Preview achieves state-of-the-art accuracy (95% on RULER 128K) and efficiency (52x faster attention, 63% less compute) compared to frontier models.
- Products include an API, SubQ Code for full-codebase processing, and SubQ Search for long-context research, all available in private beta.
- SubQ's architecture reduces attention compute by nearly 1,000x, supports up to 12 million tokens, and improves cost-effectiveness for AI applications.
- The team comprises 11 PhD researchers from top institutions, backed by $29M in seed funding, aiming to break quadratic scaling constraints in AI.
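The scaling claim above comes down to simple arithmetic: standard attention scores every token pair, so its cost grows with the square of the context length, while a linear-scaling mechanism does a fixed amount of work per token. A minimal back-of-envelope sketch, with an arbitrary illustrative per-token constant (`state_dim`) that is not from the announcement and says nothing about SubQ's actual architecture:

```python
# Illustrative cost comparison: quadratic vs. linear attention scaling.
# Standard attention computes a score for every token pair (~n^2 ops
# per layer per head); a linear-time mechanism does work proportional
# to n. The state_dim constant below is a hypothetical placeholder.

def quadratic_attention_ops(n: int) -> int:
    """Pairwise score count for standard attention over n tokens."""
    return n * n

def linear_attention_ops(n: int, state_dim: int = 128) -> int:
    """Per-token work for a hypothetical linear-time mechanism."""
    return n * state_dim

for n in (128_000, 1_000_000, 12_000_000):
    q = quadratic_attention_ops(n)
    lin = linear_attention_ops(n)
    print(f"{n:>10,} tokens: quadratic {q:.2e} ops, "
          f"linear {lin:.2e} ops, ratio {q / lin:,.0f}x")
```

The ratio widens with context length, which is why quadratic attention that is merely inconvenient at 128K tokens becomes prohibitive at 12 million.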