SubQ: a sub-quadratic LLM with 12M-token context
9 hours ago
- #subquadratic scaling
- #long-context LLM
- #AI architecture
- Transformers power modern AI, but their compute scales quadratically with context length, making long contexts expensive and impractical.
- SubQ introduces the first fully subquadratic LLM architecture, in which compute grows linearly with context length, enabling context windows of millions of tokens.
- SubQ 1M-Preview achieves state-of-the-art accuracy (95% on RULER 128K) and efficiency (52x faster attention, 63% less compute) compared to frontier models.
- Products include an API, SubQ Code for full-codebase processing, and SubQ Search for long-context research, all available in private beta.
- SubQ's architecture reduces attention compute by nearly 1,000x, supports up to 12 million tokens, and improves cost-effectiveness for AI applications.
- The team comprises 11 PhD researchers from top institutions, backed by $29M in seed funding, aiming to break quadratic scaling constraints in AI.
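The scaling claim above comes down to simple arithmetic: standard attention scores every token pair, so its cost grows with the square of the context length, while a linear-scaling mechanism does a fixed amount of work per token. A minimal back-of-envelope sketch, with an arbitrary illustrative per-token constant (`state_dim`) that is not from the announcement and says nothing about SubQ's actual architecture:

```python
# Illustrative cost comparison: quadratic vs. linear attention scaling.
# Standard attention computes a score for every token pair (~n^2 ops
# per layer per head); a linear-time mechanism does work proportional
# to n. The state_dim constant below is a hypothetical placeholder.

def quadratic_attention_ops(n: int) -> int:
    """Pairwise score count for standard attention over n tokens."""
    return n * n

def linear_attention_ops(n: int, state_dim: int = 128) -> int:
    """Per-token work for a hypothetical linear-time mechanism."""
    return n * state_dim

for n in (128_000, 1_000_000, 12_000_000):
    q = quadratic_attention_ops(n)
    lin = linear_attention_ops(n)
    print(f"{n:>10,} tokens: quadratic {q:.2e} ops, "
          f"linear {lin:.2e} ops, ratio {q / lin:,.0f}x")
```

The ratio widens with context length, which is why quadratic attention that is merely inconvenient at 128K tokens becomes prohibitive at 12 million.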