SubQ: a sub-quadratic LLM with 12M-token context

  • #subquadratic scaling
  • #long-context LLM
  • #AI architecture
  • Transformers power modern AI, but their attention compute scales quadratically with context length, making long contexts expensive and impractical.
  • SubQ introduces the first fully sub-quadratic LLM architecture, where compute grows linearly with context, enabling contexts of millions of tokens (see the cost sketch after this list).
  • SubQ 1M-Preview achieves state-of-the-art accuracy (95% on RULER 128K) and efficiency (52x faster attention, 63% less compute) compared to frontier models.
  • Products include an API, SubQ Code for full-codebase processing, and SubQ Search for long-context research, all available in private beta.
  • SubQ's architecture reduces attention compute by nearly 1,000x, supports up to 12 million tokens, and improves cost-effectiveness for AI applications.
  • The team of 11 PhD researchers from top institutions is backed by $29M in seed funding and aims to break the quadratic scaling constraint in AI.
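
The scaling claim comes down to simple arithmetic: standard attention's per-layer score computation grows with the square of the context length, while a design whose cost grows linearly in the number of tokens avoids that blow-up. The sketch below is a rough illustration of that gap only, not SubQ's actual method; the hidden size, the linear-cost formula, and the function names are assumptions for illustration.

```python
# Back-of-envelope comparison of attention cost growth with context length.
# Illustrative only: formulas and constants are assumptions, not SubQ's design.

def quadratic_attention_flops(n_tokens: int, d_model: int) -> float:
    """Standard self-attention: every token attends to every other,
    so score computation costs roughly n^2 * d multiply-adds per layer."""
    return n_tokens ** 2 * d_model

def linear_attention_flops(n_tokens: int, d_model: int) -> float:
    """A generic sub-quadratic scheme whose cost grows linearly in n;
    the n * d^2 form is typical of linear-attention variants (assumption)."""
    return n_tokens * d_model ** 2

if __name__ == "__main__":
    d = 4096  # hypothetical hidden size
    for n in (128_000, 1_000_000, 12_000_000):
        ratio = quadratic_attention_flops(n, d) / linear_attention_flops(n, d)
        print(f"{n:>10,} tokens: quadratic is ~{ratio:,.0f}x more costly than linear")
```

Under these toy numbers the gap is about 31x at 128K tokens and roughly 3,000x at 12M tokens, which shows why a linear-scaling design is what makes multi-million-token contexts practical at all.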