Subquadratic – Introducing SubQ 1.1 Small

3 hours ago

SubQ 1.1 Small is the second iteration of a Subquadratic Sparse Attention (SSA) model, designed for reasoning over large artifacts like codebases and documents.
It achieves near-perfect long-context retrieval up to 12M tokens with up to 1,000x attention compute reduction, balancing long-context optimization with strong general reasoning.
It reports high scores on the Needle-In-A-Haystack and RULER long-context tests, along with strong results on knowledge, coding, and agentic benchmarks, including GPQA Diamond and LiveCodeBench.
The model uses SSA for linear scaling with context length, requiring 64.5x less compute than dense attention and running 56x faster than FlashAttention-2 at 1M tokens.
Training replaced dense attention with SSA and extended pretraining on long artifacts, which made multi-million-token experiments efficient to run.
Use cases include financial analysis, legal contract work, and software engineering, where reasoning across complete artifacts is essential.
Plans include deployment with design partners, broader rollout, and general model releases by year-end.
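
To make the scaling claim above concrete, here is a minimal toy cost model, not SubQ's actual method. It assumes dense attention scores every token pair (n * n) while a sparse scheme lets each token attend to at most k others (n * k). The budget k and all function names are hypothetical and are not taken from the announcement. Under these assumptions the compute reduction is n / k, so it grows in proportion to context length.

```python
def dense_attention_cost(n: int) -> int:
    """Number of query-key scores when every token attends to every token."""
    return n * n


def sparse_attention_cost(n: int, k: int) -> int:
    """Number of scores when each token attends to at most k tokens.

    k is a hypothetical fixed per-token budget, used only for illustration.
    """
    return n * min(k, n)


def reduction_factor(n: int, k: int) -> float:
    """Ratio of dense to sparse cost under this toy model (equals n / k when k <= n)."""
    return dense_attention_cost(n) / sparse_attention_cost(n, k)


if __name__ == "__main__":
    k = 10_000  # hypothetical budget, not a published SubQ parameter
    for n in (100_000, 1_000_000, 12_000_000):
        print(f"n={n:>10,}  reduction={reduction_factor(n, k):,.0f}x")
```

In this sketch the sparse cost is linear in n for fixed k, which is why the reduction relative to dense attention keeps growing as the context gets longer. Real speedups also depend on memory traffic and kernel efficiency, which the model ignores.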
