ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math
- #AMD hardware
- #AI model
- #Mixture of Experts
- ZAYA1-8B is a mixture-of-experts model with 8.4B total parameters, of which only 760M are active per token at inference, making it cheap to serve while maintaining high performance; the first sketch after this list shows how top-k routing creates this total-vs-active gap.
- The model was trained entirely on AMD hardware (Instinct MI300X GPUs), challenging NVIDIA's dominance and proving the viability of alternative infrastructure.
- ZAYA1-8B competes with or outperforms frontier models like DeepSeek-R1, Claude Sonnet 4.5, and Gemini 2.5 Pro on math and coding benchmarks, despite its small active parameter count.
- It features Markovian RSA, a novel inference method that lets the model improve with more compute by reasoning in bounded chunks rather than one ever-growing context, sidestepping context-window limits; the second sketch below outlines the idea.
- Limitations include weaker agentic performance (e.g., function calling, instruction following) and lower general chat quality, making it a specialist for math, science, and coding.
- Available under Apache 2.0 on Hugging Face, but local deployment requires a custom vLLM fork; also accessible via Zyphra Cloud. The final sketch below shows a minimal loading example.
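To make the total-vs-active distinction concrete, here is a minimal top-k MoE routing sketch in PyTorch. The dimensions, expert count, and top-k value are illustrative placeholders, not ZAYA1's actual configuration: every expert counts toward total parameters, but only the routed experts execute per token, which is what the active-parameter figure measures.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only -- not ZAYA1's real configuration.
d_model, n_experts, top_k = 512, 16, 2

experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)
router = torch.nn.Linear(d_model, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top_k experts; the rest stay idle.

    All n_experts weight matrices count toward *total* parameters,
    but only top_k of them run per token -- the *active* parameters.
    """
    logits = router(x)                        # (tokens, n_experts)
    weights, idx = logits.topk(top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)      # normalize over chosen experts
    out = torch.zeros_like(x)
    for token in range(x.shape[0]):
        for slot in range(top_k):
            e = int(idx[token, slot])
            out[token] += weights[token, slot] * experts[e](x[token])
    return out

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 512])
```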
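The Markovian RSA bullet describes reasoning in bounded chunks with a carried state. The loop below is a schematic of that general idea only; the `generate` interface, prompt format, and summarization step are hypothetical stand-ins, not Zyphra's published method.

```python
def markovian_reasoning(llm, question: str,
                        max_rounds: int = 8,
                        chunk_tokens: int = 1024) -> str:
    """Schematic bounded-chunk reasoning loop (hypothetical sketch,
    not Zyphra's actual Markovian RSA implementation).

    Each round sees only the question plus a fixed-size carried state,
    so the prompt never grows with the number of rounds; spending more
    rounds (more compute) can keep refining the answer.
    """
    state = ""  # bounded summary carried between rounds
    for _ in range(max_rounds):
        prompt = f"Question: {question}\nState: {state}\nContinue reasoning:"
        chunk = llm.generate(prompt, max_new_tokens=chunk_tokens)
        if "FINAL ANSWER:" in chunk:
            return chunk.split("FINAL ANSWER:", 1)[1].strip()
        # Compress this round's reasoning into a new bounded state.
        state = llm.generate(
            "Summarize the key facts and partial results briefly:\n" + chunk,
            max_new_tokens=256,
        )
    return state

class EchoLLM:
    """Toy stand-in so the loop runs end to end; swap in a real model."""
    def generate(self, prompt: str, max_new_tokens: int = 256) -> str:
        return "2 + 2 = 4. FINAL ANSWER: 4"

print(markovian_reasoning(EchoLLM(), "What is 2 + 2?"))  # -> 4
```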
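For local use, here is a minimal loading sketch with the standard vLLM Python API, assuming the required custom fork preserves that API. The repo id `Zyphra/ZAYA1-8B` is a guess for illustration; check Zyphra's Hugging Face page for the exact model name.

```python
from vllm import LLM, SamplingParams

# Repo id is an assumption -- verify on Hugging Face. Requires the
# custom vLLM fork mentioned above; the call pattern below is the
# standard vLLM API, which we assume the fork keeps.
llm = LLM(model="Zyphra/ZAYA1-8B", trust_remote_code=True)
params = SamplingParams(temperature=0.6, max_tokens=2048)

outputs = llm.generate(
    ["Prove that the sum of two odd integers is even."], params
)
print(outputs[0].outputs[0].text)
```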