ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math
- #AMD hardware
- #AI model
- #Mixture of Experts
- ZAYA1-8B is a mixture-of-experts model with 8.4B total parameters, of which only 760M are active per token at inference, making it cheap to serve while maintaining high performance; the first sketch after this list shows how top-k routing creates this total-vs-active gap.
- The model was trained entirely on AMD hardware (Instinct MI300X GPUs), challenging NVIDIA's dominance and proving the viability of alternative infrastructure.
- ZAYA1-8B competes with or outperforms frontier models like DeepSeek-R1, Claude Sonnet 4.5, and Gemini 2.5 Pro on math and coding benchmarks, despite its small active parameter count.
- It features Markovian RSA, a novel inference method that lets the model improve with more compute by reasoning in bounded chunks rather than one ever-growing context, sidestepping context-window limits; the second sketch below outlines the idea.
- Limitations include weaker agentic performance (e.g., function calling, instruction following) and lower general chat quality, making it a specialist for math, science, and coding.
- Available under Apache 2.0 on Hugging Face, but local deployment requires a custom vLLM fork; also accessible via Zyphra Cloud. The final sketch below shows a minimal loading example.
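To make the total-vs-active distinction concrete, here is a minimal top-k MoE routing sketch in PyTorch. The dimensions, expert count, and top-k value are illustrative placeholders, not ZAYA1's actual configuration: every expert counts toward total parameters, but only the routed experts execute per token, which is what the active-parameter figure measures.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only -- not ZAYA1's real configuration.
d_model, n_experts, top_k = 512, 16, 2

experts = torch.nn.ModuleList(
    [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
)
router = torch.nn.Linear(d_model, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top_k experts; the rest stay idle.

    All n_experts weight matrices count toward *total* parameters,
    but only top_k of them run per token -- the *active* parameters.
    """
    logits = router(x)                        # (tokens, n_experts)
    weights, idx = logits.topk(top_k, dim=-1)
    weights = F.softmax(weights, dim=-1)      # normalize over chosen experts
    out = torch.zeros_like(x)
    for token in range(x.shape[0]):
        for slot in range(top_k):
            e = int(idx[token, slot])
            out[token] += weights[token, slot] * experts[e](x[token])
    return out

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 512])
```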
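The Markovian RSA bullet describes reasoning in bounded chunks with a carried state. The loop below is a schematic of that general idea only; the `generate` interface, prompt format, and summarization step are hypothetical stand-ins, not Zyphra's published method.

```python
def markovian_reasoning(llm, question: str,
                        max_rounds: int = 8,
                        chunk_tokens: int = 1024) -> str:
    """Schematic bounded-chunk reasoning loop (hypothetical sketch,
    not Zyphra's actual Markovian RSA implementation).

    Each round sees only the question plus a fixed-size carried state,
    so the prompt never grows with the number of rounds; spending more
    rounds (more compute) can keep refining the answer.
    """
    state = ""  # bounded summary carried between rounds
    for _ in range(max_rounds):
        prompt = f"Question: {question}\nState: {state}\nContinue reasoning:"
        chunk = llm.generate(prompt, max_new_tokens=chunk_tokens)
        if "FINAL ANSWER:" in chunk:
            return chunk.split("FINAL ANSWER:", 1)[1].strip()
        # Compress this round's reasoning into a new bounded state.
        state = llm.generate(
            "Summarize the key facts and partial results briefly:\n" + chunk,
            max_new_tokens=256,
        )
    return state

class EchoLLM:
    """Toy stand-in so the loop runs end to end; swap in a real model."""
    def generate(self, prompt: str, max_new_tokens: int = 256) -> str:
        return "2 + 2 = 4. FINAL ANSWER: 4"

print(markovian_reasoning(EchoLLM(), "What is 2 + 2?"))  # -> 4
```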
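For local use, here is a minimal loading sketch with the standard vLLM Python API, assuming the required custom fork preserves that API. The repo id `Zyphra/ZAYA1-8B` is a guess for illustration; check Zyphra's Hugging Face page for the exact model name.

```python
from vllm import LLM, SamplingParams

# Repo id is an assumption -- verify on Hugging Face. Requires the
# custom vLLM fork mentioned above; the call pattern below is the
# standard vLLM API, which we assume the fork keeps.
llm = LLM(model="Zyphra/ZAYA1-8B", trust_remote_code=True)
params = SamplingParams(temperature=0.6, max_tokens=2048)

outputs = llm.generate(
    ["Prove that the sum of two odd integers is even."], params
)
print(outputs[0].outputs[0].text)
```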