Hasty Briefs

ZAYA1-8B: An 8B MoE Model with 760M Active Params Matching DeepSeek-R1 on Math

5 hours ago
  • #AMD hardware
  • #AI model
  • #Mixture of Experts
  • ZAYA1-8B is a mixture of experts model with 8.4B total parameters and only 760M active during inference, making it cost-effective while maintaining high performance.
  • The model was trained entirely on AMD hardware (Instinct MI300X GPUs), challenging NVIDIA's dominance and proving the viability of alternative infrastructure.
  • ZAYA1-8B competes with or outperforms frontier models like DeepSeek-R1, Claude Sonnet 4.5, and Gemini 2.5 Pro on math and coding benchmarks, despite its small active parameter count.
  • It features Markovian RSA, a novel inference method that lets the model improve with additional test-time compute by reasoning in bounded chunks, sidestepping context-window limits.
  • Limitations include weaker performance in agentic tasks (e.g., function calling, instruction following) and general chat quality, making it a specialist for math, science, and coding.
  • Available under the Apache 2.0 license on Hugging Face, but local deployment requires a custom vLLM fork; the model is also accessible via Zyphra Cloud.
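The cost advantage in the first bullet comes from mixture-of-experts routing: only a few experts fire per token, so the active parameter count is a small fraction of the total. The toy layer below illustrates that mechanism with top-k routing; all dimensions and the expert count are made-up values for the sketch, not ZAYA1's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MoE feed-forward layer (illustrative dimensions, not ZAYA1's).
d_model, d_ff, n_experts, top_k = 64, 256, 16, 2

# Each expert is a small two-matrix FFN: W_in (d_model x d_ff), W_out (d_ff x d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(n_experts)
]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]                        # top-k expert indices
    weights = np.exp(logits[chosen]) / np.exp(logits[chosen]).sum()  # softmax over chosen
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)          # ReLU FFN expert
    return out

# Only top_k of n_experts run per token, so active params are a fixed fraction.
total_params = n_experts * 2 * d_model * d_ff
active_params = top_k * 2 * d_model * d_ff
print(f"active fraction per token: {active_params / total_params:.3f}")
```

With 2 of 16 experts active, the fraction is 0.125; ZAYA1-8B's 760M-of-8.4B ratio (~9%) reflects the same principle at scale.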
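The bounded-chunk idea behind the Markovian RSA bullet can be sketched abstractly: rather than one ever-growing chain of thought, reasoning proceeds in fixed-size chunks, and only a bounded state is carried between them, so more compute means more chunks rather than a longer context. This is a minimal sketch of that general pattern, not Zyphra's actual method; `generate` is a hypothetical stand-in (here a Newton iteration toward sqrt(2)) replacing a real model call.

```python
CHUNK_BUDGET = 4   # max "reasoning steps" per chunk (toy value)
MAX_CHUNKS = 3     # extra compute adds chunks, never context length

def generate(state: dict) -> dict:
    """Stand-in for one bounded reasoning chunk: refine an estimate of sqrt(2)."""
    x = state["estimate"]
    for _ in range(CHUNK_BUDGET):
        x = 0.5 * (x + 2.0 / x)   # Newton step as a toy reasoning step
    return {"estimate": x}        # bounded carry-over, not the full trace

def solve() -> float:
    state = {"estimate": 1.0}     # the only memory passed between chunks
    for _ in range(MAX_CHUNKS):
        state = generate(state)   # each chunk sees a fixed-size state
    return state["estimate"]

print(round(solve(), 6))  # -> 1.414214
```

The key property is that the state handed between chunks stays a fixed size no matter how many chunks run, which is what lets such a scheme scale with compute without hitting a context-window ceiling.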