MiniMax-M1: an open-weight, large-scale hybrid-attention reasoning model
- #AI
- #Machine Learning
- #Natural Language Processing
- MiniMax-M1 is the world's first open-weight, large-scale hybrid-attention reasoning model.
- It combines a hybrid Mixture-of-Experts (MoE) architecture with a lightning attention mechanism.
- Supports a context length of 1 million tokens, 8x that of DeepSeek R1.
- Consumes about 25% of the FLOPs of DeepSeek R1 when generating 100K tokens.
- Trained using large-scale reinforcement learning on diverse tasks.
- Introduces CISPO, a novel RL algorithm that clips importance-sampling weights rather than token updates for efficient RL scaling (see the sketch after this list).
- Two versions are available, MiniMax-M1-40k and MiniMax-M1-80k, with thinking budgets of 40K and 80K tokens respectively.
- Outperforms models such as DeepSeek-R1 and Qwen3-235B on complex software engineering, tool-use, and long-context tasks.
- Benchmarked across mathematics, coding, software engineering, agentic tool use, long-context understanding, and more.
- Supports function calling and can be deployed with vLLM or Transformers (a minimal vLLM example follows below).
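
The core idea behind CISPO, as described in the release, is to clip the importance-sampling weight itself (under a stop-gradient) instead of clipping the token update, so every token keeps contributing a policy-gradient term. The snippet below is a minimal sketch of that idea in PyTorch; the function name, tensor shapes, and epsilon values are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the CISPO idea: clip and stop-gradient the
# importance-sampling weight, then use it to scale a REINFORCE-style
# per-token loss. Shapes and epsilons are illustrative only.
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=1.0, eps_high=0.2):
    """logp_new, logp_old, advantages: per-token tensors of shape [batch, seq]."""
    # Importance-sampling weight r_t between current and behavior policy.
    ratio = torch.exp(logp_new - logp_old.detach())
    # Clip the weight itself; with eps_low=1.0 the lower bound is 0,
    # i.e. effectively no lower clipping (ratio is non-negative anyway).
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    weight = clipped.detach()  # stop-gradient: the weight only rescales the update
    # Gradients flow through logp_new only, so every token contributes.
    return -(weight * advantages * logp_new).mean()
```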
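
For deployment, the release recommends vLLM for serving. Below is a minimal offline-inference sketch; the Hugging Face repo id, tensor-parallel size, and sampling settings are assumptions for illustration and should be adjusted to your hardware and the model card's recommendations.

```python
# Minimal vLLM offline-inference sketch (illustrative settings).
from vllm import LLM, SamplingParams

# Repo id and parallelism are assumptions; adjust to your setup.
llm = LLM(
    model="MiniMaxAI/MiniMax-M1-40k",
    trust_remote_code=True,
    tensor_parallel_size=8,
)

sampling = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=4096)
outputs = llm.generate(["Explain why the sum of two odd numbers is even."], sampling)
print(outputs[0].outputs[0].text)
```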