MiniMax-M1: an open-weight, large-scale hybrid-attention reasoning model
- #AI
- #Machine Learning
- #Natural Language Processing
- MiniMax-M1 is the world's first open-weight, large-scale hybrid-attention reasoning model.
- It combines a hybrid Mixture-of-Experts (MoE) architecture with a lightning attention mechanism.
- Supports a context length of 1 million tokens, 8x that of DeepSeek R1.
- Consumes about 25% of the FLOPs of DeepSeek R1 when generating 100K tokens.
- Trained using large-scale reinforcement learning on diverse tasks.
- Introduces CISPO, a novel RL algorithm that clips importance-sampling weights rather than token updates for efficient RL scaling (see the sketch after this list).
- Two versions are available, MiniMax-M1-40k and MiniMax-M1-80k, with thinking budgets of 40K and 80K tokens respectively.
- Outperforms models such as DeepSeek-R1 and Qwen3-235B on complex software engineering, tool-use, and long-context tasks.
- Benchmarked across mathematics, coding, software engineering, agentic tool use, long-context understanding, and more.
- Supports function calling and can be deployed with vLLM or Transformers (a minimal vLLM example follows below).
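
The core idea behind CISPO, as described in the release, is to clip the importance-sampling weight itself (under a stop-gradient) instead of clipping the token update, so every token keeps contributing a policy-gradient term. The snippet below is a minimal sketch of that idea in PyTorch; the function name, tensor shapes, and epsilon values are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the CISPO idea: clip and stop-gradient the
# importance-sampling weight, then use it to scale a REINFORCE-style
# per-token loss. Shapes and epsilons are illustrative only.
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=1.0, eps_high=0.2):
    """logp_new, logp_old, advantages: per-token tensors of shape [batch, seq]."""
    # Importance-sampling weight r_t between current and behavior policy.
    ratio = torch.exp(logp_new - logp_old.detach())
    # Clip the weight itself; with eps_low=1.0 the lower bound is 0,
    # i.e. effectively no lower clipping (ratio is non-negative anyway).
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    weight = clipped.detach()  # stop-gradient: the weight only rescales the update
    # Gradients flow through logp_new only, so every token contributes.
    return -(weight * advantages * logp_new).mean()
```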
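
For deployment, the release recommends vLLM for serving. Below is a minimal offline-inference sketch; the Hugging Face repo id, tensor-parallel size, and sampling settings are assumptions for illustration and should be adjusted to your hardware and the model card's recommendations.

```python
# Minimal vLLM offline-inference sketch (illustrative settings).
from vllm import LLM, SamplingParams

# Repo id and parallelism are assumptions; adjust to your setup.
llm = LLM(
    model="MiniMaxAI/MiniMax-M1-40k",
    trust_remote_code=True,
    tensor_parallel_size=8,
)

sampling = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=4096)
outputs = llm.generate(["Explain why the sum of two odd numbers is even."], sampling)
print(outputs[0].outputs[0].text)
```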