
MiniMax-M1: an open-weight, large-scale hybrid-attention reasoning model

  • #AI
  • #Machine Learning
  • #Natural Language Processing
  • MiniMax-M1 is the world's first open-weight, large-scale hybrid-attention reasoning model.
  • It features a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism (see the linear-attention sketch after this list).
  • Supports a context length of 1 million tokens, 8x that of DeepSeek R1.
  • Consumes only 25% of the FLOPs of DeepSeek R1 when generating 100K tokens.
  • Trained using large-scale reinforcement learning on diverse tasks.
  • Introduces CISPO, a novel clipped importance-sampling algorithm for efficient RL scaling (a sketch of the idea follows this list).
  • Two versions available: MiniMax-M1-40K and MiniMax-M1-80K.
  • Outperforms models like DeepSeek-R1 and Qwen3-235B on complex tasks.
  • Benchmarked across mathematics, coding, software engineering, and more.
  • Supports function calling and can be deployed using vLLM or Transformers (a Transformers loading sketch follows).
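
Why a 1M-token context is tractable: lightning attention is MiniMax's I/O-aware implementation of linear attention. The sketch below is not their kernel; it only shows the underlying O(n) causal recurrence (without the feature map and normalizer real variants use), which is why per-token cost stays constant as the sequence grows.

```python
# Minimal sketch of the causal linear-attention recurrence that lightning
# attention accelerates. Simplified: no feature map, no normalizer, no tiling.
import torch

def linear_attention(q, k, v):
    """q, k, v: (seq_len, dim). Per-step cost is O(dim^2), independent of
    seq_len, versus softmax attention's O(seq_len) cost per step."""
    seq_len, dim = q.shape
    kv_state = torch.zeros(dim, dim)          # running sum of outer(k_t, v_t)
    out = torch.empty_like(v)
    for t in range(seq_len):
        kv_state += torch.outer(k[t], v[t])   # fold step t into the state
        out[t] = q[t] @ kv_state              # attend through the state only
    return out

q, k, v = (torch.randn(16, 8) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([16, 8])
```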
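On CISPO: per the MiniMax-M1 paper, the algorithm clips the importance-sampling weight itself (and stops its gradient) rather than clipping the token update PPO-style, so no token's gradient is zeroed out. The sketch below is an illustrative reading of that idea; the tensor names and epsilon values are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of a CISPO-style loss: clipped, detached IS weight times a
# REINFORCE surrogate, so every token contributes a bounded gradient.
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_low=0.0, eps_high=0.2):
    """logp_new: log-probs under the current policy (requires grad);
    logp_old, advantages: fixed tensors from the rollout."""
    ratio = torch.exp(logp_new - logp_old.detach())  # IS weight r_t
    clipped_w = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    return -(clipped_w * advantages * logp_new).mean()

logp_new = torch.randn(32, requires_grad=True)
loss = cispo_loss(logp_new, torch.randn(32), torch.randn(32))
loss.backward()  # all 32 tokens receive gradient; none are masked out
```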
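Deployment: the snippet below is a minimal Transformers loading sketch, assuming the weights are published under the repo id `MiniMaxAI/MiniMax-M1-40k`; check the model card for the exact id, recommended dtype, and sampling settings. A model this size needs multi-GPU sharding, hence `device_map="auto"`.

```python
# Hedged sketch: load MiniMax-M1 with Hugging Face Transformers and generate.
# The repo id and generation settings here are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M1-40k"  # assumed repo id; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,   # custom hybrid-attention architecture
    device_map="auto",        # shard across available GPUs
    torch_dtype=torch.bfloat16,
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```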