MiniMax M2.5 released: 80.2% on SWE-Bench Verified

3 months ago

MiniMax introduces M2.5, a faster, stronger, and smarter model optimized for real-world productivity.
M2.5 excels in coding, agentic tool use, search, and office work, with top scores in benchmarks like SWE-Bench Verified (80.2%) and BrowseComp (76.3%).
The model is cost-effective: running it continuously costs about $1 per hour at 100 tokens per second and $0.30 per hour at 50 tokens per second.
M2.5 shows significant improvements in multilingual coding tasks and architectural planning, trained on 10+ languages across 200,000+ real-world environments.
Enhanced search and tool-calling capabilities make M2.5 adept at expert-level tasks, with better efficiency and decision-making.
M2.5 also handles office work in Word, PowerPoint, and Excel, achieving a 59.0% win rate in evaluations.
M2.5 is 37% faster than its predecessor, M2.1, and matches Claude Opus 4.6's speed at a fraction of the cost.
The model supports agentic applications with two versions: M2.5 and M2.5-Lightning, differing in speed and cost.
MiniMax Agent integrates M2.5, offering standardized Office Skills and customizable Experts for various industries.
M2.5 is already handling 30% of MiniMax's internal tasks, with 80% of new code commits generated by the model.
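
To compare the hourly pricing above with the per-token prices most APIs quote, the figures can be converted into an approximate cost per million generated tokens. This is a rough sketch and not MiniMax's official price list: it assumes the hourly figure covers only generated tokens at a steady throughput and ignores any input-token charges. The function name `cost_per_million_tokens` is made up for illustration.

```python
def cost_per_million_tokens(dollars_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly running cost at a fixed throughput into $ per 1M tokens.

    Assumption: the model generates continuously at `tokens_per_second`
    for the full hour, and the hourly cost covers only those tokens.
    """
    tokens_per_hour = tokens_per_second * 3600  # 3600 seconds in an hour
    return dollars_per_hour / tokens_per_hour * 1_000_000


# The two price points quoted in the brief: (dollars per hour, tokens per second).
price_points = [(1.00, 100), (0.30, 50)]

for dollars, tps in price_points:
    per_million = cost_per_million_tokens(dollars, tps)
    print(f"${dollars:.2f}/hour at {tps} tok/s is about ${per_million:.2f} per 1M tokens")
```

Under these assumptions, $1 per hour at 100 tokens per second covers 360,000 tokens, or roughly $2.78 per million, while $0.30 per hour at 50 tokens per second covers 180,000 tokens, or roughly $1.67 per million.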
