Claude Opus 4.1

10 months ago

Claude Opus 4.1 is released as an upgrade to Claude Opus 4, focusing on agentic tasks, real-world coding, and reasoning.
The model is now available to paid Claude users, in Claude Code, and on API platforms like Amazon Bedrock and Google Cloud's Vertex AI, with pricing unchanged from Opus 4.
Opus 4.1 achieves 74.5% on SWE-bench Verified, improving coding performance and excelling in multi-file code refactoring and debugging tasks.
Notable improvements include better detail tracking, agentic search, and precision in code corrections without unnecessary adjustments or bugs.
Users are recommended to upgrade from Opus 4 to Opus 4.1, with developers advised to use 'claude-opus-4-1-20250805' via the API.
Feedback is encouraged to aid in the development of future models, with larger improvements expected in the coming weeks.
Benchmark results vary, with some achieved without extended thinking (e.g., SWE-bench Verified) and others with extended thinking (e.g., TAU-bench, GPQA Diamond).
Methodologies for benchmarks like TAU-bench and SWE-bench are detailed, highlighting changes in tools and thinking modes used.

Hasty Briefsbeta