Claude Opus 4.1
9 months ago
- #AI
- #Machine Learning
- #Coding
- Claude Opus 4.1 is released as an upgrade to Claude Opus 4, focusing on agentic tasks, real-world coding, and reasoning.
- The model is now available to paid Claude users, in Claude Code, and on API platforms like Amazon Bedrock and Google Cloud's Vertex AI, with pricing unchanged from Opus 4.
- Opus 4.1 achieves 74.5% on SWE-bench Verified, improving coding performance and excelling in multi-file code refactoring and debugging tasks.
- Notable improvements include better detail tracking, agentic search, and precision in code corrections without unnecessary adjustments or bugs.
- Users are recommended to upgrade from Opus 4 to Opus 4.1, with developers advised to use 'claude-opus-4-1-20250805' via the API.
- Feedback is encouraged to aid in the development of future models, with larger improvements expected in the coming weeks.
- Benchmark results vary, with some achieved without extended thinking (e.g., SWE-bench Verified) and others with extended thinking (e.g., TAU-bench, GPQA Diamond).
- Methodologies for benchmarks like TAU-bench and SWE-bench are detailed, highlighting changes in tools and thinking modes used.