Hasty Briefsbeta

Bilingual

Claude Opus 4.1

9 months ago
  • #AI
  • #Machine Learning
  • #Coding
  • Claude Opus 4.1 is released as an upgrade to Claude Opus 4, focusing on agentic tasks, real-world coding, and reasoning.
  • The model is now available to paid Claude users, in Claude Code, and on API platforms like Amazon Bedrock and Google Cloud's Vertex AI, with pricing unchanged from Opus 4.
  • Opus 4.1 achieves 74.5% on SWE-bench Verified, improving coding performance and excelling in multi-file code refactoring and debugging tasks.
  • Notable improvements include better detail tracking, agentic search, and precision in code corrections without unnecessary adjustments or bugs.
  • Users are recommended to upgrade from Opus 4 to Opus 4.1, with developers advised to use 'claude-opus-4-1-20250805' via the API.
  • Feedback is encouraged to aid in the development of future models, with larger improvements expected in the coming weeks.
  • Benchmark results vary, with some achieved without extended thinking (e.g., SWE-bench Verified) and others with extended thinking (e.g., TAU-bench, GPQA Diamond).
  • Methodologies for benchmarks like TAU-bench and SWE-bench are detailed, highlighting changes in tools and thinking modes used.