Claude Opus 4.1 - anthropic.com
Aug 5, 2025 · Today we're releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. We plan to release substantially larger improvements …
Create strong empirical evaluations - Claude Docs
Learn how to craft prompts that maximize your eval scores, with code examples of human-, code-, and LLM-graded evals.
Claude Opus 4.1 vs Claude Opus 4 – How good is this upgrade?
Aug 6, 2025 · GitHub’s Evaluation: Claude Opus 4.1 demonstrates notable performance gains in multi-file code refactoring, surpassing Opus 4 in tasks that require nuanced understanding and …
Introducing Claude Opus 4.5 \ Anthropic
Nov 24, 2025 · Claude Opus 4.5 sets a new standard for Excel automation and financial modeling. Accuracy on our internal evals improved 20%, efficiency rose 15%, and complex tasks that once …
An update on our preliminary evaluations of Claude 3.5 Sonnet ...
Jan 31, 2025 · METR conducted preliminary evaluations of Anthropic’s upgraded Claude 3.5 Sonnet (October 2024 release), and a pre-deployment checkpoint of OpenAI’s o1. In both cases, …
Claude Opus 4 and Claude Sonnet 4 Evaluation Results
May 25, 2025 · A detailed analysis of Claude Opus 4 and Claude Sonnet 4 performance on coding and writing tasks, with comparisons to GPT-4.1, DeepSeek V3, and other leading models.