Quit Emailing Yourself

Claude Opus 4.6: System Card Part 1: Mundane Alignment + MW

7 min read | Saved February 14, 2026

claude 🤖 opus 🤖 evaluation 🤖 safety 🤖 ai 🤖

Do you care about this?

This article reviews the Claude Opus 4.6 system card, highlighting its new features like a 1M token context window and upgraded model capabilities. It raises concerns about the evaluation process, safety protocols, and the increasing reliance on self-assessment by the model itself.

If you do, here's more

Claude Opus 4.6 has been released with significant updates, including a 1 million token context window and improved performance on a range of everyday tasks. The model shows gains on evaluations such as Terminal-Bench 2.0 and GDPval-AA. Claude Code now features Agent Teams and a new fast mode, though the fast mode is expensive. Pricing is unchanged from Opus 4.5 at $5/$25 unless you opt for the ultra-fast mode. Notably, refusals of harmless requests have dropped to just 0.04%.

The article raises concerns about safety and evaluation processes. Anthropic's testing procedures appear to be deteriorating, with capabilities advancing faster than the company's ability to maintain effective evaluations. The reliance on subjective assessments, or "vibes," in place of robust testing is alarming. Automated evaluation, in which Claude evaluates itself with minimal human oversight, increases the risk of misalignment. The article argues for independent third-party evaluations with real authority, especially as the stakes grow higher. Critics such as Peter Wildeford point out that Anthropic is using Opus 4.6 to test its own evaluation infrastructure, which poses risks if the model's capabilities influence its own assessments.

Overall, the rapid pace of model releases, with just a month or two between them, raises serious questions about the adequacy of testing. Many experts express skepticism about Anthropic's ability to ensure safety, especially given the potential for groupthink among employees during evaluations. The article suggests that as the consequences of AI development become more significant, the rigor of safety protocols must also increase, yet the opposite seems to be happening.
