This article reviews the Claude Opus 4.6 system card, highlighting new features such as a 1M-token context window and upgraded model capabilities. It raises concerns about the evaluation process, safety protocols, and the growing reliance on the model's own self-assessments.
This article examines the safety features and evaluation integrity of Claude Opus 4.6, focusing on risks such as sabotage and deception. It critiques the model's performance relative to its predecessor, Opus 4.5, noting where it excels and where it falls short, particularly in writing tasks, and argues that evaluation processes must improve as the technology evolves.
OpenAI reflects on its oversight of sycophantic behavior in model updates, particularly with GPT-4o. The article outlines the evaluation process, identifies shortcomings in testing, and stresses the importance of integrating qualitative assessments and user feedback into future model deployments.