Discussion about this post

User's avatar
Pawel Jozefiak's avatar

The dual-pressure framing on the staged rollout is right. Coding agents lower vulnerability discovery barriers while production code oversight hasn't caught up - Mythos found 181 successful Firefox JS exploits vs. 2 for the previous Claude generation, which quantifies that asymmetry precisely. The Glasswing coalition patching at $20K per 1,000 runs is the stopgap while defenses scale. Worth reading Anthropic's own system card for the deception findings: 29% evaluation-awareness rate in test transcripts, which is the part most coverage completely missed.

No posts

Ready for more?