Evaluation of Claude Mythos Preview's cyber capabilities

2 days ago

The AI Security Institute (AISI) evaluated Anthropic's Claude Mythos Preview, finding it surpasses previous frontier models in cybersecurity capabilities.
Mythos Preview succeeded on 73% of expert-level capture-the-flag (CTF) challenges, a difficulty tier that no model could complete before April 2025.
In the multi-step cyber range 'The Last Ones' (TLO), a 32-step attack simulation, Mythos Preview completed the full attack chain from start to finish in 3 of 10 attempts and averaged 22 of the 32 steps.
The model also showed limitations: it could not complete the operational technology (OT)-focused range 'Cooling Tower,' although it stalled on the range's IT sections rather than on OT-specific obstacles.
Mythos Preview's performance continued to scale up to the 100M-token budget used in the evaluations, so it is expected to improve further with more inference compute.
Although Mythos Preview can attack vulnerable enterprise systems, the evaluations lack real-world defenses such as active monitoring, so it remains uncertain whether the model can breach well-defended environments.
As models with these capabilities become more common, organizations should prioritize cybersecurity basics, such as regular updates and robust access controls, and invest in defenses against future AI-enabled threats.

Hasty Briefs (beta)