Hasty Briefs

Evaluation of Claude Mythos Preview's cyber capabilities

2 days ago
  • #AI Security
  • #Autonomous Attacks
  • #Cybersecurity Evaluation
  • The AI Security Institute (AISI) evaluated Anthropic's Claude Mythos Preview, finding it surpasses previous frontier models in cybersecurity capabilities.
  • Mythos Preview solved 73% of expert-level capture-the-flag (CTF) challenges, a difficulty level that no model could complete before April 2025.
  • In the multi-step cyber range 'The Last Ones' (TLO), a 32-step attack simulation, Mythos Preview completed the full chain from start to finish in 3 of 10 attempts and averaged 22 of the 32 steps across all attempts.
  • The model still has limitations: it could not complete the operational technology (OT)-focused range 'Cooling Tower'. It stalled on the IT sections of that range, however, rather than on OT-specific steps.
  • Mythos Preview's performance continued to scale with inference compute up to the 100M-token budget used in the evaluations, so further gains are expected with larger budgets.
  • Mythos Preview can attack vulnerable enterprise systems, but the evaluation ranges lack real-world defenses such as active monitoring, so it remains uncertain whether the model could breach well-defended environments.
  • As models with these capabilities become more common, organizations should prioritize cybersecurity basics, such as regular updates and robust access controls, and invest in defenses against future AI-enabled threats.