Project Glasswing: what Mythos showed us

2 days ago

Cloudflare tested Anthropic's Mythos Preview, a security-focused LLM, against its own infrastructure to find vulnerabilities and to understand what an attacker with similar capabilities could do.
Mythos Preview stands out for its ability to build exploit chains, combining multiple vulnerabilities into a working proof of concept much as a senior security researcher would.
The model can generate and test its own proofs of concept, writing, compiling, and running code to confirm a vulnerability, which cuts down on speculative findings.
Mythos Preview refused some legitimate security tasks inconsistently, which shows that its emergent guardrails are unreliable as a safety boundary.
Signal-to-noise problems persist in AI vulnerability scanning: false positives are common in code written in memory-unsafe languages, and the model tends to over-report findings.
Simply pointing a generic coding agent at a codebase is ineffective for vulnerability research, because its context window and throughput are too limited for the size of real codebases.
Cloudflare developed a harness to manage the model, focusing on narrow scopes, adversarial review, parallel tasks, and structured workflows.
The harness includes stages like Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, and Report for comprehensive vulnerability discovery.
Security teams should prioritize architectural defenses and orchestrated patching over raw speed, since patching faster without robust processes can introduce new bugs.
Cloudflare emphasizes applying these principles to protect customer applications, acknowledging that offensive and defensive AI capabilities will evolve rapidly.
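
The summary notes that the model verifies a vulnerability by writing, compiling, and running code rather than speculating about it. A harness can enforce that as a gate before any finding is reported. The following is a minimal Python sketch under stated assumptions: `check_poc`, `PocResult`, and the marker-string convention are hypothetical names invented for illustration, not part of Cloudflare's harness, and a finding counts as confirmed only if the proof of concept is killed by a signal or prints an agreed marker.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PocResult:
    confirmed: bool  # True only if running the PoC produced observable evidence
    stage: str       # "build", "run", or "timeout": where the attempt ended
    output: str      # captured stdout/stderr, kept for a human reviewer


def check_poc(
    source: str,
    filename: str,
    run_cmd: list[str],
    marker: str,
    build_cmd: list[str] | None = None,
    timeout: float = 30.0,
) -> PocResult:
    """Hypothetical gate: write a PoC to a scratch directory, optionally build
    it, run it, and confirm the finding only if the process is killed by a
    signal (e.g. SIGSEGV) or prints the agreed-upon marker string.
    In practice this should run inside an isolated sandbox."""
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, filename).write_text(source)
        try:
            if build_cmd:
                build = subprocess.run(
                    build_cmd, cwd=workdir, capture_output=True, text=True, timeout=timeout
                )
                if build.returncode != 0:
                    # A PoC that does not build proves nothing.
                    return PocResult(False, "build", build.stdout + build.stderr)
            run = subprocess.run(
                run_cmd, cwd=workdir, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            return PocResult(False, "timeout", "")
        output = run.stdout + run.stderr
        # On POSIX, a negative return code means the process was killed by a signal.
        crashed = run.returncode < 0
        return PocResult(crashed or marker in output, "run", output)


if __name__ == "__main__":
    # Toy PoC written in Python so the demo needs no C compiler.
    demo = check_poc(
        'print("POC-TRIGGERED")', "poc.py", [sys.executable, "poc.py"], marker="POC-TRIGGERED"
    )
    print(demo.confirmed, demo.stage)
```

For a C proof of concept, one plausible configuration is `build_cmd=["cc", "-fsanitize=address", "poc.c", "-o", "poc"]`, `run_cmd=["./poc"]`, and `marker="ERROR: AddressSanitizer"`, because AddressSanitizer prints a report with that prefix and exits with a nonzero status instead of dying on a signal.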
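
The harness stages are named in this summary but not described. The sketch below is a minimal Python illustration of how a few of them might fit together, assuming that Recon carves the code into narrow per-file scopes, Hunt runs one independent task per scope in parallel, Validate applies a separate adversarial reviewer, and Dedupe collapses duplicate reports. Gapfill, Trace, and Feedback are left out because nothing here says what they do. `Finding`, `recon`, `hunt`, `run_pipeline`, and the other names are hypothetical, and the model calls are stubbed as plain callables; this is not Cloudflare's implementation.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    scope: str      # the narrow unit of code the hunter examined (here: one file)
    bug_class: str  # e.g. "use-after-free"
    location: str   # e.g. "parser.c:120"


# Hypothetical model-backed callables; in a real harness each would wrap an LLM call.
Hunter = Callable[[str], list[Finding]]
Reviewer = Callable[[Finding], bool]

SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".rs", ".go")


def recon(repo_files: list[str]) -> list[str]:
    """Recon (assumed): split the codebase into narrow scopes, one per source file."""
    return sorted(f for f in repo_files if f.endswith(SOURCE_SUFFIXES))


def hunt(scopes: list[str], hunter: Hunter, workers: int = 4) -> list[Finding]:
    """Hunt (assumed): run one independent task per scope, in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(hunter, scopes))
    return [finding for batch in batches for finding in batch]


def validate(findings: list[Finding], reviewer: Reviewer) -> list[Finding]:
    """Validate (assumed): keep only findings a separate adversarial reviewer accepts."""
    return [f for f in findings if reviewer(f)]


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Dedupe (assumed): collapse reports of the same bug class at the same location."""
    unique: dict[tuple[str, str], Finding] = {}
    for f in findings:
        unique.setdefault((f.bug_class, f.location), f)
    return list(unique.values())


def report(findings: list[Finding]) -> list[str]:
    """Report (assumed): one line per surviving finding, in a stable order."""
    return sorted(f"{f.location}: {f.bug_class}" for f in findings)


def run_pipeline(repo_files: list[str], hunter: Hunter, reviewer: Reviewer) -> list[str]:
    scopes = recon(repo_files)
    candidates = hunt(scopes, hunter)
    confirmed = validate(candidates, reviewer)
    return report(dedupe(confirmed))
```

Keeping each stage a plain function over a list of findings makes it straightforward to slot in the omitted stages or reorder them, and it keeps each model call small and narrowly scoped, which is the point the summary makes about context limits.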