Project Glasswing: what Mythos showed us
2 days ago
- Cloudflare tested Anthropic's Mythos Preview, a security-focused LLM, on its own infrastructure to identify vulnerabilities and understand what attackers could do with such a model.
- Mythos Preview stands out for its ability to construct exploit chains, combining multiple vulnerabilities into working proofs of concept much as a senior researcher would.
- The model can generate and test proofs of concept by writing, compiling, and running code, so a reported vulnerability is verified rather than speculative.
- Mythos Preview refused legitimate security tasks inconsistently, which shows that its emergent guardrails are unreliable as safety boundaries.
- Signal-to-noise issues persist in AI vulnerability scanning: false positives are common in memory-unsafe languages, and the model tends to over-report.
- A generic coding agent approach is ineffective for vulnerability research because its context and throughput limits are small relative to real codebases.
- Cloudflare developed a harness to manage the model, focusing on narrow scopes, adversarial review, parallel tasks, and structured workflows.
- The harness includes stages like Recon, Hunt, Validate, Gapfill, Dedupe, Trace, Feedback, and Report for comprehensive vulnerability discovery.
- Security teams should prioritize architectural defenses and orchestrated patching over raw speed, because faster patching without robust processes can introduce new bugs.
- Cloudflare emphasizes applying these principles to protect customer applications, acknowledging that offensive and defensive AI capabilities will evolve rapidly.
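
The harness ideas in the list above (narrow scopes, parallel tasks, adversarial review, and a staged workflow) can be illustrated with a minimal sketch. This is not Cloudflare's implementation: only the stage names Hunt, Validate, and Dedupe come from the summary, and the sketch leaves out Recon, Gapfill, Trace, Feedback, and Report. The `Finding` type, the function bodies, and the rule that rejects findings in test code are all hypothetical placeholders standing in for model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    scope: str               # narrow unit of code one task looked at, e.g. one file
    title: str
    validated: bool = False


def hunt(scope: str) -> list[Finding]:
    """Hunt stage (hypothetical stub): one task per narrow scope."""
    # A real harness would prompt the model with just this scope.
    return [Finding(scope, f"possible bug in {scope}")]


def validate(finding: Finding) -> Finding:
    """Validate stage (hypothetical stub): a second, skeptical review pass."""
    # Placeholder rule so the sketch runs: reject findings located in test code.
    accepted = "test" not in finding.scope
    return Finding(finding.scope, finding.title, validated=accepted)


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Dedupe stage: keep the first finding for each (scope, title) pair."""
    seen = set()
    unique = []
    for f in findings:
        key = (f.scope, f.title)
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique


def run(scopes: list[str], workers: int = 4) -> list[Finding]:
    """Run Hunt and Validate in parallel, then drop rejected and duplicate findings."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        raw = [f for batch in pool.map(hunt, scopes) for f in batch]
        reviewed = list(pool.map(validate, raw))
    confirmed = [f for f in reviewed if f.validated]
    return dedupe(confirmed)
```

Calling `run(["parser.c", "parser.c", "tests/test_parser.c"])` returns a single validated finding for `parser.c`: the duplicate scope is collapsed by `dedupe`, and the finding in test code is rejected by the placeholder review rule. Keeping each stage as a separate function is one way to let stages be swapped, retried, or parallelized independently.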