GPT-5.5 matches hyped Mythos Preview

23 days ago

Anthropic restricted the release of its Mythos Preview model to 'critical industry partners', citing cybersecurity threats.
AISI research indicates OpenAI's GPT-5.5 performs similarly to Mythos Preview on cybersecurity evaluations, including expert tasks and complex challenges.
In expert Capture the Flag tasks, GPT-5.5 averaged 71.4% success versus Mythos Preview's 68.6%, with GPT-5.5 solving a difficult Rust disassembler challenge in 10 minutes.
Both GPT-5.5 and Mythos Preview succeeded in AISI's TLO test simulating data extraction attacks, where no previous model had succeeded.
GPT-5.5 and other models fail at AISI's more difficult 'Cooling Tower' simulation, which involves disrupting power plant control software.

Hasty Briefs (beta)