GPT-5.5 matches hyped Mythos Preview
7 hours ago
- #Cybersecurity
- #Benchmarking
- #AI Models
- Anthropic restricted the release of its Mythos Preview model to 'critical industry partners', citing cybersecurity threats.
- AISI research indicates OpenAI's GPT-5.5 performs similarly to Mythos Preview on cybersecurity evaluations, including expert tasks and complex challenges.
- In expert Capture the Flag tasks, GPT-5.5 averaged a 71.4% success rate versus 68.6% for Mythos Preview, and GPT-5.5 solved a difficult Rust disassembler challenge in 10 minutes.
- Both GPT-5.5 and Mythos Preview succeeded in AISI's TLO test simulating data extraction attacks, where no previous model had succeeded.
- GPT-5.5 and other models failed AISI's more difficult 'Cooling Tower' simulation, which involves disrupting power plant control software.