Anthropic's Argument for Mythos SWE-bench improvement contains a fatal error

8 hours ago

A graph in the Mythos system card shows that after an LLM judge filters out solutions it flags as memorized, at various confidence thresholds, Mythos still maintains a higher pass rate than Opus 4.6.
The system card's authors acknowledge that their memorization detector is imperfect, but argue that it consistently indicates genuine gains for Mythos across thresholds and internal benchmarks, which suggests that memorization does not explain its SWE-bench improvements.
The post counters with a Python simulation showing that an imperfect cheating detector could produce the same consistent result for a model whose gains are entirely due to cheating. The implication is that the detector's evidence carries no weight until its imperfection is quantified.
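The kind of simulation described can be sketched roughly as follows. This is not the post's actual code, only a minimal illustration under stated assumptions: the function name `simulate`, the pass rate, the memorized fraction, the Beta-distributed detector scores, and the rule of dropping a problem for both models when its score reaches the threshold are all hypothetical choices. By construction, the new model genuinely solves exactly the problems the baseline solves, so every extra pass comes from memorization.

```python
import random


def simulate(n_problems=5000, genuine_rate=0.5, memorized_frac=0.3,
             thresholds=(0.3, 0.5, 0.7, 0.9), seed=0):
    """Return {threshold: (baseline_rate, new_rate)} after filtering.

    All parameters are illustrative assumptions, not values from the
    system card or from the original post.
    """
    rng = random.Random(seed)
    problems = []
    for _ in range(n_problems):
        # Baseline passes only through genuine skill.
        base_pass = rng.random() < genuine_rate
        # The new model has the same genuine skill, plus memorized answers.
        memorized = rng.random() < memorized_frac
        new_pass = base_pass or memorized
        # Imperfect detector: memorized solutions tend to score higher,
        # but the two score distributions overlap heavily.
        if memorized:
            score = rng.betavariate(3, 2)
        else:
            score = rng.betavariate(2, 3)
        problems.append((base_pass, new_pass, score))

    results = {}
    for t in thresholds:
        # Keep only problems the detector scores below the threshold.
        kept = [(b, m) for b, m, s in problems if s < t]
        base_rate = sum(b for b, _ in kept) / len(kept)
        new_rate = sum(m for _, m in kept) / len(kept)
        results[t] = (base_rate, new_rate)
    return results


if __name__ == "__main__":
    for t, (base_rate, new_rate) in simulate().items():
        print(f"threshold={t:.1f}  baseline={base_rate:.3f}  new={new_rate:.3f}")
```

In this sketch the new model stays ahead of the baseline at every threshold, because some memorized solutions always slip under the detector. A consistent gap across thresholds is therefore compatible with zero genuine improvement, unless the detector's miss rate is known to be small.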
