Designing AI-resistant technical evaluations
19 days ago
- #Performance Engineering
- #AI in Hiring
- #Technical Evaluation
- Anthropic's performance engineering team screens candidates with a take-home test, which has had to evolve as AI capabilities have advanced.
- The original take-home test involved optimizing code for a simulated accelerator, designed to be engaging and representative of real work.
- Claude Opus 4 outperformed most human applicants on the test, prompting a redesign to preserve its usefulness as a screen.
- Claude Opus 4.5 then matched the performance of top human candidates, forcing further changes so the test could still distinguish human skill.
- The new version of the take-home test focuses on unusual, constrained problems inspired by Zachtronics games to resist AI solutions.
- Anthropic is releasing the original take-home as an open challenge, inviting anyone to beat Claude's best result.
- The test has helped hire dozens of engineers, including contributors to major projects such as the Trainium cluster and Claude 3 Opus.