GPT-5 vs. Sonnet: Complex Agentic Coding

9 months ago

OpenAI released GPT-5, promoting it as their best model for agentic coding.
GPT-5 was tested against Claude 4 Sonnet for porting a TypeScript tool (Ruler) to Rust.
GPT-5 demonstrated high intelligence, agency, and precise instruction-following but produced messy code.
Claude 4 Sonnet was faster and produced elegant, maintainable code but was less disciplined and required more iterations.
GPT-5 got stuck twice but recovered, while Claude never got stuck but had unreliable status reports.
GitHub Copilot Chat requires manual approval for terminal commands, limiting autonomy.
Both models performed well, with GPT-5 excelling in intelligence and Claude in code elegance.

Hasty Briefsbeta