Claude Fable 5 vs. GPT-5.5: Better Planning, Similar Execution

4 hours ago

Claude Fable 5 produced the stronger plan, scoring 9.1 versus 8.3 for GPT-5.5 on a rubric, thanks to better judgment and closer attention to failure modes.
When implementing the same detailed plan, both models produced functionally identical services, passing all acceptance checks with identical rollout behavior.
GPT-5.5's implementation was much cheaper: $6.30 versus $16.66 for Claude Fable 5, a 62% cost reduction for execution.
Mixing models (planning with Claude Fable 5 and executing with GPT-5.5) saved 59% on cost while maintaining the same quality.
Both plans agreed on the core algorithm for sticky feature flag rollouts but differed on design decisions such as whether to include the environment in the hash and how to hash API keys.
Both implementations adhered closely to the plan; GPT-5.5 even followed decisions that contradicted its own planning output, without deviation.
The performance gap was most evident in the planning phase; once a detailed plan was provided, execution quality converged regardless of which model was used.
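
For readers unfamiliar with sticky feature flag rollouts, here is a minimal Python sketch of the general technique: hash a stable identifier into a fixed bucket and compare the bucket to the rollout percentage. It is not the code produced by either model. The function names, the choice of SHA-256, the ":" separator, and the 100-bucket scheme are all assumptions made for illustration. The optional environment argument mirrors the design question the two plans disagreed on.

```python
import hashlib
from typing import Optional


def rollout_bucket(flag_key: str, user_id: str,
                   environment: Optional[str] = None) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99.

    Hypothetical helper: the same inputs always give the same bucket,
    which is what makes the rollout "sticky" for a given user.
    """
    parts = [flag_key, user_id]
    if environment is not None:
        # Assumption: including the environment gives each environment
        # its own independent assignment of users to buckets.
        parts.insert(0, environment)
    digest = hashlib.sha256(":".join(parts).encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then reduce to 0..99.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int,
               environment: Optional[str] = None) -> bool:
    """Return True if the user falls inside the rollout percentage."""
    return rollout_bucket(flag_key, user_id, environment) < rollout_percent
```

Because the bucket depends only on the inputs, raising the percentage from 10 to 50 keeps every user who was already enabled and only adds new ones. Whether to mix the environment into the hash is a trade-off: leaving it out gives a user the same bucket everywhere, while including it decorrelates staging from production.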

Hasty Briefs (beta)