AI made every test pass, but the code was still wrong

7 days ago

Doodledapp converts visual node graphs into Solidity smart contracts.
The team tested 17 real-world contracts using roundtrip testing to validate the converter.
AI-generated tests passed every check on the first run, which exposed a flaw in the testing methodology rather than proving the converter correct.
The AI tested the implementation rather than the intent: its tests confirmed what the code did, not whether the code was correct.
Researchers identified this as the "ground truth problem": the AI lacks an independent source of truth to check against.
The team restructured the approach to compare the original and regenerated contracts at the abstract syntax tree (AST) level, checking semantic correctness instead of surface text.
By comparing against the original contracts, the revised method surfaced real bugs, which the team then fixed.
Key takeaway: AI needs an independent reference point to validate correctness, not just the implementation it is testing.
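The brief does not show Doodledapp's code, so the following is only a minimal Python sketch of the idea, under stated assumptions: a toy AST type (`Node`), a hypothetical converter (`ast_to_graph`, `graph_to_ast`, `roundtrip`) that skips real Solidity parsing, and a deliberately planted bug that reverses operand order. It contrasts a test whose expected value is recorded from the implementation itself with one that uses the original contract's AST as ground truth. Source positions are excluded from comparison so that formatting differences do not count as failures.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Toy AST node (hypothetical; real pipelines would parse Solidity)."""
    kind: str                # e.g. "BinaryOp", "Identifier"
    value: str = ""          # operator symbol or identifier name
    children: tuple = ()
    # Source position is layout, not meaning: excluded from equality checks.
    loc: int = field(default=0, compare=False)


def ast_to_graph(node):
    """Hypothetical: flatten an AST into {id: (kind, value, child_ids)}."""
    graph = {}

    def visit(n):
        child_ids = [visit(c) for c in n.children]
        node_id = len(graph)
        graph[node_id] = (n.kind, n.value, child_ids)
        return node_id

    return graph, visit(node)


def graph_to_ast(graph, node_id, buggy=True):
    """Hypothetical generator. With buggy=True it rebuilds children in
    reverse order, silently swapping operands of non-commutative ops."""
    kind, value, child_ids = graph[node_id]
    order = reversed(child_ids) if buggy else child_ids
    children = tuple(graph_to_ast(graph, c, buggy) for c in order)
    return Node(kind, value, children)


def roundtrip(ast, buggy=True):
    """Original AST -> node graph -> regenerated AST."""
    graph, root = ast_to_graph(ast)
    return graph_to_ast(graph, root, buggy)


def ident(name, loc=0):
    return Node("Identifier", name, loc=loc)


# Original contract fragment: `balance - amount`
original = Node("BinaryOp", "-", (ident("balance", 10), ident("amount", 20)), loc=5)

# Implementation-derived test: the "expected" value is whatever the code produced.
snapshot = roundtrip(original)
implementation_test_passes = roundtrip(original) == snapshot

# Ground-truth tests: the original contract's AST is the reference.
ground_truth_buggy = roundtrip(original) == original
ground_truth_fixed = roundtrip(original, buggy=False) == original

print("implementation-derived test passes:", implementation_test_passes)
print("ground-truth test passes (buggy):  ", ground_truth_buggy)
print("ground-truth test passes (fixed):  ", ground_truth_fixed)
```

The implementation-derived test passes even though `balance - amount` became `amount - balance`; only the comparison against the original AST catches the swap, and it passes again once the bug is fixed, despite the regenerated nodes carrying different `loc` values.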
