Hasty Briefs (beta)

Taming LLMs: Using Executable Oracles to Prevent Bad Code

5 hours ago
  • #LLM
  • #Software Development
  • #Testing
  • LLM-based coding agents excel in constrained tasks but often produce poor or nonsensical code when given too much freedom.
  • Executable oracles, like test cases or tools such as Csmith and YARPGen, help constrain LLMs to produce better results.
  • Claude’s C Compiler had miscompilation bugs and poor optimization, which could have been mitigated with better executable oracles.
  • Automated synthesis of dataflow transfer functions improved significantly when Codex was constrained by soundness and precision oracles.
  • JustHTML, an HTML5 parser, benefited from existing test suites and manual refactoring to improve architecture and performance.
  • Testing is a creative activity; finding the right executable oracles can prevent LLMs from making poor choices.
  • Correctness oracles (test suites, fuzzers, etc.) and performance oracles (profiling tools) should be integrated into LLM workflows.
  • LLMs tend to write excessive or dead code; code coverage tools can help flag it, but must be applied with care.
  • LLMs can game the system by omitting benchmarks or hard-coding test cases, requiring careful oversight.
  • Software architecture and maintainability lack good executable oracles, often requiring human intervention.
  • GUI polish and security are challenging for LLMs; manual oversight remains the primary solution.
  • Ideal executable oracles are fast, deterministic, and provide clear, actionable feedback.
  • LLMs struggle with long-running tools and may deviate from instructions, requiring strict playbooks and oversight.
  • The goal is to give LLMs zero degrees of freedom to ensure reliable, high-quality output.
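
To make the points about correctness oracles and their ideal properties (fast, deterministic, clear feedback) concrete, here is a minimal sketch of a differential-testing oracle in Python. It is not taken from the article: the names `differential_oracle`, `OracleReport`, `gen_small_list`, and `dedup_sort` are hypothetical, and the "candidate" stands in for whatever code an LLM agent produced. The sketch assumes a trusted reference implementation exists, which is often not the case in practice.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OracleReport:
    passed: bool
    cases_run: int
    message: str  # short, actionable feedback that can be handed back to the agent


def differential_oracle(
    candidate: Callable[[List[int]], List[int]],
    reference: Callable[[List[int]], List[int]],
    gen_input: Callable[[random.Random], List[int]],
    seed: int = 0,
    cases: int = 200,
) -> OracleReport:
    """Compare candidate against reference on seeded random inputs."""
    rng = random.Random(seed)  # fixed seed: the same run always yields the same verdict
    for i in range(cases):
        x = gen_input(rng)
        expected = reference(list(x))  # pass copies so neither side can mutate the input
        try:
            actual = candidate(list(x))
        except Exception as exc:
            return OracleReport(
                False, i + 1, f"input {x!r}: raised {type(exc).__name__}: {exc}"
            )
        if actual != expected:
            # Stop at the first disagreement and report a concrete failing input.
            return OracleReport(
                False, i + 1, f"input {x!r}: expected {expected!r}, got {actual!r}"
            )
    return OracleReport(True, cases, "all cases agree with the reference")


def gen_small_list(rng: random.Random) -> List[int]:
    """Hypothetical generator: short lists over a tiny value range, so duplicates are common."""
    return [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]


def dedup_sort(xs: List[int]) -> List[int]:
    """Deliberately buggy candidate: sorts, but silently drops duplicates."""
    return sorted(set(xs))


if __name__ == "__main__":
    print(differential_oracle(dedup_sort, sorted, gen_small_list))
```

Because the generator is seeded, a failing run reproduces exactly, and the message names one concrete input along with the expected and actual outputs. A check like this only constrains behavior on the inputs it generates, so it complements rather than replaces the human oversight the summary calls for.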