Systematically generating tests that would have caught Anthropic's top‑K bug

4 months ago
Tags: #bugs #automation #testing

  • Most testing strategies miss rare edge cases until customers find them in production.
  • The system automatically generates targeted unit tests for rare bugs, including Anthropic’s approximate top-K bug.
  • Fractional proof decomposition is used to generate unit tests without relying on bug reproducer code.
  • The process starts by identifying the correctness theorem for the code under test and encoding it as a property-based test (PBT).
  • The theorem is recursively decomposed into smaller theorems, each encoded as PBTs.
  • Decomposition continues until the input space is small enough to efficiently catch rare bugs.
  • The compute needed for fractional proofs scales logarithmically with the rarity of the bug, which keeps the approach efficient.
  • The approach can be extended to real-world codebases and cluster behaviors.
  • Theorem is training models to automatically reason about program correctness.
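
The decomposition idea in the summary can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the original post: the two-stage `top_k`, the deliberately planted bug in `bucket_top_k`, the tiny value domain, and the helpers `random_check` and `exhaustive_check` are all hypothetical. It uses only the Python standard library rather than a PBT framework. The top-level theorem ("`top_k` agrees with a sort-based reference") is one property; the sub-theorem about a single bucket is a second property whose input space is small enough to enumerate completely.

```python
import itertools
import random

MAX_VALUE = 31   # hypothetical value domain: integers 0..31
BUCKET_SIZE = 3  # hypothetical bucket width


def bucket_top_k(bucket, k):
    """Stage 1: return the k largest values of one bucket, descending.

    A bug is planted on purpose: for one narrow input pattern the
    largest value is dropped. This stands in for a rare edge case.
    """
    ranked = sorted(bucket, reverse=True)
    if bucket[0] == 0 and bucket[-1] == MAX_VALUE:  # planted rare bug
        ranked = ranked[1:]
    return ranked[:k]


def top_k(xs, k):
    """Two-stage top-k: per-bucket top-k, then top-k of the merged candidates."""
    candidates = []
    for i in range(0, len(xs), BUCKET_SIZE):
        candidates.extend(bucket_top_k(xs[i:i + BUCKET_SIZE], k))
    return sorted(candidates, reverse=True)[:k]


def reference_top_k(xs, k):
    """Specification: the k largest values, descending."""
    return sorted(xs, reverse=True)[:k]


def prop_top_k(xs, k=2):
    """Top-level theorem: the implementation matches the reference."""
    return top_k(xs, k) == reference_top_k(xs, k)


def prop_bucket_top_k(bucket, k=2):
    """Sub-theorem: stage 1 is correct on a single bucket."""
    return bucket_top_k(list(bucket), k) == reference_top_k(bucket, k)


def random_check(prop, length, trials, seed=0):
    """Sample random inputs; return the first counterexample, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, MAX_VALUE) for _ in range(length)]
        if not prop(xs):
            return xs
    return None


def exhaustive_check(prop, length):
    """Enumerate every input of this length; return the first counterexample, or None."""
    for xs in itertools.product(range(MAX_VALUE + 1), repeat=length):
        if not prop(xs):
            return xs
    return None


if __name__ == "__main__":
    # Random sampling of the full input space may or may not hit the bug.
    print("end-to-end, random:", random_check(prop_top_k, length=12, trials=100))
    # The decomposed property covers its whole input space (32**3 cases at most).
    print("per-bucket, exhaustive:", exhaustive_check(prop_bucket_top_k, BUCKET_SIZE))
```

In this sketch the end-to-end property has far too many inputs to enumerate, so random sampling can miss the planted bug, while the per-bucket property is small enough to check exhaustively and is guaranteed to expose it. Real decompositions would also need properties for the merge stage and for how the stages compose; those are omitted here to keep the example short.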