Hasty Briefs (beta)


Type-constrained code generation with language models

a year ago
  • #code generation
  • #machine learning
  • #type systems
  • Large language models (LLMs) have achieved success in code generation but often produce uncompilable output because they lack a formal model of the code they generate.
  • Constrained decoding has been applied to domain-specific languages and to syntactic features, but it cannot prevent typing errors in general-purpose programming languages.
  • A type-constrained decoding approach is introduced, leveraging type systems to guide code generation and enforce well-typedness.
  • Novel prefix automata and a search over inhabitable types are developed to guarantee that every generated prefix can still be completed to a well-typed program.
  • The approach is formalized on a simply-typed language and extended to TypeScript for practicality.
  • Evaluation on HumanEval and MBPP datasets shows the approach reduces compilation errors by more than half and improves functional correctness.
  • The method is effective across various LLM sizes and model families, including models with over 30B parameters.
  • Results demonstrate the generality and effectiveness of constraining LLM code generation with formal type system rules.
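The core idea in the bullets above can be sketched as a token-masking loop: at each decoding step, a candidate token is kept only if the extended prefix can still be completed to a well-typed program. The toy grammar, typing context, and function names below are illustrative assumptions for a minimal sketch, not the paper's actual prefix-automaton formalization.

```python
# Hypothetical sketch of type-constrained decoding over a toy expression
# language: operands are typed variables, operators demand matching
# operand types ("+" over int, "&&" over bool). A real system would run
# this viability check against an LLM's vocabulary at each step.

ENV = {"n": "int", "m": "int", "p": "bool"}   # typing context (assumed)
OPS = {"+": "int", "&&": "bool"}              # operator -> operand type

def prefix_ok(tokens):
    """Return True if the token prefix can extend to a well-typed
    expression: operands and operators alternate, and each operand's
    type matches the type demanded by the surrounding operators."""
    expect_operand = True
    required = None          # type forced by the most recent operator
    result_type = None       # type of the expression built so far
    for t in tokens:
        if expect_operand:
            if t not in ENV:
                return False
            if required is not None and ENV[t] != required:
                return False
            result_type = ENV[t]
            expect_operand = False
        else:
            if t not in OPS or OPS[t] != result_type:
                return False
            required = OPS[t]
            expect_operand = True
    return True

def allowed_next(prefix, vocab):
    """The token mask: keep only tokens preserving prefix viability."""
    return [t for t in vocab if prefix_ok(prefix + [t])]
```

For example, after the prefix `n +` only the int-typed variables survive the mask, while after `p` only the bool operator `&&` does; sampling is then restricted to the surviving tokens, which is what guarantees compilable output.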