Type-constrained code generation with language models
a year ago
- #code generation
- #machine learning
- #type systems
- Large language models (LLMs) have achieved success in code generation, but they often produce uncompilable output because they lack a formal model of the code they generate.
- Constrained decoding has been applied to domain-specific languages and to syntactic features, but it does not prevent typing errors in general-purpose programming languages.
- A type-constrained decoding approach is introduced, leveraging type systems to guide code generation and enforce well-typedness (a rough sketch of the decoding loop follows this list).
- Novel prefix automata and a search over inhabitable types form a sound mechanism for enforcing well-typedness on LLM-generated code.
- The approach is formalized on a simply-typed language and extended to TypeScript for practicality.
- Evaluation on the HumanEval and MBPP benchmarks shows that the approach reduces compilation errors by more than half and improves functional correctness.
- The method is effective across various LLM sizes and model families, including models with over 30B parameters.
- Results demonstrate the generality and effectiveness of constraining LLM code generation with formal type system rules.
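- To make the decoding idea concrete, below is a minimal sketch (not the paper's implementation): at each step, the model's candidate tokens are filtered so that only tokens keeping the prefix completable to a well-typed program remain. The names `next_token_logprobs` and `is_completable_prefix` are placeholders for an LM interface and for the paper's prefix-automaton / inhabitable-type check, respectively.

```python
# Minimal sketch of type-constrained decoding, assuming stand-in callbacks
# for the language model and the prefix well-typedness check. This is an
# illustration of the general technique, not the paper's actual code.
from typing import Callable, Dict

def type_constrained_decode(
    prompt: str,
    next_token_logprobs: Callable[[str], Dict[str, float]],  # hypothetical LM interface
    is_completable_prefix: Callable[[str], bool],             # hypothetical prefix check
    max_tokens: int = 256,
    eos: str = "<eos>",
) -> str:
    """Greedy decoding that only emits tokens keeping the prefix well-typable."""
    code = prompt
    for _ in range(max_tokens):
        scores = next_token_logprobs(code)
        # Mask every candidate whose extension can no longer be completed to a
        # well-typed program; the type system constrains generation step by
        # step instead of merely rejecting the finished output.
        allowed = {
            tok: lp for tok, lp in scores.items()
            if tok == eos or is_completable_prefix(code + tok)
        }
        if not allowed:
            break  # no admissible continuation left
        best = max(allowed, key=allowed.get)
        if best == eos:
            break
        code += best
    return code
```

- The key design point this sketch tries to convey is that the check runs on *prefixes* of code rather than on complete programs, which is why the paper needs prefix automata and a search over inhabitable types instead of simply invoking a compiler on the final output.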