Type-constrained code generation with language models
a year ago
- #code generation
- #machine learning
- #type systems
- Large language models (LLMs) have achieved success in code generation, but they often produce uncompilable output because they lack a formal model of the code they generate.
- Constrained decoding has been applied to domain-specific languages and to syntactic features, but it does not prevent typing errors in general-purpose programming languages.
- A type-constrained decoding approach is introduced, leveraging type systems to guide code generation and enforce well-typedness (a rough sketch of the decoding loop follows this list).
- Novel prefix automata and a search over inhabitable types form a sound mechanism for enforcing well-typedness on LLM-generated code.
- The approach is formalized on a simply-typed language and extended to TypeScript for practicality.
- Evaluation on the HumanEval and MBPP benchmarks shows that the approach reduces compilation errors by more than half and improves functional correctness.
- The method is effective across various LLM sizes and model families, including models with over 30B parameters.
- Results demonstrate the generality and effectiveness of constraining LLM code generation with formal type system rules.
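- To make the decoding idea concrete, below is a minimal sketch (not the paper's implementation): at each step, the model's candidate tokens are filtered so that only tokens keeping the prefix completable to a well-typed program remain. The names `next_token_logprobs` and `is_completable_prefix` are placeholders for an LM interface and for the paper's prefix-automaton / inhabitable-type check, respectively.

```python
# Minimal sketch of type-constrained decoding, assuming stand-in callbacks
# for the language model and the prefix well-typedness check. This is an
# illustration of the general technique, not the paper's actual code.
from typing import Callable, Dict

def type_constrained_decode(
    prompt: str,
    next_token_logprobs: Callable[[str], Dict[str, float]],  # hypothetical LM interface
    is_completable_prefix: Callable[[str], bool],             # hypothetical prefix check
    max_tokens: int = 256,
    eos: str = "<eos>",
) -> str:
    """Greedy decoding that only emits tokens keeping the prefix well-typable."""
    code = prompt
    for _ in range(max_tokens):
        scores = next_token_logprobs(code)
        # Mask every candidate whose extension can no longer be completed to a
        # well-typed program; the type system constrains generation step by
        # step instead of merely rejecting the finished output.
        allowed = {
            tok: lp for tok, lp in scores.items()
            if tok == eos or is_completable_prefix(code + tok)
        }
        if not allowed:
            break  # no admissible continuation left
        best = max(allowed, key=allowed.get)
        if best == eos:
            break
        code += best
    return code
```

- The key design point this sketch tries to convey is that the check runs on *prefixes* of code rather than on complete programs, which is why the paper needs prefix automata and a search over inhabitable types instead of simply invoking a compiler on the final output.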