Types and Neural Networks
- #Type Systems
- #Differentiable Programming
- #Neural Networks
- LLMs already generate code in typed languages, but they are trained to emit raw token sequences rather than typed structures, so well-typedness is only checked after the fact.
- The two main post-hoc fixes are retry loops (coarse granularity, high-bandwidth feedback from error messages) and constrained decoding (fine per-token granularity, low-bandwidth binary feedback); both are inference-time patches that never update the model's weights.
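A minimal sketch of the constrained-decoding idea (a toy example of mine, not from the note): invalid tokens are masked to negative infinity before selection, so the model's preferences are vetoed rather than learned from. Balanced parentheses stand in for "well-typed".

```python
import math

VOCAB = ["(", ")", "x", "<eos>"]

def allowed(tok, depth):
    """Can appending `tok` keep the output well-formed?"""
    if tok == ")":
        return depth > 0      # can't close more parens than we opened
    if tok == "<eos>":
        return depth == 0     # can't stop with parens still open
    return True

def constrained_decode(logits_fn, max_len=10):
    out, depth = [], 0
    for _ in range(max_len):
        logits = logits_fn(out)
        # The mask is the low-bandwidth signal: invalid tokens -> -inf,
        # then greedy selection over what remains.
        masked = [l if allowed(t, depth) else -math.inf
                  for t, l in zip(VOCAB, logits)]
        tok = VOCAB[masked.index(max(masked))]
        out.append(tok)
        depth += {"(": 1, ")": -1}.get(tok, 0)
        if tok == "<eos>":
            break
    return out

# A fake "model" that always prefers ")": the mask vetoes it at depth 0,
# so the output stays balanced despite the model's broken preference.
decoded = constrained_decode(lambda prefix: [0.4, 0.9, 0.2, 0.3])
```

Note the inefficiency the bullet describes: the mask silently overrides the model every step, but nothing about those vetoes ever reaches the weights.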
- Baking the type system into training itself, the way AlphaZero bakes the rules of chess into its training loop, could dramatically improve performance by letting models learn structural rules directly instead of having them imposed at decode time.
- Differentiating *through* structure, as in CHAD, fixes the output type in advance: gradients flow through a predefined structure, so the partition into structures must be chosen beforehand and the structure itself cannot be learned.
- Differentiating *with respect to* structure instead places a distribution over type choices: every sample is well-typed, and the choice of structure itself receives a gradient, enabling gradient-based learning of structured outputs.
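One way to make this concrete (my own toy encoding, not the note's construction): keep a softmax distribution over K candidate type choices, so the expected loss is smooth in the parameters and the discrete choice itself gets a gradient.

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def expected_loss(theta, per_choice_loss):
    # E[L] = sum_k p_k * L_k  -- smooth in theta even though
    # each individual choice k is discrete.
    p = softmax(theta)
    return sum(pi * li for pi, li in zip(p, per_choice_loss))

def grad(theta, per_choice_loss):
    # d/d theta_j of sum_k p_k L_k = p_j * (L_j - E[L])
    p = softmax(theta)
    el = expected_loss(theta, per_choice_loss)
    return [pj * (lj - el) for pj, lj in zip(p, per_choice_loss)]

# Three candidate "types" for an output slot; the middle one fits best.
losses = [3.0, 0.5, 2.0]
theta = [0.0, 0.0, 0.0]
for _ in range(200):  # plain gradient descent on the expected loss
    g = grad(theta, losses)
    theta = [t - 1.0 * gi for t, gi in zip(theta, g)]
# The distribution concentrates on the lowest-loss type choice.
```

The point is the contrast with the previous bullet: here the parameters being descended are the ones that *select* the structure, not just the ones inside it.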
- This approach leverages containers and dependent lenses to handle complex types uniformly, promising more meaningful and efficient code generation.
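A rough feel for the container view (an illustrative encoding of mine, simplified from the dependently-typed original): a container is a type of shapes plus, for each shape, a type of positions; a value of its extension is a shape together with a payload at each position. Different data types become the same kind of object, handled by one `fill`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class Container:
    # shape -> list of positions for that shape
    positions: Callable[[Any], list]

@dataclass
class Extension:
    shape: Any
    payload: Dict[Any, Any]  # position -> stored value

# The list container: a shape is a length n, positions are 0..n-1.
ListC = Container(positions=lambda n: list(range(n)))

# The pair container: one shape, exactly two positions.
PairC = Container(positions=lambda _: ["fst", "snd"])

def fill(container, shape, f):
    """Build an extension by tabulating f over the shape's positions."""
    return Extension(shape, {p: f(p) for p in container.positions(shape)})

xs = fill(ListC, 3, lambda i: i * i)   # a "list" of squares as a container
```

A dependent lens then pairs a forward map on shapes with a backward map on positions, which is the part that lets gradients flow back through such structures uniformly.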
- Scaling on structured representations rather than flat token streams would encode domain rules directly into training, the same move that paid off so dramatically in chess.