Hasty Briefs (beta)

Types and Neural Networks

4 hours ago
  • #Type Systems
  • #Differentiable Programming
  • #Neural Networks
  • LLMs generate code in typed languages, yet they are trained to emit raw token sequences rather than typed structures, leaving well-typedness to post-hoc typechecking.
  • Two common post-training fixes are retry loops (low granularity, high bandwidth) and constrained decoding (high granularity, low bandwidth); both are inefficient, and neither updates the model's weights.
  • Integrating the type system into training itself, much as AlphaZero builds the rules of chess into learning, could dramatically improve performance by letting models learn structural rules directly rather than having them imposed afterward.
  • Differentiating through structure, as in CHAD, fixes the output type in advance but prevents learning the structure itself, since it requires a predefined partition.
  • Differentiating with respect to structure allows models to learn type choices via distributions, producing well-typed output and enabling gradient-based learning of structured outputs.
  • This approach leverages containers and dependent lenses to handle complex types uniformly, promising more meaningful and efficient code generation.
  • Scaling on structured representations, rather than flat tokens, aligns with encoding domain rules into training for better performance, as demonstrated in chess.
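To make the retry-loop/constrained-decoding contrast concrete, here is a minimal sketch of constrained decoding: at each step the decoder masks out any token that would violate a structural rule, so the output is well-formed by construction while the model's weights are untouched. The vocabulary, rule, and function names (`VOCAB`, `allowed`, `decode`) are illustrative stand-ins, not taken from the post.

```python
# Hypothetical toy setup: a three-token vocabulary and one structural rule
# (parentheses must stay balanced). Real systems mask against a grammar or
# a typechecker instead.
VOCAB = ["(", ")", "x"]

def allowed(prefix, token):
    # Rule: a ")" may only appear if there is an unmatched "(" before it.
    depth = prefix.count("(") - prefix.count(")")
    if token == ")":
        return depth > 0
    return True

def decode(logits_fn, steps):
    prefix = []
    for _ in range(steps):
        logits = logits_fn(prefix)
        # High granularity: intervene at every single token...
        masked = {t: s for t, s in logits.items() if allowed(prefix, t)}
        # ...but low bandwidth: the model gets no learning signal from the mask.
        prefix.append(max(masked, key=masked.get))
    # Close any still-open parentheses so the result is well-formed.
    prefix.extend(")" * (prefix.count("(") - prefix.count(")")))
    return "".join(prefix)

# A "model" that always prefers ")": unconstrained, it would emit ill-formed
# output; under the mask, decoding yields balanced parentheses.
out = decode(lambda p: {"(": 0.0, ")": 1.0, "x": -1.0}, 4)
print(out)  # → ()()
```

Note that the greedy choice is made only over permitted tokens; the masking never feeds back into the scores, which is exactly the inefficiency the post attributes to post-hoc approaches.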
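The "differentiating with respect to structure" idea can be sketched in miniature: relax a discrete choice among candidate types into a softmax distribution, so the expected loss becomes smooth in the choice parameters and gradient descent can shift probability mass toward the type that fits best. All names and numbers below (the candidate types, their losses, the learning rate) are made-up illustrations, not the post's actual construction with containers and dependent lenses.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical per-type losses: how poorly each candidate output type fits.
type_losses = {"Int": 2.0, "List Int": 0.5, "Bool": 3.0}

def expected_loss(logits):
    # E[L] = sum_i p_i * L_i — smooth in the logits, unlike a hard argmax.
    probs = softmax(logits)
    return sum(p * l for p, l in zip(probs, type_losses.values()))

def grad(logits):
    # dE/dlogit_i = p_i * (L_i - E[L]) for a softmax-weighted mixture.
    probs = softmax(logits)
    el = expected_loss(logits)
    return [p * (l - el) for p, l in zip(probs, type_losses.values())]

# Plain gradient descent concentrates mass on the lowest-loss type.
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    g = grad(logits)
    logits = [x - 0.5 * gi for x, gi in zip(logits, g)]

probs = softmax(logits)
best = max(zip(probs, type_losses), key=lambda t: t[0])[1]
print(best)  # → List Int
```

The point of the relaxation is that the type choice itself receives gradient signal, rather than being fixed in advance (as when differentiating through a predefined structure) or enforced only at decode time.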