Hasty Briefs (beta)

Types and Neural Networks

4 hours ago
  • #Type Systems
  • #Differentiable Programming
  • #Neural Networks
  • LLMs generate code in typed languages, yet they are trained to emit raw token sequences rather than typed structures, leaving well-typedness to post-hoc typechecking.
  • Two common post-training fixes are retry loops (low granularity, high bandwidth) and constrained decoding (high granularity, low bandwidth); both are inefficient, and neither updates the model's weights.
  • Integrating the type system into training itself, much as AlphaZero builds the rules of chess into learning, could dramatically improve performance by letting models learn structural rules directly rather than having them imposed afterward.
  • Differentiating through structure, as in CHAD, fixes the output type in advance but prevents learning the structure itself, since it requires a predefined partition.
  • Differentiating with respect to structure allows models to learn type choices via distributions, producing well-typed output and enabling gradient-based learning of structured outputs.
  • This approach leverages containers and dependent lenses to handle complex types uniformly, promising more meaningful and efficient code generation.
  • Scaling on structured representations, rather than flat tokens, aligns with encoding domain rules into training for better performance, as demonstrated in chess.
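To make the retry-loop/constrained-decoding contrast concrete, here is a minimal sketch of constrained decoding: at each step the decoder masks out any token that would violate a structural rule, so the output is well-formed by construction while the model's weights are untouched. The vocabulary, rule, and function names (`VOCAB`, `allowed`, `decode`) are illustrative stand-ins, not taken from the post.

```python
# Hypothetical toy setup: a three-token vocabulary and one structural rule
# (parentheses must stay balanced). Real systems mask against a grammar or
# a typechecker instead.
VOCAB = ["(", ")", "x"]

def allowed(prefix, token):
    # Rule: a ")" may only appear if there is an unmatched "(" before it.
    depth = prefix.count("(") - prefix.count(")")
    if token == ")":
        return depth > 0
    return True

def decode(logits_fn, steps):
    prefix = []
    for _ in range(steps):
        logits = logits_fn(prefix)
        # High granularity: intervene at every single token...
        masked = {t: s for t, s in logits.items() if allowed(prefix, t)}
        # ...but low bandwidth: the model gets no learning signal from the mask.
        prefix.append(max(masked, key=masked.get))
    # Close any still-open parentheses so the result is well-formed.
    prefix.extend(")" * (prefix.count("(") - prefix.count(")")))
    return "".join(prefix)

# A "model" that always prefers ")": unconstrained, it would emit ill-formed
# output; under the mask, decoding yields balanced parentheses.
out = decode(lambda p: {"(": 0.0, ")": 1.0, "x": -1.0}, 4)
print(out)  # → ()()
```

Note that the greedy choice is made only over permitted tokens; the masking never feeds back into the scores, which is exactly the inefficiency the post attributes to post-hoc approaches.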
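The "differentiating with respect to structure" idea can be sketched in miniature: relax a discrete choice among candidate types into a softmax distribution, so the expected loss becomes smooth in the choice parameters and gradient descent can shift probability mass toward the type that fits best. All names and numbers below (the candidate types, their losses, the learning rate) are made-up illustrations, not the post's actual construction with containers and dependent lenses.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical per-type losses: how poorly each candidate output type fits.
type_losses = {"Int": 2.0, "List Int": 0.5, "Bool": 3.0}

def expected_loss(logits):
    # E[L] = sum_i p_i * L_i — smooth in the logits, unlike a hard argmax.
    probs = softmax(logits)
    return sum(p * l for p, l in zip(probs, type_losses.values()))

def grad(logits):
    # dE/dlogit_i = p_i * (L_i - E[L]) for a softmax-weighted mixture.
    probs = softmax(logits)
    el = expected_loss(logits)
    return [p * (l - el) for p, l in zip(probs, type_losses.values())]

# Plain gradient descent concentrates mass on the lowest-loss type.
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    g = grad(logits)
    logits = [x - 0.5 * gi for x, gi in zip(logits, g)]

probs = softmax(logits)
best = max(zip(probs, type_losses), key=lambda t: t[0])[1]
print(best)  # → List Int
```

The point of the relaxation is that the type choice itself receives gradient signal, rather than being fixed in advance (as when differentiating through a predefined structure) or enforced only at decode time.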