LLMs understand nullability
- #Nullability
- #LLMs
- #Code Understanding
- Large language models (LLMs) like ChatGPT, Claude, and DeepSeek can write code in many domains, making programming accessible to non-technical users.
- Key questions remain about LLMs' ability to write correct code independently and whether they truly 'understand' the code they generate.
- One way to assess 'understanding' in LLMs is to examine their internal representations and 'thought processes,' which can be read out from model activations.
- Code properties such as nullability (whether a variable can be null) are easier to study rigorously than natural-language concepts, because static analysis tools can supply ground-truth labels (see the first code sketch after this list).
- Experiments show that LLMs learn to infer nullability rules, with larger models performing better on complex type-inference tasks (the second sketch below illustrates the flavor of such a task).
- A 'nullability probe' was developed to measure internal model states, revealing how LLMs represent and reason about nullable variables (the third sketch below outlines the generic probing recipe).
- Models' understanding of nullability improves with training, but smaller models may regress in performance as training continues.
- The study provides insights into LLMs' internal representations of programming concepts, paving the way for future research on higher-level code understanding.
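To make the property concrete, here is a minimal Python sketch of nullability as a static checker such as mypy sees it; the function and data are illustrative, not taken from the study:

```python
from typing import Optional

def find_user(user_id: int) -> Optional[str]:
    """Return a username, or None if the id is unknown."""
    users = {1: "ada", 2: "grace"}
    return users.get(user_id)

name = find_user(3)  # nullable: the lookup can miss
if name is not None:
    print(name.upper())  # safe: `name` is narrowed to str here
else:
    print("no such user")
```

Without the `is not None` guard, mypy rejects `name.upper()` because `name` has type `Optional[str]`; the guard narrows it to `str`.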
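Inference gets harder when annotations are absent and nullability has to be propagated across calls and branches. A hypothetical example of that flavor (again, not from the study's dataset):

```python
def head(xs):
    # Unannotated: whether the result can be None must be inferred
    # from the body. It can, whenever `xs` is empty.
    return xs[0] if xs else None

def total_length(groups):
    total = 0
    for group in groups:
        h = head(group)   # nullable: None flows in from head's else branch
        if h is None:     # the guard makes the use below safe
            continue
        total += len(h)
    return total

print(total_length([["a", "b"], [], ["c"]]))  # prints 2
```

Deciding that `h` may be None at the call site, and that it cannot be None after the guard, is the kind of dataflow reasoning the experiments test.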
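The probe is described only at a high level here, so the following is a minimal sketch of the standard linear-probing recipe, with synthetic arrays standing in for real data; the names `acts` and `labels` and the 768-dimensional hidden size are assumptions, not details from the post:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: in a real probe, `acts` would hold the model's
# activation at each variable's token position, and `labels` whether
# static analysis says that variable is nullable there.
rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 768))      # (examples, hidden_dim)
labels = rng.integers(0, 2, size=1000)   # 1 = nullable, 0 = non-null

split = 800  # simple train/test split
probe = LogisticRegression(max_iter=1000)
probe.fit(acts[:split], labels[:split])

# On this random data the score hovers near chance (~0.5); on real
# activations, accuracy well above chance would suggest the model
# linearly encodes nullability at that layer.
print(probe.score(acts[split:], labels[split:]))
```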