Hasty Briefs (beta)

Do we understand how neural networks work?

11 days ago
  • #interpretability
  • #neural-networks
  • #machine-learning
  • Neural networks are fundamentally made of matrices, a well-understood mathematical concept.
  • Training neural networks involves gradient descent, a calculus-based optimization method (a minimal worked sketch follows this list).
  • The objective of training (e.g., predicting the next token in LLMs) is clearly defined but leads to complex outcomes.
  • Even though the training process itself is well understood, what it produces (the statistics learned into the weights) is complex and not fully understood.
  • LLMs are essentially advanced autocomplete systems, bundling statistics about language or images (a toy count-based version appears after this list).
  • Mechanistic interpretability is a subfield attempting to reverse-engineer how neural networks function internally.
  • Examples like 'Golden Gate Claude' show limited but precise understanding of specific neural network features (a toy feature-direction sketch follows this list).
  • LLMs can perform tasks like arithmetic by developing internal, non-human-like methods.
  • Understanding neural networks is more art than science at the edge of current knowledge.
  • Practical use of neural networks doesn't require deep understanding, but research and safety work do.
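
To make the first two bullets concrete, here is a minimal sketch of "a matrix trained by gradient descent": a single weight matrix fitted to toy data with a hand-written gradient-descent loop in NumPy. The shapes, data, learning rate, and step count are all illustrative assumptions, not details from the article.

```python
import numpy as np

# Toy setup: recover an unknown 2x3 matrix from input/output pairs.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, 3))       # the "unknown" matrix generating the data
X = rng.normal(size=(100, 2))          # 100 toy input examples
Y = X @ W_true                         # targets produced by the true matrix

W = np.zeros((2, 3))                   # the model: just a matrix of parameters
lr = 0.1                               # learning rate (arbitrary choice)

for step in range(200):
    pred = X @ W                       # forward pass: one matrix multiply
    err = pred - Y
    loss = (err ** 2).mean()           # mean squared error
    grad = 2.0 * X.T @ err / err.size  # calculus gives the gradient of the loss w.r.t. W
    W -= lr * grad                     # one gradient-descent step

print(f"final loss: {loss:.6f}")
print(f"max |W - W_true|: {np.abs(W - W_true).max():.6f}")
```

Every piece of this loop is textbook mathematics; the point of the summary is that the mystery in real networks lies not in these mechanics but in what the billions of learned numbers end up encoding.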
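The "advanced autocomplete" framing can be illustrated with a far cruder cousin of an LLM: a bigram model that simply counts which word follows which and greedily completes from those counts. The corpus and word-level tokenization below are invented for illustration; real LLMs learn vastly richer statistics with neural networks, not a count table.

```python
from collections import Counter, defaultdict

# A tiny invented corpus; real models train on trillions of tokens.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count which word follows which: the crudest possible "statistics about language".
follow_counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follow_counts[cur][nxt] += 1

def autocomplete(word: str, steps: int = 5) -> list[str]:
    """Greedily pick the most frequent next word: autocomplete from counted statistics."""
    out = [word]
    for _ in range(steps):
        nxt = follow_counts.get(word)
        if not nxt:
            break
        word = nxt.most_common(1)[0][0]
        out.append(word)
    return out

print(autocomplete("the"))   # e.g. ['the', 'cat', 'sat', 'on', 'the', 'cat']
```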
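'Golden Gate Claude' came from Anthropic identifying a feature in the model's activations associated with the Golden Gate Bridge and amplifying it. The toy below only gestures at that idea with made-up 2D "activations" and a hand-picked "feature direction"; it is not the real technique, which relies on training sparse autoencoders over millions of features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend these are hidden activations from a model: mostly noise, plus a
# hypothetical "bridge" feature direction that fires on some inputs.
feature_direction = np.array([0.8, -0.6])              # invented unit vector
activations = rng.normal(size=(6, 2)) * 0.1
activations[::2] += 2.0 * feature_direction            # inputs 0, 2, 4 "mention the bridge"

# Interpretability-style readout: project activations onto the feature direction.
strength = activations @ feature_direction
print("feature strength per input:", np.round(strength, 2))

# "Golden Gate Claude"-style steering: add the direction to every activation,
# so the feature is strongly active regardless of the input.
steered = activations + 5.0 * feature_direction
print("after steering:", np.round(steered @ feature_direction, 2))
```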