Do we understand how neural networks work?
- #interpretability
- #neural-networks
- #machine-learning
- Neural networks are fundamentally made of matrices, a well-understood mathematical concept.
- Training neural networks involves gradient descent, a calculus-based optimization method.
- The training objective (e.g., predicting the next token in an LLM) is clearly defined, yet it produces complex outcomes; a toy sketch of this setup follows the list below.
- Even though the training process itself is well understood, what it produces (the learned statistics) is complex and not fully understood.
- LLMs are essentially advanced autocomplete systems: enormous bundles of statistics about language (or, for image models, about images).
- Mechanistic interpretability is a subfield attempting to reverse-engineer how neural networks function internally.
- Examples like 'Golden Gate Claude' show that precise understanding of specific internal features is possible, but only in limited cases; a toy steering sketch appears after the list below.
- LLMs can perform tasks like arithmetic by developing internal, non-human-like methods.
- At the edge of current knowledge, understanding neural networks is more art than science.
- Practical use of neural networks doesn't require deep understanding, but research and safety do.
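To make the "matrices plus gradient descent plus a clear objective" point concrete, here is a minimal sketch of a one-layer "next-token" predictor. Everything in it (the embedding size, vocabulary size, data, and learning rate) is made up for illustration; it is not how any real LLM is implemented, just the same ingredients at toy scale.

```python
# A minimal sketch (NumPy only): the "network" is one matrix W, the
# objective is cross-entropy on the true next token, and training is
# a single gradient-descent step. All sizes and data are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 16                             # toy embedding size and vocabulary
W = rng.normal(scale=0.1, size=(d_model, vocab))   # the whole "network": one matrix

x = rng.normal(size=(d_model,))   # embedding of the current context
target = 3                        # index of the "true" next token

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Forward pass: a matrix multiply, then a softmax over the vocabulary.
logits = x @ W
probs = softmax(logits)
loss = -np.log(probs[target])     # cross-entropy: -log p(true next token)

# Backward pass: the gradient of cross-entropy w.r.t. the logits is
# (probs - one_hot), so the gradient w.r.t. W is an outer product.
one_hot = np.zeros(vocab)
one_hot[target] = 1.0
grad_W = np.outer(x, probs - one_hot)

# One gradient-descent step: nudge W to make the true next token more likely.
lr = 0.1
W -= lr * grad_W

print(f"loss before: {loss:.4f}")
print(f"loss after:  {-np.log(softmax(x @ W)[target]):.4f}")
```

Every step above is simple and fully understood; what is not understood is what billions of such updates collectively encode in the weights.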
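And here is a minimal sketch of the activation-steering idea behind examples like 'Golden Gate Claude': pick a direction in a layer's activation space that corresponds to a feature, then add it during the forward pass. The toy model, the random "feature direction", and the steering strength below are all stand-ins for illustration, not Anthropic's actual method or model; in real interpretability work the direction would come from something like a sparse autoencoder trained on the model's activations.

```python
# A toy sketch of activation steering with a PyTorch forward hook.
# The model and "feature direction" are hypothetical stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model = 32

# A toy stand-in for one block of a real model's residual stream.
model = nn.Sequential(
    nn.Linear(d_model, d_model),
    nn.ReLU(),
    nn.Linear(d_model, d_model),
)

# Hypothetical feature direction: here just a random unit vector.
feature_direction = torch.randn(d_model)
feature_direction /= feature_direction.norm()
steering_strength = 5.0

def steer(module, inputs, output):
    # Add the feature direction to the layer's output ("clamp the feature on").
    return output + steering_strength * feature_direction

# Hook the intermediate layer so every forward pass is steered.
handle = model[0].register_forward_hook(steer)

x = torch.randn(1, d_model)
steered = model(x)
handle.remove()
unsteered = model(x)

print("shift introduced by steering:", (steered - unsteered).norm().item())
```

The striking thing is how narrow this kind of understanding is: we can sometimes find and manipulate one feature precisely, while the vast majority of what the network computes remains opaque.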