Can you reverse engineer our neural network?
3 days ago
- #ML Puzzle
- #Mechanistic Interpretability
- #Neural Networks
- The article describes an ML puzzle in which solvers are given a neural network's complete specification, including its weights, and must use mechanistic interpretability to reverse engineer what it computes.
- The puzzle was designed to output 0 for almost all inputs, making it challenging to brute force a solution without understanding the network's underlying mechanism.
- A solver named Alex used various methods, including linear programming and SAT solvers, to reduce the network's complexity and identify its core function.
- Alex discovered that the network was implementing the MD5 hash function, but with a bug that caused incorrect outputs for inputs longer than 32 characters.
- After extensive reverse-engineering work, the solution was finally recovered by brute-forcing the hash against a large word list, which showed how simple the intended answer was.
- The success of this puzzle inspired the creation of another ML puzzle, involving reassembling a jumbled neural network.
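
The final brute-force step mentioned above can be illustrated with a minimal sketch of a word-list attack on an MD5 digest. This is not the solver's actual code: the function name `find_preimage`, the tiny word list, and the target digest are all hypothetical stand-ins, and the sketch uses Python's standard `hashlib` rather than the puzzle's network (so it does not reproduce the bug for long inputs).

```python
import hashlib
from typing import Iterable, Optional


def find_preimage(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose MD5 hex digest equals target_hex, else None."""
    target = target_hex.lower()
    for word in candidates:
        # Hash the UTF-8 bytes of each candidate and compare hex digests.
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target:
            return word
    return None


# Hypothetical stand-ins: the puzzle's real target digest and word list are not given here.
words = ["alpha", "bravo", "charlie"]
target = hashlib.md5(b"bravo").hexdigest()
print(find_preimage(target, words))
```

In practice the candidates would be streamed from a large word-list file rather than held in a small in-memory list; the loop itself stays the same.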