Can you reverse engineer our neural network?

3 days ago

The article discusses an unusual ML puzzle in which solvers are given the complete specification of a neural network, weights included, and must use mechanistic interpretability to reverse engineer it.
The puzzle was designed to output 0 for almost all inputs, making it challenging to brute force a solution without understanding the network's underlying mechanism.
A solver named Alex used various methods, including linear programming and SAT solvers, to reduce the network's complexity and identify its core function.
Alex discovered that the network was implementing the MD5 hash function, but with a bug that caused incorrect outputs for inputs longer than 32 characters.
After extensive effort, brute-forcing the hash against a large word list finally produced the solution, which revealed how simple the puzzle's intended answer was.
The success of this puzzle inspired the creation of another ML puzzle, involving reassembling a jumbled neural network.
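
The word-list brute force mentioned above can be illustrated with a minimal sketch. It assumes a standard MD5 digest and a plain list of candidate strings. The function name `crack_md5`, the sample words, and the target digest are hypothetical and are not taken from the article or the puzzle.

```python
import hashlib
from typing import Iterable, Optional


def crack_md5(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate whose MD5 hex digest equals target_hex, else None."""
    target = target_hex.lower()
    for word in candidates:
        # Hash each candidate with standard MD5 and compare hex digests.
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target:
            return word
    return None


# Hypothetical stand-ins for a large word list and the digest being searched for.
words = ["alpha", "bravo", "charlie"]
target = hashlib.md5(b"bravo").hexdigest()
print(crack_md5(target, words))
```

In practice the candidates would be streamed from a large word-list file rather than held in a short in-memory list. Because the summary says the network's MD5 was buggy only for inputs longer than 32 characters, a standard MD5 should be an adequate stand-in for shorter candidates.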
