Chess engines do weird stuff
- #machine-learning
- #chess-engines
- #reinforcement-learning
- Chess engines can be trained with RL by having the engine play itself and predict game outcomes, but it is more efficient to distill a weak model combined with search into a stronger model.
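A minimal numpy sketch of distillation from search: the search (e.g. MCTS guided by the weak network) produces visit counts over moves, and the network's policy is trained toward the normalized visit distribution via cross-entropy. All numbers here are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical setup: policy logits for 4 legal moves, and visit counts
# produced by a search guided by the same (weak) network.
logits = np.array([0.1, 0.5, -0.2, 0.0])
visits = np.array([10.0, 150.0, 5.0, 35.0])

target = visits / visits.sum()   # search-improved policy (distillation target)
policy = softmax(logits)

# Distillation loss: cross-entropy between the search policy and the network policy.
loss = -np.sum(target * np.log(policy))

# Gradient of softmax + cross-entropy w.r.t. the logits is (policy - target).
grad = policy - target
logits = logits - 0.1 * grad     # one SGD step toward the search policy
```

The search result acts as a better teacher than the raw game outcome, which is the intuition behind distillation being more efficient than plain outcome-prediction RL.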
- Distillation from search is powerful: it offers significant improvements over best-of-n style RL and makes test-time search less necessary.
- A recent technique applies distillation at runtime, letting the network adapt live by adjusting its evaluations toward search results.
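A toy sketch of the runtime-distillation idea (the mechanism here is my assumption, not the actual technique): after each search, nudge a live correction term toward the gap between the search-backed value and the static evaluation, so later static evals start closer to what search would conclude.

```python
correction = 0.0   # live correction applied on top of the network eval
LR = 0.2           # how aggressively search results are distilled in

def static_eval(raw):
    # Network evaluation plus the runtime correction.
    return raw + correction

def after_search(raw, search_value):
    # Distill the search result into the correction term at runtime.
    global correction
    correction += LR * (search_value - static_eval(raw))

raw = 0.10                  # network: slightly better for White
for _ in range(20):
    after_search(raw, 0.45) # search keeps concluding it's much better
```

After a few searches, `static_eval(raw)` has moved most of the way toward the search value, without any offline retraining.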
- Because the real objective is winning games rather than accurate position estimation, techniques like SPSA (simultaneous perturbation stochastic approximation) randomly perturb weights to find directions that win more games.
- SPSA is expensive but effective, providing Elo gains comparable to a model-size increase or years of development effort.
- SPSA can tune any parameter in a chess program, including heuristic constants in the C++ code, optimizing them directly for winning.
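A minimal sketch of the SPSA update on a toy objective. Here `match_score` stands in for "play a batch of games with these parameters and return the score"; in real tuning that signal is game results, and the gains and schedules are tuned carefully. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def match_score(params):
    # Stand-in for a noisy match result; toy optimum at [3, -1].
    opt = np.array([3.0, -1.0])
    return -np.sum((params - opt) ** 2) + rng.normal(scale=0.1)

params = np.zeros(2)   # e.g. two heuristic constants in the engine
a, c = 0.1, 0.2        # step size and perturbation size (fixed for simplicity;
                       # classic SPSA decays both over time)

for _ in range(2000):
    delta = rng.choice([-1.0, 1.0], size=params.shape)  # Rademacher perturbation
    plus = match_score(params + c * delta)
    minus = match_score(params - c * delta)
    # Simultaneous-perturbation gradient estimate: a single pair of
    # "matches" moves every parameter at once, however many there are.
    ghat = (plus - minus) / (2 * c) * delta
    params += a * ghat  # ascend the (noisy) match score
```

The key property is that the cost per step is two evaluations regardless of parameter count, which is why it scales to tuning many heuristic values at once.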
- lc0 uses a transformer architecture with 'smolgen', which generates attention biases from the position itself, offering significant accuracy improvements despite a throughput hit.
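A rough numpy sketch of the smolgen idea as I understand it (shapes, names, and the exact wiring are assumptions, and the dimensions are shrunk from lc0's 64 squares to keep it tiny): a small side network compresses the token representations into a board summary and generates a per-head additive attention bias from it, instead of using a static learned bias.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 8, 16, 2   # toy token count, model dim, heads (lc0 uses 64 squares)
HIDDEN = 4           # small compressed per-token dim (assumed)

x = rng.normal(size=(T, D))              # token representations for one position

# Ordinary attention scores from Q.K^T (shared across heads here for brevity).
Wq = rng.normal(size=(D, D)) / np.sqrt(D)
Wk = rng.normal(size=(D, D)) / np.sqrt(D)
scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(D)   # (T, T)

# "smolgen"-style dynamic bias: compress each token, flatten the board,
# and generate a full (H, T, T) additive bias from the position itself.
W_compress = rng.normal(size=(D, HIDDEN)) / np.sqrt(D)
W_gen = rng.normal(size=(T * HIDDEN, H * T * T)) / np.sqrt(T * HIDDEN)

z = (x @ W_compress).reshape(-1)         # (T*HIDDEN,) whole-board summary
bias = (z @ W_gen).reshape(H, T, T)      # per-head, input-dependent bias

biased_scores = scores[None, :, :] + bias   # broadcast the shared scores per head
```

Because the bias depends on the whole position rather than being a fixed learned table, attention can route information differently per position, which is where the accuracy gain plausibly comes from; the extra generator matmuls are the throughput cost.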