Chess engines do weird stuff
- #machine-learning
- #chess-engines
- #reinforcement-learning
- Chess engines can be trained with RL by having the engine play itself and predict game outcomes, but it is more efficient to distill a weak model combined with search into a stronger model.
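A minimal numpy sketch of distillation from search: the search (e.g. MCTS guided by the weak network) produces visit counts over moves, and the network's policy is trained toward the normalized visit distribution via cross-entropy. All numbers here are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical setup: policy logits for 4 legal moves, and visit counts
# produced by a search guided by the same (weak) network.
logits = np.array([0.1, 0.5, -0.2, 0.0])
visits = np.array([10.0, 150.0, 5.0, 35.0])

target = visits / visits.sum()   # search-improved policy (distillation target)
policy = softmax(logits)

# Distillation loss: cross-entropy between the search policy and the network policy.
loss = -np.sum(target * np.log(policy))

# Gradient of softmax + cross-entropy w.r.t. the logits is (policy - target).
grad = policy - target
logits = logits - 0.1 * grad     # one SGD step toward the search policy
```

The search result acts as a better teacher than the raw game outcome, which is the intuition behind distillation being more efficient than plain outcome-prediction RL.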
- Distillation from search is powerful: it offers significant improvements over best-of-n style RL and makes test-time search less necessary.
- A recent technique applies distillation at runtime, letting the network adapt live by adjusting its evaluations toward search results.
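A toy sketch of the runtime-distillation idea (the mechanism here is my assumption, not the actual technique): after each search, nudge a live correction term toward the gap between the search-backed value and the static evaluation, so later static evals start closer to what search would conclude.

```python
correction = 0.0   # live correction applied on top of the network eval
LR = 0.2           # how aggressively search results are distilled in

def static_eval(raw):
    # Network evaluation plus the runtime correction.
    return raw + correction

def after_search(raw, search_value):
    # Distill the search result into the correction term at runtime.
    global correction
    correction += LR * (search_value - static_eval(raw))

raw = 0.10                  # network: slightly better for White
for _ in range(20):
    after_search(raw, 0.45) # search keeps concluding it's much better
```

After a few searches, `static_eval(raw)` has moved most of the way toward the search value, without any offline retraining.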
- Because the real objective is winning games rather than accurate position estimation, techniques like SPSA (simultaneous perturbation stochastic approximation) randomly perturb weights to find directions that win more games.
- SPSA is expensive but effective, providing Elo gains comparable to a model-size increase or years of development effort.
- SPSA can tune any parameter in a chess program, including heuristic constants in the C++ code, optimizing them directly for winning.
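A minimal sketch of the SPSA update on a toy objective. Here `match_score` stands in for "play a batch of games with these parameters and return the score"; in real tuning that signal is game results, and the gains and schedules are tuned carefully. Everything below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def match_score(params):
    # Stand-in for a noisy match result; toy optimum at [3, -1].
    opt = np.array([3.0, -1.0])
    return -np.sum((params - opt) ** 2) + rng.normal(scale=0.1)

params = np.zeros(2)   # e.g. two heuristic constants in the engine
a, c = 0.1, 0.2        # step size and perturbation size (fixed for simplicity;
                       # classic SPSA decays both over time)

for _ in range(2000):
    delta = rng.choice([-1.0, 1.0], size=params.shape)  # Rademacher perturbation
    plus = match_score(params + c * delta)
    minus = match_score(params - c * delta)
    # Simultaneous-perturbation gradient estimate: a single pair of
    # "matches" moves every parameter at once, however many there are.
    ghat = (plus - minus) / (2 * c) * delta
    params += a * ghat  # ascend the (noisy) match score
```

The key property is that the cost per step is two evaluations regardless of parameter count, which is why it scales to tuning many heuristic values at once.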
- lc0 uses a transformer architecture with 'smolgen', which generates attention biases from the position itself, offering significant accuracy improvements despite a throughput hit.
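A rough numpy sketch of the smolgen idea as I understand it (shapes, names, and the exact wiring are assumptions, and the dimensions are shrunk from lc0's 64 squares to keep it tiny): a small side network compresses the token representations into a board summary and generates a per-head additive attention bias from it, instead of using a static learned bias.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, H = 8, 16, 2   # toy token count, model dim, heads (lc0 uses 64 squares)
HIDDEN = 4           # small compressed per-token dim (assumed)

x = rng.normal(size=(T, D))              # token representations for one position

# Ordinary attention scores from Q.K^T (shared across heads here for brevity).
Wq = rng.normal(size=(D, D)) / np.sqrt(D)
Wk = rng.normal(size=(D, D)) / np.sqrt(D)
scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(D)   # (T, T)

# "smolgen"-style dynamic bias: compress each token, flatten the board,
# and generate a full (H, T, T) additive bias from the position itself.
W_compress = rng.normal(size=(D, HIDDEN)) / np.sqrt(D)
W_gen = rng.normal(size=(T * HIDDEN, H * T * T)) / np.sqrt(T * HIDDEN)

z = (x @ W_compress).reshape(-1)         # (T*HIDDEN,) whole-board summary
bias = (z @ W_gen).reshape(H, T, T)      # per-head, input-dependent bias

biased_scores = scores[None, :, :] + bias   # broadcast the shared scores per head
```

Because the bias depends on the whole position rather than being a fixed learned table, attention can route information differently per position, which is where the accuracy gain plausibly comes from; the extra generator matmuls are the throughput cost.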