Hasty Briefs (beta)

Chess engines do weird stuff

7 days ago
  • #machine-learning
  • #chess-engines
  • #reinforcement-learning
  • Chess engines can be trained with RL by having the engine play itself and predict game outcomes, but distilling a weak model plus search into a stronger model is more efficient.
  • Distillation from search is powerful, offering much larger improvements than best-of-n sampling in RL and making test-time search less necessary.
  • A recent technique applies distillation at runtime, letting the network adapt live by adjusting its evaluations toward search results.
  • The training objective targets winning games rather than merely estimating positions, motivating techniques like SPSA, which perturbs weights randomly to find directions that win more.
  • SPSA is expensive but effective, yielding Elo gains comparable to increasing model size or to years of development effort.
  • SPSA can tune any parameter in a chess program, including heuristic values in C++ code, optimizing for winning outcomes.
  • lc0 uses a transformer architecture with 'smolgen' for attention biases, offering significant accuracy improvements despite a throughput hit.
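The distillation-from-search idea above can be sketched with a toy training step: the search produces visit counts over moves, and the network's move distribution is pushed toward that visit-count distribution by cross-entropy gradient descent. This is an illustrative sketch, not lc0's actual training code; `distill_step` and its parameters are hypothetical names.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distill_step(logits, search_visits, lr=0.5):
    """One toy distillation step: move the network's move distribution
    toward the visit-count distribution produced by search.
    The gradient of cross-entropy w.r.t. the logits is (probs - target)."""
    total = sum(search_visits)
    target = [v / total for v in search_visits]
    probs = softmax(logits)
    return [l - lr * (p - t) for l, p, t in zip(logits, probs, target)]
```

Repeating this step over many searched positions is what turns a weak network plus search into a stronger network without search.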
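The runtime-adaptation bullet can be illustrated the same way: keep a small per-position correction term and nudge it toward the value the search actually backed up, so the raw evaluation adapts during play. A minimal sketch under assumed names (`adapt_eval`, the `corrections` cache, and `rate` are all illustrative, not the article's API):

```python
def adapt_eval(static_eval, search_value, corrections, key, rate=0.25):
    """Runtime-distillation sketch: blend a cached correction for this
    position toward the search-backed value, so repeated visits make the
    returned evaluation converge on what search concluded."""
    corr = corrections.get(key, 0.0)
    corr += rate * (search_value - (static_eval + corr))
    corrections[key] = corr
    return static_eval + corr
```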
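The SPSA bullets can be made concrete with a minimal sketch: perturb every parameter simultaneously with a random ± delta, play the two perturbed versions against each other, and step each parameter toward the side that scored better. This is a generic SPSA iteration, not the actual tuning code of any engine; `eval_match` is an assumed callback returning the score difference between the two sides.

```python
import random

def spsa_step(params, eval_match, lr=0.01, delta=0.1):
    """One SPSA iteration: all parameters are perturbed at once with a
    random +/- delta, the two perturbed parameter sets are matched
    against each other, and the score difference drives the update."""
    signs = [random.choice((-1.0, 1.0)) for _ in params]
    plus  = [p + delta * s for p, s in zip(params, signs)]
    minus = [p - delta * s for p, s in zip(params, signs)]
    score_diff = eval_match(plus, minus)  # > 0 means the "plus" side scored better
    # Gradient estimate per parameter: score_diff / (2 * delta * sign)
    return [p + lr * score_diff / (2.0 * delta * s)
            for p, s in zip(params, signs)]
```

Because only match outcomes are needed, the same loop can tune anything that affects play, including heuristic constants in C++ code, which is why it optimizes directly for winning.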
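For the last bullet, the core mechanism of an attention bias can be shown in isolation: an extra, input-dependent bias matrix is added to the raw attention logits before the softmax. This is only a sketch of the general idea; how lc0's 'smolgen' actually generates those biases is not shown here, and the function name is illustrative.

```python
import math

def attention_with_bias(scores, bias):
    """Add a per-position bias matrix to raw attention logits before the
    softmax, so the bias can reshape where each token attends."""
    out = []
    for row_s, row_b in zip(scores, bias):
        logits = [s + b for s, b in zip(row_s, row_b)]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out
```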