Hasty Briefs


Better Activation Functions for NNUE

3 days ago
  • #NNUE
  • #Activation Functions
  • #Deep Learning
  • Experimented with replacing SCReLUs in Viridithas's NNUE with Swish in layers L₁ and L₂.
  • Encountered teething problems with Hard-Swish: it lowered sparsity in the L₀ output activations, hurting inference performance.
  • Solved the sparsity issue by adding a regularization term to the loss function, penalizing dense activations.
  • Swish networks showed smoother evaluation scale and significant Elo improvements over SCReLU baselines.
  • Further strength improvements were achieved by replacing Swish with SwiGLU in L₂.
  • Final activation sequence in Viridithas resembles smooth versions of CReLU and SCReLU, similar to findings in PlentyChess.
  • Author expresses enthusiasm for integrating more deep learning techniques into NNUE design, hinting at future explorations.
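For reference, the activation functions named above can be sketched as follows. This is a minimal floating-point sketch; the original post's fixed-point quantization details and any Swish β parameter are not given in this summary, so standard textbook definitions are assumed.

```python
import math

def crelu(x):
    # Clipped ReLU: clamp the input to [0, 1].
    return min(max(x, 0.0), 1.0)

def screlu(x):
    # Squared clipped ReLU: crelu(x) squared, the common NNUE baseline.
    return crelu(x) ** 2

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x), a smooth non-monotonic ReLU relative.
    return x / (1.0 + math.exp(-beta * x))

def hard_swish(x):
    # Hard-Swish: piecewise-linear approximation of Swish,
    # defined as x * relu6(x + 3) / 6, cheaper to evaluate.
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0

def swiglu(x_gate, x_val):
    # SwiGLU: a gated linear unit whose gate is Swish. It takes two
    # linear projections of the same input; the Swish-activated gate
    # multiplies the value projection elementwise.
    return swish(x_gate) * x_val
```

Note how `screlu` saturates at 1 while `swish` is unbounded above, which is consistent with the smoother evaluation scale reported for the Swish networks.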
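The summary says the sparsity loss was fixed with a regularization term penalizing dense activations, but does not give its form. An L1 activity penalty on the L₀ outputs is one common choice and is sketched here as an assumption; the name `sparsity_penalty` and the coefficient are illustrative, not from the post.

```python
import numpy as np

def sparsity_penalty(activations, coeff=1e-3):
    # L1 activity regularization: the mean absolute activation, scaled
    # by a small coefficient. Added to the training loss, it pushes
    # activations toward exact zero, restoring the sparsity that fast
    # NNUE inference exploits to skip zeroed inputs.
    return coeff * float(np.abs(activations).mean())

# Training sketch: total_loss = eval_loss + sparsity_penalty(l0_out)
```

The coefficient trades playing strength against inference speed: too large and the penalty distorts the evaluation, too small and the activations stay dense.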