Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks
- #Machine Learning
- #FPGA
- #KAN
- Master's thesis explores hardware architectures for ultrafast inference and online learning using Kolmogorov-Arnold Networks (KANs) on FPGAs.
- KANs replace learnable weights and fixed activation functions in MLPs with learnable univariate activation functions, offering potential improvements in scaling and parameter efficiency.
- Fixed-point quantization is used to encode real numbers as bitstrings, enabling neural networks to be implemented as digital logic on FPGAs with minimal approximation error.
- KANs are naturally suited for lookup-table neural networks (LUT-NNs) due to their univariate activations, avoiding exponential scaling issues of multivariate LUTs and enabling efficient pruning.
- For inference, KAN activations are stored as LUTs on FPGAs, achieving a 2700x speedup over prior implementations and surpassing state-of-the-art FPGA accelerators in latency and resource usage.
- Online learning on FPGAs is enabled by storing B-spline basis functions in LUTs and updating the spline coefficients in real time, exploiting the locality and boundedness of B-splines for stable, sub-microsecond gradient updates.
- B-spline locality ensures that only a small subset of basis functions is active for any given input, so hardware logic scales with the polynomial order rather than the grid size; a finer grid adds expressivity without adding logic.
- Stable fixed-point training in KANs is achieved because activations and gradients are bounded within coefficient ranges, reducing quantization error and enhancing learning stability compared to MLPs.
- The implementation demonstrates that KAN-based online learners can handle over 100,000 parameters with sub-microsecond latency, showing better hardware scaling and convergence on benchmarks such as function approximation and quantum control.
- Conclusion highlights that KAN properties, such as activation mapping to LUTs and B-spline characteristics, are highly advantageous for custom hardware accelerators, enabling nanosecond-latency inference and efficient real-time learning.
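
The mechanisms summarized above (fixed-point encoding, activations stored as lookup tables, and B-spline locality during coefficient updates) can be illustrated with a small software sketch. This is not the thesis's hardware design: every name and constant here (`FRAC_BITS`, `IN_BITS`, `build_lut`, `kan_node`, `uniform_knots`, `sgd_step`, the learning rate) is a hypothetical choice made for illustration, and the code uses floating point where real hardware would use fixed-point logic throughout.

```python
import math

FRAC_BITS = 8  # assumed number of fractional bits in the fixed-point format
IN_BITS = 6    # assumed input word width, giving 2**6 = 64 LUT entries


def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a real number as an integer code in units of 2**-frac_bits."""
    return round(x * (1 << frac_bits))


def from_fixed(q, frac_bits=FRAC_BITS):
    """Decode an integer fixed-point code back to a float."""
    return q / (1 << frac_bits)


def lut_index(x, lo, hi, in_bits=IN_BITS):
    """Map x in [lo, hi] to the nearest table address, clamping out-of-range inputs."""
    n = 1 << in_bits
    i = round((x - lo) / (hi - lo) * (n - 1))
    return max(0, min(n - 1, i))


def build_lut(fn, lo, hi, in_bits=IN_BITS):
    """Tabulate a univariate activation on 2**in_bits evenly spaced inputs."""
    n = 1 << in_bits
    step = (hi - lo) / (n - 1)
    return [to_fixed(fn(lo + i * step)) for i in range(n)]


def kan_node(inputs, luts, lo, hi):
    """One KAN output: a sum of per-edge table reads (result is a fixed-point code)."""
    return sum(lut[lut_index(x, lo, hi)] for x, lut in zip(inputs, luts))


def uniform_knots(grid, degree):
    """Uniform knot vector with `degree` extra knots padded on each side."""
    return [float(i) for i in range(-degree, grid + degree + 1)]


def bspline_basis(x, knots, degree):
    """Cox-de Boor recursion: values of every degree-`degree` basis function at x."""
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        b = [
            (x - knots[i]) / (knots[i + d] - knots[i]) * b[i]
            + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * b[i + 1]
            for i in range(len(knots) - d - 1)
        ]
    return b


def sgd_step(coeffs, x, target, knots, degree, lr=0.5):
    """One squared-error gradient step that touches only the active coefficients."""
    basis = bspline_basis(x, knots, degree)
    active = [i for i, v in enumerate(basis) if v > 0.0]
    y = sum(coeffs[i] * basis[i] for i in active)
    err = y - target
    for i in active:
        # d(0.5 * err**2) / d(coeffs[i]) = err * basis[i]
        coeffs[i] -= lr * err * basis[i]
    return y, active
```

In this sketch a cubic spline (`degree=3`) activates four basis functions for an input that lies strictly between knots, whether the grid has 8 intervals or 64, so `sgd_step` reads and writes the same number of coefficients either way. Each basis value lies in [0, 1], so the per-coefficient gradient `err * basis[i]` is no larger in magnitude than the error itself.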