Hasty Briefs (beta)

Show HN: Luminal – Open-source, search-based GPU compiler

4 days ago
  • #rust
  • #deep-learning
  • #compiler
  • Luminal is a deep learning library using search-based compilation for high performance.
  • To run the demo on a Mac, clone the repo and follow the provided commands.
  • Transitioning to a '2.0' built on large-scale kernel search, simplifying the compiler stack.
  • Example code provided for setting up a graph and performing matrix multiplication.
  • Llama 3 8B can be run locally using Luminal, with setup and run instructions provided.
  • Luminal aims to be the fastest ML framework, supporting Q8 Llama 3 8B on M-series MacBooks.
  • Core library is minimal, with 12 primitive ops supporting transformers and convnets.
  • Compiles ops into complex GPU kernels for high performance.
  • Uses exhaustive search for optimizations, enabling automatic derivation of complex rewrites.
  • Written in Rust, interacting directly with CUDA/Metal APIs without abstractions.
  • Emphasizes correctness with extensive testing against PyTorch implementations.
  • Ahead-of-time compilation approach, similar to XLA and tinygrad, for better performance.
  • Supports aggressive kernel fusion, shape-specific kernels, and handling devices/dtypes via compilers.
  • Current features include Metal/CUDA support, full training, and implementations of models like Llama 3.
  • Roadmap includes expanding search space, improving CUDA, adding Blackwell intrinsics, and more.