Show HN: Luminal – Open-source, search-based GPU compiler
- #rust
- #deep-learning
- #compiler
- Luminal is a deep learning library using search-based compilation for high performance.
- To run the demo on a Mac, clone the repo and follow the given commands.
- Transitioning to a '2.0' design built around large-scale kernel search, which simplifies the compiler stack.
- Example code shows how to set up a graph and run a matrix multiplication (see the first sketch after this list).
- Llama 3 8B can be run locally using Luminal, with setup and run instructions provided.
- Luminal aims to be the fastest ML framework, and already runs Q8 Llama 3 8B on M-series MacBooks.
- Core library is minimal: 12 primitive ops are enough to support transformers and convnets (an illustrative op set is sketched below).
- These primitive ops get compiled into complex GPU kernels for high performance.
- Uses exhaustive search for optimizations, so complex rewrites are derived automatically rather than hand-written (a toy search loop is sketched below).
- Written in Rust, interacting directly with the CUDA and Metal APIs with no abstraction layers in between.
- Emphasizes correctness with extensive testing against PyTorch implementations.
- Ahead-of-time compilation approach, similar to XLA and tinygrad, for better performance than eager execution.
- Supports aggressive kernel fusion (illustrated below) and shape-specific kernels, and handles devices and dtypes through compiler passes.
- Current features include Metal/CUDA support, full training, and implementations of models like Llama 3.
- Roadmap includes expanding the search space, improving CUDA support, adding Blackwell intrinsics, and more.
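The graph/matmul example mentioned above looks roughly like the following. This is a sketch in the spirit of the project's README example; the exact API surface (`Graph::new`, `cx.tensor(...).set(...)`, `retrieve`, `compile`, `execute`, and the compiler types) is assumed here and may differ between Luminal versions and backends:

```rust
use luminal::prelude::*;

fn main() {
    // Build a computation graph; nothing executes yet.
    let mut cx = Graph::new();
    let a = cx.tensor((3, 1)).set([[1.0], [2.0], [3.0]]);
    let b = cx.tensor((1, 4)).set([[1.0, 2.0, 3.0, 4.0]]);

    // Record a matmul and mark its output to be kept after execution.
    let mut c = a.matmul(b).retrieve();

    // Compile the graph for a backend (Metal/CUDA compilers exist too),
    // then run it.
    cx.compile(<(GenericCompiler, CPUCompiler)>::default(), &mut c);
    cx.execute();

    println!("Result: {:?}", c);
}
```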
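The "12 primitive ops" claim is easiest to picture as a type. The enum below is an illustrative guess at what such a set looks like (unary math, binary math, reductions, and one memory-movement op); the names and exact membership are an assumption, not copied from the crate:

```rust
// Hypothetical sketch of a ~12-op primitive set, not Luminal's actual enum.
#[derive(Clone, Copy, Debug)]
enum PrimitiveOp {
    // Unary elementwise
    Log2,
    Exp2,
    Sin,
    Sqrt,
    Recip,
    // Binary elementwise
    Add,
    Mul,
    Mod,
    LessThan,
    // Reductions along one dimension
    SumReduce,
    MaxReduce,
    // Memory movement: materialize a strided view
    Contiguous,
}
```

Everything else composes out of a set like this: subtraction is an add of a negated operand, division is a multiply by a reciprocal, and matmul is a broadcasted multiply followed by a sum-reduce.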
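Kernel fusion, listed among the features, means collapsing a chain of elementwise ops into a single pass over memory. The toy CPU version below (plain Rust, not Luminal's codegen) shows the transformation a fusing compiler performs; on a GPU the same idea turns three kernel launches into one:

```rust
// Unfused: three passes over memory and two temporary buffers,
// one per op, the way a naive eager framework executes it.
fn unfused(x: &[f32], y: &[f32]) -> Vec<f32> {
    let t1: Vec<f32> = x.iter().map(|v| v.exp2()).collect();
    let t2: Vec<f32> = t1.iter().map(|v| v.sin()).collect();
    t2.iter().zip(y).map(|(a, b)| a * b).collect()
}

// Fused: one pass, no intermediate buffers.
fn fused(x: &[f32], y: &[f32]) -> Vec<f32> {
    x.iter().zip(y).map(|(a, b)| a.exp2().sin() * b).collect()
}
```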
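Finally, a toy illustration of what "exhaustive search for optimizations" means. A real search-based compiler searches over kernel-level rewrites scored by a hardware cost model; the self-contained sketch below shrinks that to algebraic rewrites scored by op count, an assumption made purely for brevity:

```rust
use std::collections::{HashSet, VecDeque};

// Toy expression IR, just enough to demonstrate search over rewrites.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Expr {
    Var(&'static str),
    Zero,
    One,
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}
use Expr::*;

// Cost model: count arithmetic ops (a stand-in for real hardware cost).
fn cost(e: &Expr) -> usize {
    match e {
        Var(_) | Zero | One => 0,
        Add(a, b) | Mul(a, b) => 1 + cost(a) + cost(b),
    }
}

// All expressions reachable by applying one rewrite rule anywhere in `e`.
fn step(e: &Expr) -> Vec<Expr> {
    let mut out = Vec::new();
    // Rules at the root: x*1 => x, x*0 => 0, x+0 => x.
    match e {
        Mul(a, b) if **b == One => out.push((**a).clone()),
        Mul(_, b) if **b == Zero => out.push(Zero),
        Add(a, b) if **b == Zero => out.push((**a).clone()),
        _ => {}
    }
    // Recurse so rules can fire in any subexpression.
    match e {
        Add(a, b) => {
            for a2 in step(a) { out.push(Add(Box::new(a2), b.clone())); }
            for b2 in step(b) { out.push(Add(a.clone(), Box::new(b2))); }
        }
        Mul(a, b) => {
            for a2 in step(a) { out.push(Mul(Box::new(a2), b.clone())); }
            for b2 in step(b) { out.push(Mul(a.clone(), Box::new(b2))); }
        }
        _ => {}
    }
    out
}

// Exhaustive breadth-first search over the rewrite space; keep the cheapest.
fn search(start: Expr) -> Expr {
    let mut best = start.clone();
    let mut seen = HashSet::new();
    let mut queue = VecDeque::from([start]);
    while let Some(e) = queue.pop_front() {
        if !seen.insert(e.clone()) {
            continue; // already explored this expression
        }
        if cost(&e) < cost(&best) {
            best = e.clone();
        }
        queue.extend(step(&e));
    }
    best
}

fn main() {
    // (x * 1) + (y * 0) should simplify all the way down to x.
    let e = Add(
        Box::new(Mul(Box::new(Var("x")), Box::new(One))),
        Box::new(Mul(Box::new(Var("y")), Box::new(Zero))),
    );
    println!("{:?}", search(e)); // prints Var("x")
}
```

The point is that no rewrite sequence is hand-scheduled: the search enumerates every reachable form and keeps the cheapest, which is how composite optimizations can be derived automatically rather than written by hand.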