Helion: A High-Level DSL for Performant and Portable ML Kernels
- #performance optimization
- #GPU programming
- #machine learning
- Helion is a high-level, Python-embedded domain-specific language (DSL) that compiles to optimized Triton code, bridging the gap between PyTorch's simplicity and hand-tuned low-level performance.
- It automates tensor indexing, memory management, and hardware-specific tuning, allowing developers to focus on algorithmic logic rather than implementation details.
- Helion's programming model, 'PyTorch with Tiles', minimizes boilerplate and leverages existing PyTorch knowledge, making kernel development more intuitive.
- The autotuning engine in Helion automatically constructs and explores a vast search space for optimal kernel configurations, significantly reducing manual effort.
- Performance benchmarks show Helion outperforming torch.compile and hand-written Triton kernels, with notable speedups on both NVIDIA and AMD GPUs.
- Case studies demonstrate Helion's ability to match or exceed the performance of highly optimized hand-written kernels, such as those written in CuTe DSL or TileLang.
- Helion's compiler architecture efficiently lowers Python functions into optimized Triton code, applying performance-critical configurations only at the final code generation stage.
- Helion is slated for a beta release on Oct. 22, 2025, aiming to provide a productive paradigm for writing performant machine learning kernels.
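
To make the "PyTorch with Tiles" model above concrete, here is a minimal sketch of an elementwise-add kernel in the style of Helion's public examples. The `helion.kernel` decorator and the `helion.language` tile API are drawn from the library's documented usage, but treat the details as assumptions; running it requires the `helion` package and a Triton-capable GPU.

```python
# Sketch of Helion's "PyTorch with Tiles" programming model.
# Assumes the `helion` package and its `helion.language` tile API;
# a Triton-capable GPU is needed to actually execute the kernel.
import torch
import helion
import helion.language as hl

@helion.kernel()  # the autotuner searches block sizes, warps, etc. automatically
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Ordinary PyTorch-style allocation; Helion handles indexing and memory.
    out = torch.empty_like(x)
    # `hl.tile` iterates over tiles of the output; tile shapes are chosen
    # by the autotuning engine rather than hard-coded by the author.
    for tile in hl.tile(out.size()):
        out[tile] = x[tile] + y[tile]
    return out

# Usage (on a CUDA device):
# a = torch.randn(4096, device="cuda")
# b = torch.randn(4096, device="cuda")
# result = add(a, b)
```

Note how the body is just PyTorch-style tensor arithmetic over tiles: there is no pointer math, masking, or grid configuration, which is exactly the boilerplate the summary says Helion automates.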