Modern GPU Programming for MLSys
2 days ago
- #Machine Learning Systems
- #GPU Programming
- #Kernel Optimization
- Modern machine learning systems rely heavily on GPU kernels for performance.
- Recent GPU architectures have complex memory spaces and specialized execution units.
- The book covers understanding GPU hardware, programming with the TIRx DSL, and building advanced kernels.
- Key optimization topics include data layout, asynchronous operations, and coordination.
- Examples include fast matrix multiplication (GEMM) and FlashAttention kernels.
- The book is organized into sections on GPU basics, an overview of TIRx, GEMM optimization, and FlashAttention implementation.
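
To make the GEMM item above concrete, here is a minimal pure-Python sketch of loop tiling (blocking), a common idea behind fast matrix multiplication kernels. It is an illustration only: the summary does not say which techniques the book uses, this is not TIRx code, and the names `naive_gemm`, `tiled_gemm`, and the `tile` parameter are hypothetical. On a GPU, each tile would typically be staged in fast on-chip memory and reused; here the tiling only changes the loop order.

```python
def naive_gemm(a, b):
    """Reference C = A @ B using plain triple loops over lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


def tiled_gemm(a, b, tile=2):
    """Same result as naive_gemm, computed one (tile x tile) block at a time.

    `tile` is a hypothetical block size chosen for illustration; real kernels
    pick it to fit the hardware's on-chip memory and execution units.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer loops walk over blocks of the output and of the reduction axis.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for p0 in range(0, k, tile):
                # Inner loops accumulate one block's partial products.
                # min(...) handles sizes that are not multiples of `tile`.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c
```

Calling `tiled_gemm([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])` gives the same 2×2 result as `naive_gemm` on the same inputs. Only the order in which partial products are accumulated differs, which is the property that lets a real kernel reuse a loaded tile many times before fetching the next one.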