Modern GPU Programming for MLSys

2 days ago
  • #Machine Learning Systems
  • #GPU Programming
  • #Kernel Optimization
  • Modern machine learning systems rely heavily on GPU kernels for performance.
  • Recent GPU architectures have complex memory spaces and specialized execution units.
  • The book covers understanding GPU hardware, programming with the TIRx DSL, and building advanced kernels.
  • Key optimization topics include data layout, asynchronous operations, and coordination.
  • Examples include fast matrix multiplication (GEMM) and FlashAttention kernels.
  • The book is organized into sections on GPU basics, an overview of TIRx, GEMM optimization, and FlashAttention implementation.
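
As a rough illustration of the tiling idea behind fast GEMM kernels, here is a minimal CPU-side sketch in plain Python. It is not TIRx code and is not taken from the book; the function name `matmul_tiled` and the `tile` parameter are assumptions made for illustration only. It shows how the output matrix can be computed one small block at a time, so that each block of the inputs is reused while it is "hot".

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (M x K) by b (K x N), visiting the work one tile at a time.

    Hypothetical helper for illustration: plain Python lists stand in for
    GPU memory, and `tile` stands in for the block size a kernel would use.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]

    # Outer loops pick one output tile (i0, j0) and one slice of K (k0).
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Inner loops accumulate the partial product for this tile.
                # min(...) handles edge tiles when sizes are not multiples of `tile`.
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = matmul_tiled(a, b, tile=2)
```

On a GPU, each output tile would typically be assigned to a thread block, with the input tiles staged in shared memory before the inner accumulation. The loop structure above only mirrors that decomposition and says nothing about how the book's kernels are actually written.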