Triton Bespoke Layouts
4 days ago
- #Layouts
- #Triton
- #GPU
- Bespoke layouts are traditional layouts like blocked/shared/MMA layouts, tailored for specific needs.
- Bespoke layouts initially described straightforward patterns of which hardware unit owns which tensor elements, but they grew complex as kernel optimizations accumulated.
- Linear layouts were introduced as a generic mechanism to unify layout conversions and optimizations.
- Blocked layouts describe how tensor elements are distributed across threads, warps, and CTAs so that global memory accesses are efficient in the SIMT model.
- MMA layouts represent vendor-specific tensor/matrix core unit layouts for computation.
- Dot operand layouts describe A/B matrix layouts for tensor/matrix core operations.
- Shared layouts (swizzled and padded) manage shared memory usage to avoid bank conflicts.
- The swizzled shared layout avoids bank conflicts by XOR-ing index bits to permute where elements land, while the padded shared layout avoids them by inserting extra padding elements into the allocation.
- The padded shared layout is less generic, but it is necessary for certain hardware features such as AMD's GLOBAL_LOAD_LDS_* intrinsics.
- Bespoke and linear layouts complement each other, with bespoke layouts being intuitive and linear layouts enabling generic optimizations.
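
The difference between the swizzled and padded shared layouts can be illustrated with a small model. This is a minimal sketch, not Triton's actual implementation: it assumes 32 banks that are each one 4-byte word wide, a 32x32 tile of 4-byte elements, and a simple "XOR the column with the row" swizzle. The function names (`linear_offset`, `swizzled_offset`, `padded_offset`, `max_conflict`) are hypothetical and exist only for this illustration.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, each one 4-byte word wide
ROW_WORDS = 32  # assumption: a 32x32 tile of 4-byte elements


def linear_offset(row, col):
    """Plain row-major offset, in words, with no conflict avoidance."""
    return row * ROW_WORDS + col


def swizzled_offset(row, col):
    """Illustrative swizzle: XOR the column index with the row index.

    Each row is permuted differently, so a column of the tile is spread
    over distinct banks, and no extra memory is used.
    """
    return row * ROW_WORDS + (col ^ (row % ROW_WORDS))


def padded_offset(row, col, pad=1):
    """Illustrative padding: every row is followed by `pad` unused words."""
    return row * (ROW_WORDS + pad) + col


def max_conflict(offset_fn, col=0, num_threads=32):
    """Worst-case number of threads hitting one bank.

    Thread t reads element (t, col), i.e. the threads walk down a column.
    """
    banks = Counter(offset_fn(t, col) % NUM_BANKS for t in range(num_threads))
    return max(banks.values())


if __name__ == "__main__":
    for name, fn in [("linear", linear_offset),
                     ("swizzled", swizzled_offset),
                     ("padded", padded_offset)]:
        print(f"{name:9s} worst-case threads per bank: {max_conflict(fn)}")
```

In this model the column access is fully serialized with the plain row-major offset and conflict-free with either alternative. The swizzle only permutes elements within the original 32x32 footprint, whereas padding grows the allocation to 32x33 words, which is the trade-off between the two approaches in this simplified setting.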