Triton Bespoke Layouts

4 days ago


Bespoke layouts are Triton's traditional layouts, such as blocked, shared, and MMA layouts, each tailored to a specific need.
Bespoke layouts initially described straightforward patterns of which hardware unit owns which tensor elements, but they grew complex as kernel optimizations accumulated.
Linear layouts were introduced as a generic mechanism to unify layout conversions and optimizations.
Blocked layouts describe tensor element distribution for efficient global memory access in the SIMT model.
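
To make the idea of element distribution concrete, here is a minimal Python sketch of how a blocked layout could assign one tensor element to a warp, a lane, and a per-thread register slot. The parameter names mirror the fields of Triton's blocked encoding (sizePerThread, threadsPerWarp, warpsPerCTA, order), but the function `blocked_owner` is hypothetical and simplified: it only reports the register position within one layout tile and is not Triton's actual implementation.

```python
def blocked_owner(coord, size_per_thread, threads_per_warp, warps_per_cta, order):
    """Return (warp_id, lane_id, reg_coord) for the element at `coord`.

    Simplified model: each dimension is split, from fastest to slowest,
    into per-thread registers, then lanes of a warp, then warps.
    """
    reg, lane_nd, warp_nd = [], [], []
    for d, i in enumerate(coord):
        spt, tpw, wpc = size_per_thread[d], threads_per_warp[d], warps_per_cta[d]
        reg.append(i % spt)                       # slot inside the thread
        lane_nd.append((i // spt) % tpw)          # lane coordinate in this dim
        warp_nd.append((i // (spt * tpw)) % wpc)  # warp coordinate in this dim

    def linearize(idx, shape):
        # order[0] is the fastest-varying dimension.
        flat, stride = 0, 1
        for d in order:
            flat += idx[d] * stride
            stride *= shape[d]
        return flat

    return (
        linearize(warp_nd, warps_per_cta),
        linearize(lane_nd, threads_per_warp),
        tuple(reg),
    )


# Assumed example: each thread holds 4 contiguous columns, a warp covers
# 4 rows x 32 columns, and 2 warps are stacked along the rows.
cfg = dict(size_per_thread=[1, 4], threads_per_warp=[4, 8],
           warps_per_cta=[2, 1], order=[1, 0])
print(blocked_owner((0, 5), **cfg))   # (0, 1, (0, 1))
print(blocked_owner((5, 13), **cfg))  # (1, 11, (0, 1))
```

With these assumed parameters, consecutive lanes own adjacent 4-element column chunks of the same row, which is the access pattern that lets global memory loads coalesce.
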
MMA layouts represent vendor-specific tensor/matrix core unit layouts for computation.
Dot operand layouts describe A/B matrix layouts for tensor/matrix core operations.
Shared layouts (swizzled and padded) manage shared memory usage to avoid bank conflicts.
Swizzled shared layout uses XOR operations for bank conflict resolution, while padded shared layout uses padding.
Padded shared layout is less generic but necessary for certain hardware features like AMD's GLOBAL_LOAD_LDS_* intrinsics.
Bespoke and linear layouts complement each other, with bespoke layouts being intuitive and linear layouts enabling generic optimizations.
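
As a rough illustration of how the two views relate, the Python sketch below expresses one blocked layout as a linear layout. The assumed blocked layout gives each thread 4 contiguous columns, arranges a warp as 4 x 8 lanes, and stacks 2 warps along the rows, covering an 8 x 32 tile. In the linear view, every bit of the register, lane, and warp index contributes a fixed (row, col) basis vector, and the contributions are combined with XOR. The `BASES` table and `apply_layout` helper are hypothetical names chosen for illustration and do not reproduce Triton's LinearLayout API.

```python
# Hypothetical basis table: entry k is the (row, col) contribution of bit k.
BASES = {
    "register": [(0, 1), (0, 2)],                       # 4 columns per thread
    "lane": [(0, 4), (0, 8), (0, 16), (1, 0), (2, 0)],  # 8 lanes wide, 4 tall
    "warp": [(4, 0)],                                   # 2 warps along rows
}


def apply_layout(bases, **inputs):
    """Map hardware indices to a tensor coordinate by XOR-ing basis vectors."""
    row = col = 0
    for name, value in inputs.items():
        for bit, (r, c) in enumerate(bases[name]):
            if (value >> bit) & 1:
                row ^= r
                col ^= c
    return row, col


print(apply_layout(BASES, register=1, lane=11, warp=1))  # (5, 13)

# Every (register, lane, warp) triple maps to a distinct element of the tile.
coords = {
    apply_layout(BASES, register=r, lane=l, warp=w)
    for r in range(4) for l in range(32) for w in range(2)
}
print(len(coords))  # 256
```

Because the mapping is just a table of basis vectors, converting between two layouts reduces to linear algebra on those tables, which is what makes generic optimizations possible. The bespoke parameters remain the more readable description of the same distribution.
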
