Hasty Briefs (beta)

Triton Bespoke Layouts

4 days ago
  • #Layouts
  • #Triton
  • #GPU
  • Bespoke layouts are Triton's traditional, hand-designed layouts (blocked, shared, MMA, and others), each tailored to a specific need.
  • Bespoke layouts initially described straightforward patterns of which hardware threads own which tensor elements, but they grew complex as kernel optimizations were layered on top.
  • Linear layouts were introduced as a generic mechanism to unify layout conversions and optimizations.
  • Blocked layouts describe how tensor elements are distributed across threads and warps for efficient, coalesced global memory access in the SIMT model.
  • MMA layouts represent the vendor-specific layouts used by tensor core (NVIDIA) and matrix core (AMD) units for computation.
  • Dot operand layouts describe how the A and B input matrices are laid out for tensor/matrix core operations.
  • Shared layouts (swizzled and padded) describe how tensors are placed in shared memory so that accesses avoid bank conflicts.
  • The swizzled shared layout uses XOR operations to permute element positions and resolve bank conflicts, while the padded shared layout inserts padding elements instead.
  • The padded shared layout is less generic, but it is necessary for certain hardware features, such as AMD's GLOBAL_LOAD_LDS_* intrinsics.
  • Bespoke and linear layouts complement each other: bespoke layouts are intuitive, while linear layouts enable generic optimizations.
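The blocked-layout bullet above can be made concrete with a small Python model of which (warp, lane) owns a given tensor element. This is a sketch under simplifying assumptions: it mirrors the `sizePerThread`, `threadsPerWarp`, `warpsPerCTA`, and `order` parameters of Triton's blocked layout, but the helper `blocked_owner` is hypothetical, it ignores multi-CTA layouts, and it reports a single owner even where Triton would replicate an element across several threads.

```python
def blocked_owner(coord, size_per_thread, threads_per_warp, warps_per_cta, order):
    """Return (warp_id, lane_id) owning element `coord` in a blocked layout.

    Hypothetical helper: each thread holds `size_per_thread[d]` contiguous
    elements along dim d, threads tile the warp, warps tile the CTA, and the
    whole CTA tile wraps around if the tensor is larger than it.
    """
    lane_idx, warp_idx = [], []
    for d, c in enumerate(coord):
        spt, tpw, wpc = size_per_thread[d], threads_per_warp[d], warps_per_cta[d]
        lane_idx.append((c // spt) % tpw)          # which thread along dim d
        warp_idx.append((c // (spt * tpw)) % wpc)  # which warp along dim d

    # Linearize the per-dim indices; order[0] is the fastest-varying dimension.
    lane = warp = 0
    lane_stride = warp_stride = 1
    for d in order:
        lane += lane_idx[d] * lane_stride
        lane_stride *= threads_per_warp[d]
        warp += warp_idx[d] * warp_stride
        warp_stride *= warps_per_cta[d]
    return warp, lane


# Illustrative parameters (assumed): the CTA tile covers 8 x 32 elements.
SPT, TPW, WPC, ORDER = [1, 4], [4, 8], [2, 1], [1, 0]

print(blocked_owner((0, 5), SPT, TPW, WPC, ORDER))  # (0, 1)
print(blocked_owner((1, 0), SPT, TPW, WPC, ORDER))  # (0, 8)
print(blocked_owner((4, 0), SPT, TPW, WPC, ORDER))  # (1, 0)
```

With `order=[1, 0]`, lanes 0 to 7 each take a 4-element vector and together cover a full 32-column row, so a warp touches contiguous addresses in row-major global memory.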
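To see how the swizzled and padded shared layouts avoid bank conflicts, the following Python sketch computes shared-memory offsets for a row-major 32 x 32 tile and counts how many accesses land in the same bank when 32 threads each read column 0 of a different row. Assumptions: 32 banks with one element per bank slot, a swizzle in the style of Triton's `vec`/`perPhase`/`maxPhase` parameters, and a single hypothetical `interval`/`pad` pair for padding. The helper names are illustrative, not Triton APIs.

```python
NUM_BANKS = 32  # assumption: 32 banks, each element occupies one bank slot


def swizzled_offset(r, c, ncols, vec, per_phase, max_phase):
    """XOR swizzle: permute groups of `vec` columns by a per-row phase."""
    phase = (r // per_phase) % max_phase
    col = ((c // vec) ^ phase) * vec + c % vec
    return r * ncols + col


def padded_offset(r, c, ncols, interval, pad):
    """Padding: insert `pad` unused elements after every `interval` elements."""
    linear = r * ncols + c
    return linear + (linear // interval) * pad


def worst_conflict(offsets):
    """Largest number of accesses that fall into the same bank."""
    banks = [o % NUM_BANKS for o in offsets]
    return max(banks.count(b) for b in set(banks))


N = 32
column0 = [(r, 0) for r in range(N)]  # thread r reads row r, column 0

plain = [r * N + c for r, c in column0]
swizzled = [swizzled_offset(r, c, N, vec=1, per_phase=1, max_phase=32) for r, c in column0]
padded = [padded_offset(r, c, N, interval=32, pad=1) for r, c in column0]

print(worst_conflict(plain))     # 32: every read hits bank 0
print(worst_conflict(swizzled))  # 1: conflict-free
print(worst_conflict(padded))    # 1: conflict-free
```

Both layouts remove the conflict in this case. The difference is cost: the swizzle only permutes columns within each row, while the padding here consumes 32 extra element slots (1056 instead of 1024).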
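The idea behind linear layouts can be sketched in a few lines of Python. Roughly, a linear layout is a linear map over GF(2): each bit of a hardware index (register, lane, warp) is assigned a basis vector in tensor coordinates, and an element's coordinate is the XOR of the basis vectors for the bits that are set. The dictionary encoding and the `apply_linear_layout` helper below are hypothetical illustrations, not Triton's own `LinearLayout` implementation. One set of bases encodes a blocked layout with `sizePerThread=[1, 4]`, `threadsPerWarp=[4, 8]`, `warpsPerCTA=[2, 1]`, `order=[1, 0]`; a second set encodes a 4 x 4 XOR swizzle, showing how one mechanism can express both kinds of bespoke layout.

```python
def apply_linear_layout(bases, inputs, rank=2):
    """Map hardware indices to a tensor coordinate by XOR-ing basis vectors.

    bases:  {input_dim: [basis vector for bit 0, bit 1, ...]}
    inputs: {input_dim: integer index}
    """
    out = [0] * rank
    for dim, vectors in bases.items():
        value = inputs[dim]
        for bit, vec in enumerate(vectors):
            if (value >> bit) & 1:  # bit is set: add (XOR) its basis vector
                out = [o ^ v for o, v in zip(out, vec)]
    return tuple(out)


# Blocked layout (sizePerThread=[1, 4], threadsPerWarp=[4, 8], warpsPerCTA=[2, 1],
# order=[1, 0]) written as bases; output coordinates are (row, col).
BLOCKED = {
    "register": [(0, 1), (0, 2)],                       # 4 elements per thread
    "lane": [(0, 4), (0, 8), (0, 16), (1, 0), (2, 0)],  # 32 lanes per warp
    "warp": [(4, 0)],                                   # 2 warps
}

# 4 x 4 XOR swizzle (stored column = col ^ row) written as bases over (row, col).
SWIZZLE = {
    "row": [(1, 1), (2, 2)],
    "col": [(0, 1), (0, 2)],
}

print(apply_linear_layout(BLOCKED, {"register": 3, "lane": 31, "warp": 0}))  # (3, 31)
print(apply_linear_layout(SWIZZLE, {"row": 3, "col": 1}))                    # (3, 2)
```

Because both layouts reduce to tables of basis vectors, a single routine can evaluate them, and the same representation lends itself to composing or inverting layouts, which is the kind of generic handling the summary attributes to linear layouts.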