Hasty Briefs (beta)


FP8 runs ~100 TFLOPS faster when the kernel name has "cutlass" in it

10 months ago
  • #Performance
  • #Gluon
  • #GitHub
  • A tutorial on persistent attention in Gluon was discussed in issue #7298.
  • Converting the kernel to a persistent kernel improved performance; fp8 in particular ran roughly 100 TFLOPS faster when the kernel name contained "cutlass".
  • fp16 performance dropped at large context lengths because of a ptxas instruction-scheduling issue in the softmax partition.
  • The discussion also covers layout mismatches and optimizations that appear to depend on the kernel name.
  • Contributors debated the accuracy checks and how the kernel name affects performance.
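
To put a figure like "~100 TFLOPS faster" in context, the sketch below shows how attention throughput is commonly derived from a measured runtime. It assumes the usual forward-pass FLOP count (two matrix multiplies per head, each costing 2 * seq_len^2 * head_dim FLOPs). The function name, the problem shape, and the timings are hypothetical and are not taken from the discussion.

```python
def attention_tflops(batch, heads, seq_len, head_dim, elapsed_ms, causal=False):
    """Achieved TFLOPS for one forward attention pass (illustrative helper)."""
    # Two matmuls per head (Q @ K^T and P @ V), each 2 * seq_len^2 * head_dim FLOPs.
    flops = 2 * (2.0 * batch * heads * seq_len * seq_len * head_dim)
    if causal:
        flops *= 0.5  # roughly half of the score matrix is masked out
    return flops / (elapsed_ms * 1e-3) / 1e12


# Hypothetical timings for the same kernel compiled under two different names.
shape = dict(batch=4, heads=32, seq_len=16384, head_dim=128)
baseline = attention_tflops(**shape, elapsed_ms=20.0)
renamed = attention_tflops(**shape, elapsed_ms=18.0)
print(f"baseline: {baseline:.1f} TFLOPS")
print(f"renamed:  {renamed:.1f} TFLOPS")
print(f"delta:    {renamed - baseline:.1f} TFLOPS")
```

Because the FLOP count is fixed for a given shape, any TFLOPS difference between two runs comes entirely from the measured runtime, so small timing changes at this scale translate into large throughput deltas.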