- A tutorial on persistent attention in Gluon was discussed in issue #7298.
- Converting the kernel to a persistent design improved performance, most noticeably for fp8 when the kernel name contains 'cutlass'.
- fp16 performance dropped at large context lengths because of a ptxas instruction scheduling issue in the softmax partition.
- The discussion also covered layout mismatches and related optimizations.
- Contributors debated accuracy checks and the effect of the kernel name on performance.
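
For readers unfamiliar with the term, the "persistent" conversion mentioned above generally means launching a fixed number of programs (often about one per SM) and having each one loop over many work tiles, rather than launching one program per tile. The sketch below is a plain-Python illustration of that scheduling idea only. It is not the Gluon tutorial's code, and `persistent_schedule`, `num_tiles`, and `num_programs` are hypothetical names chosen for this example.

```python
def persistent_schedule(num_tiles: int, num_programs: int) -> dict[int, list[int]]:
    """Map each persistent program id to the tiles it would process, in order.

    Hypothetical helper for illustration; a real kernel runs this loop on the GPU.
    """
    schedule = {}
    for pid in range(num_programs):
        # Each program starts at its own id and strides by the number of
        # programs, so a single launch covers every tile exactly once.
        schedule[pid] = list(range(pid, num_tiles, num_programs))
    return schedule


if __name__ == "__main__":
    for pid, tiles in persistent_schedule(num_tiles=10, num_programs=4).items():
        print(f"program {pid} handles tiles {tiles}")
```

With this strided assignment the per-program tile counts differ by at most one, which is one reason the pattern is commonly used for persistent kernels. Other tile orderings are possible and may interact differently with caching.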