Hasty Briefs


Qwen2.5-Coder-3B Fine-Tuned for Triton Kernel Gen

9 months ago
  • #AI
  • #Triton
  • #PyTorch
  • Qwen2.5-Coder-3B-KernelBook is a fine-tuned model for transpiling PyTorch nn.Module code into Triton kernels.
  • Trained on the GPUMODE/KernelBook dataset with 18,162 PyTorch-Triton code pairs generated by torch.compile.
  • Fine-tuned with Low-Rank Adaptation (LoRA) using PyTorch 2.5.0 and the Transformers, PEFT, and TRL libraries.
  • Achieved a final training loss of 0.0922 and mean token accuracy of 98.34% in 1 hour 37 minutes on an NVIDIA H100 80GB.
  • Key hyperparameters include learning_rate: 2e-4, batch size: 1, gradient accumulation steps: 8, and max_seq_length: 4096.
  • Example usage provided for generating Triton kernels from PyTorch code using the Hugging Face Transformers library.
  • Dataset note: because the pairs were produced by torch.compile, torch==2.5.0 is recommended when using the model's output.
  • Base model Qwen2.5-Coder-3B has 3.09B parameters, 32,768 token context length, and uses RoPE, SwiGLU, RMSNorm.
  • Citations provided for both the KernelBook dataset and the Qwen2.5-Coder base model.
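The post mentions a Transformers usage example for generating Triton kernels from PyTorch code. A minimal sketch of that flow is below; note the Hugging Face repo id, prompt wording, and generation settings are assumptions, not the model card's exact example.

```python
# Minimal sketch: prompt the fine-tuned model to transpile a PyTorch
# nn.Module into a Triton kernel. The repo id and prompt template are
# assumptions; substitute the actual Qwen2.5-Coder-3B-KernelBook checkpoint.

PYTORCH_SOURCE = """\
import torch
import torch.nn as nn

class Add(nn.Module):
    def forward(self, x, y):
        return x + y
"""

def build_prompt(pytorch_code: str) -> str:
    # Wrap the module source in a transpilation instruction.
    return (
        "Convert the following PyTorch module into an equivalent "
        "Triton kernel:\n\n" + pytorch_code
    )

def generate_triton(pytorch_code: str,
                    model_id: str = "Qwen2.5-Coder-3B-KernelBook") -> str:
    # Heavy path: loads the ~3B checkpoint, so it stays behind a call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    inputs = tokenizer(build_prompt(pytorch_code),
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024)
    # Strip the prompt tokens and keep only the generated kernel text.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```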
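The reported hyperparameters map naturally onto a PEFT + TRL SFT setup. A sketch under stated assumptions: the learning rate, batch size, gradient accumulation, and sequence length come from the post, while the LoRA rank/alpha and the `make_trainer` helper are placeholders not given in the summary.

```python
# Sketch of the LoRA fine-tuning configuration implied by the summary.
# learning_rate, batch size, gradient accumulation, and max_seq_length are
# from the post; LoRA rank/alpha are assumptions.

HPARAMS = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,  # effective batch size of 8
    "max_seq_length": 4096,
}

def make_trainer(train_dataset):
    # Imports live inside the function so the config above can be used
    # without peft/trl installed.
    from peft import LoraConfig
    from trl import SFTConfig, SFTTrainer

    lora = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")  # assumed rank
    args = SFTConfig(output_dir="qwen2.5-coder-3b-kernelbook", **HPARAMS)
    return SFTTrainer(model="Qwen/Qwen2.5-Coder-3B", args=args,
                      train_dataset=train_dataset, peft_config=lora)
```

With batch size 1 and 8 gradient-accumulation steps, each optimizer update sees 8 examples, which keeps memory low enough for 4096-token sequences on a single H100.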