Qwen2.5-Coder-3B Fine-Tuned for Triton Kernel Generation
9 months ago
- #AI
- #Triton
- #PyTorch
- Qwen2.5-Coder-3B-KernelBook is a fine-tuned model for transpiling PyTorch nn.Module code into Triton kernels.
- Trained on the GPUMODE/KernelBook dataset with 18,162 PyTorch-Triton code pairs generated by torch.compile.
- Fine-tuned with Low-Rank Adaptation (LoRA) using PyTorch 2.5.0 and the Transformers, PEFT, and TRL libraries.
- Achieved a final training loss of 0.0922 and mean token accuracy of 98.34% in 1 hour 37 minutes on an NVIDIA H100 80GB.
- Key hyperparameters: learning rate 2e-4, per-device batch size 1, gradient accumulation over 8 steps (effective batch size 8), and a maximum sequence length of 4096 tokens.
- Example usage provided for generating Triton kernels from PyTorch code using the Hugging Face Transformers library.
- The dataset authors recommend torch==2.5.0 so that generated Triton code matches the version used to build the pairs.
- Base model Qwen2.5-Coder-3B has 3.09B parameters and a 32,768-token context length, and uses RoPE positional embeddings, SwiGLU activations, and RMSNorm.
- Citations provided for both the KernelBook dataset and the Qwen2.5-Coder base model.
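The inference workflow summarized above can be sketched as follows. This is a minimal, hypothetical example: the Hub identifier `Qwen2.5-Coder-3B-KernelBook` and the instruction wording are assumptions for illustration, not confirmed by the model card.

```python
# Hypothetical sketch of generating a Triton kernel from PyTorch code
# with the Transformers library. MODEL_ID and the prompt wording are
# assumptions, not confirmed by the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen2.5-Coder-3B-KernelBook"  # assumed Hub identifier


def build_prompt(pytorch_code: str) -> str:
    """Wrap an nn.Module definition in a transpilation instruction."""
    return (
        "Convert the following PyTorch module into an equivalent "
        "Triton kernel:\n\n" + pytorch_code
    )


def generate_triton(pytorch_code: str, max_new_tokens: int = 1024) -> str:
    """Load the fine-tuned model and generate Triton code for the input."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(pytorch_code), return_tensors="pt")
    inputs = inputs.to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

In practice the input would be the source of an nn.Module (e.g. a small `forward` implementation), and the output is Triton kernel code to be reviewed and benchmarked, not executed blindly.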
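The LoRA training setup could be configured roughly as below with PEFT and TRL. Only the learning rate, batch size, gradient accumulation steps, and sequence length come from the post; the LoRA rank, alpha, target modules, and output directory are illustrative assumptions.

```python
# Illustrative LoRA + TRL SFT configuration sketch. Values marked
# "from the post" are stated above; everything else is an assumption.
from peft import LoraConfig
from trl import SFTConfig

lora_config = LoraConfig(
    r=16,                # assumed rank, not stated in the post
    lora_alpha=32,       # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    learning_rate=2e-4,              # from the post
    per_device_train_batch_size=1,   # from the post
    gradient_accumulation_steps=8,   # from the post
    max_seq_length=4096,             # from the post
    output_dir="qwen25-coder-kernelbook-lora",  # assumed
)
```

These objects would then be passed to TRL's `SFTTrainer` along with the base model and the KernelBook dataset.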