Qwen2.5-Coder-3B Fine-Tuned for Triton Kernel Generation
9 months ago
- #AI
- #Triton
- #PyTorch
- Qwen2.5-Coder-3B-KernelBook is a fine-tuned model for transpiling PyTorch nn.Module code into Triton kernels.
- Trained on the GPUMODE/KernelBook dataset with 18,162 PyTorch-Triton code pairs generated by torch.compile.
- Fine-tuned with Low-Rank Adaptation (LoRA) using PyTorch 2.5.0 and the Transformers, PEFT, and TRL libraries.
- Achieved a final training loss of 0.0922 and mean token accuracy of 98.34% in 1 hour 37 minutes on an NVIDIA H100 80GB.
- Key hyperparameters: learning rate 2e-4, per-device batch size 1, gradient accumulation over 8 steps (effective batch size 8), and a maximum sequence length of 4096 tokens.
- Example usage provided for generating Triton kernels from PyTorch code using the Hugging Face Transformers library.
- The dataset authors recommend torch==2.5.0 so that generated Triton code matches the version used to build the pairs.
- Base model Qwen2.5-Coder-3B has 3.09B parameters and a 32,768-token context length, and uses RoPE positional embeddings, SwiGLU activations, and RMSNorm.
- Citations provided for both the KernelBook dataset and the Qwen2.5-Coder base model.
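The inference workflow summarized above can be sketched as follows. This is a minimal, hypothetical example: the Hub identifier `Qwen2.5-Coder-3B-KernelBook` and the instruction wording are assumptions for illustration, not confirmed by the model card.

```python
# Hypothetical sketch of generating a Triton kernel from PyTorch code
# with the Transformers library. MODEL_ID and the prompt wording are
# assumptions, not confirmed by the post.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen2.5-Coder-3B-KernelBook"  # assumed Hub identifier


def build_prompt(pytorch_code: str) -> str:
    """Wrap an nn.Module definition in a transpilation instruction."""
    return (
        "Convert the following PyTorch module into an equivalent "
        "Triton kernel:\n\n" + pytorch_code
    )


def generate_triton(pytorch_code: str, max_new_tokens: int = 1024) -> str:
    """Load the fine-tuned model and generate Triton code for the input."""
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(pytorch_code), return_tensors="pt")
    inputs = inputs.to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

In practice the input would be the source of an nn.Module (e.g. a small `forward` implementation), and the output is Triton kernel code to be reviewed and benchmarked, not executed blindly.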
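The LoRA training setup could be configured roughly as below with PEFT and TRL. Only the learning rate, batch size, gradient accumulation steps, and sequence length come from the post; the LoRA rank, alpha, target modules, and output directory are illustrative assumptions.

```python
# Illustrative LoRA + TRL SFT configuration sketch. Values marked
# "from the post" are stated above; everything else is an assumption.
from peft import LoraConfig
from trl import SFTConfig

lora_config = LoraConfig(
    r=16,                # assumed rank, not stated in the post
    lora_alpha=32,       # assumed
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    learning_rate=2e-4,              # from the post
    per_device_train_batch_size=1,   # from the post
    gradient_accumulation_steps=8,   # from the post
    max_seq_length=4096,             # from the post
    output_dir="qwen25-coder-kernelbook-lora",  # assumed
)
```

These objects would then be passed to TRL's `SFTTrainer` along with the base model and the KernelBook dataset.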