Apple: Embarrassingly Simple Self-Distillation Improves Code Generation
- #LLMs
- #code-generation
- #self-distillation
- Simple self-distillation (SSD) improves code generation in LLMs by fine-tuning on the model's own raw outputs without external verifiers or reinforcement learning.
- SSD significantly boosts performance; e.g., Qwen3-30B-Instruct's pass@1 on LiveCodeBench v6 improved from 42.4% to 55.3%, especially on harder problems.
- The method works by resolving a precision-exploration conflict: it reshapes the model's token distributions to suppress distractor tokens while preserving output diversity.
- SSD generalizes across Qwen and Llama models at various scales (4B, 8B, 30B) and across instruct and thinking variants.
- It offers a complementary post-training direction for enhancing LLM code generation without complex setups.
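The distribution-reshaping claim above can be illustrated with a toy sketch. This is not the paper's training recipe: here a hand-made categorical "token" distribution is sampled at a low temperature and then refit by maximum likelihood on the model's own samples, which concentrates mass on high-probability tokens while low-probability distractors shrink. The function name `self_distill_step`, the temperature choice, and the example distribution are all assumptions of this sketch, not from the paper.

```python
import random
from collections import Counter

def self_distill_step(probs, n_samples=5000, temperature=0.5, rng=None):
    """One toy round of self-distillation on a categorical distribution.

    Sample from a sharpened (temperature < 1) version of the current
    distribution, then refit the distribution to the sample counts.
    High-probability tokens gain mass; low-probability "distractors"
    lose mass, while every token keeps a nonzero chance of appearing.
    Illustrative only -- not the fine-tuning procedure used in the paper.
    """
    rng = rng or random.Random(0)
    tokens = list(probs)
    # Temperature scaling: p_i^(1/T), renormalized.
    weights = [probs[t] ** (1.0 / temperature) for t in tokens]
    total = sum(weights)
    weights = [w / total for w in weights]
    # "Generate" from the model, then refit on its own raw outputs.
    samples = rng.choices(tokens, weights=weights, k=n_samples)
    counts = Counter(samples)
    return {t: counts[t] / n_samples for t in tokens}

# Toy usage: mass shifts toward the dominant token over rounds.
dist = {"correct": 0.50, "distractor_a": 0.30, "distractor_b": 0.20}
for _ in range(3):
    dist = self_distill_step(dist)
print(dist)
```

The point of the sketch is only the direction of the effect: refitting a model on its own (temperature-sharpened) outputs suppresses distractors without an external verifier, which is the intuition the bullet about the precision-exploration conflict describes.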