Hasty Briefs

Apple: Embarrassingly Simple Self-Distillation Improves Code Generation

6 hours ago
  • #LLMs
  • #code-generation
  • #self-distillation
  • Simple self-distillation (SSD) improves code generation in LLMs by fine-tuning on the model's own raw outputs without external verifiers or reinforcement learning.
  • SSD significantly boosts performance; e.g., Qwen3-30B-Instruct's pass@1 on LiveCodeBench v6 improved from 42.4% to 55.3%, especially on harder problems.
  • The method works by addressing a precision-exploration conflict, reshaping token distributions to suppress distractors while preserving diversity.
  • SSD generalizes across Qwen and Llama models at various scales (4B, 8B, 30B) and across instruct and thinking variants.
  • It offers a complementary post-training direction for enhancing LLM code generation without complex setups.
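The pipeline the bullets describe can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: `generate` and `fine_tune` are stand-in stubs, and the key point is only the data flow, where the model's own raw completions, with no verifier, filter, or reward model, become its supervised fine-tuning data.

```python
def generate(model, prompt, n_samples=4):
    """Stub: sample n raw completions from the model for a prompt."""
    return [f"{model['name']} solution {i} for: {prompt}" for i in range(n_samples)]

def fine_tune(model, dataset):
    """Stub: one SFT pass over (prompt, completion) pairs."""
    return dict(model, sft_steps=model.get("sft_steps", 0) + len(dataset))

def simple_self_distillation(model, prompts):
    # 1. Collect the model's own raw outputs (no external verifier, no RL).
    dataset = [(p, c) for p in prompts for c in generate(model, p)]
    # 2. Fine-tune the same model on that self-generated dataset.
    return fine_tune(model, dataset), dataset

model = {"name": "toy-llm"}
model, dataset = simple_self_distillation(model, ["two-sum", "reverse-list"])
print(len(dataset))  # 2 prompts x 4 samples = 8 training pairs
```

The absence of any filtering step is what makes the recipe "embarrassingly simple": per the summary, the gains come from reshaping token distributions rather than from selecting better samples.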