Introspective Diffusion Language Models
- #diffusion models
- #parallel decoding
- #language models
- Autoregressive (AR) models condition each new token on their own previous outputs, so generation is self-consistent; diffusion language models (DLMs) decode many tokens in parallel and often contradict their own earlier predictions, causing a quality gap.
- I-DLM introduces introspective strided decoding (ISD), which verifies previously generated tokens while proposing new ones in the same forward pass.
- I-DLM-8B matches the quality of its same-scale AR counterpart and, with half the parameters, outperforms LLaDA-2.1-mini (16B) on benchmarks such as AIME-24 (+26 points) and LiveCodeBench-v6 (+15 points).
- It achieves a 2.9-4.1x throughput speedup at high concurrency and, combined with gated LoRA, enables bit-for-bit lossless acceleration.
- I-DLM is the first DLM to match same-scale AR quality, surpassing all prior DLMs across 15 benchmarks.
- The method integrates directly into SGLang for production deployment with no custom infrastructure, supporting paged KV cache and continuous batching.
- The paper provides a model zoo, training recipes, and benchmark evaluations for reproducibility and deployment.
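The verify-while-generating loop behind ISD can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the stride size, the `forward_pass` function, and the even-token "verification" rule are all invented stand-ins for the real model call that scores earlier tokens and proposes new ones in a single pass.

```python
import random

random.seed(0)
VOCAB = list(range(10))
STRIDE = 4  # hypothetical number of tokens proposed per forward pass


def forward_pass(context, prev_stride):
    """One toy forward pass: propose STRIDE new draft tokens in parallel
    AND re-score the previous stride's tokens (the "introspection" step).
    A real DLM would do both with one transformer call; here proposals
    are random and verification is a stand-in rule (even tokens pass)."""
    proposals = [random.choice(VOCAB) for _ in range(STRIDE)]
    verified = [tok % 2 == 0 for tok in prev_stride]
    return proposals, verified


def introspective_strided_decode(steps=5):
    accepted, pending = [], []
    for _ in range(steps):
        proposals, verified = forward_pass(accepted, pending)
        # Commit pending tokens up to the first verification failure;
        # in this sketch, everything after a rejected token is simply
        # discarded and regenerated by later proposals.
        for tok, ok in zip(pending, verified):
            if not ok:
                break
            accepted.append(tok)
        pending = proposals
    return accepted
```

The key point the sketch mirrors is that verification of old tokens and generation of new ones share one forward pass, so the check adds no extra model calls.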