Introspective Diffusion Language Models
- #diffusion models
- #parallel decoding
- #language models
- Autoregressive (AR) models condition each new token on their own previous outputs, so generation is self-consistent; diffusion language models (DLMs) decode many tokens in parallel and often contradict their own earlier predictions, causing a quality gap.
- I-DLM introduces introspective strided decoding (ISD), which verifies previously generated tokens while proposing new ones in the same forward pass.
- I-DLM-8B matches the quality of its same-scale AR counterpart and, with half the parameters, outperforms LLaDA-2.1-mini (16B) on benchmarks such as AIME-24 (+26 points) and LiveCodeBench-v6 (+15 points).
- It achieves a 2.9-4.1x throughput speedup at high concurrency and, combined with gated LoRA, enables bit-for-bit lossless acceleration.
- I-DLM is the first DLM to match same-scale AR quality, surpassing all prior DLMs across 15 benchmarks.
- The method integrates directly into SGLang for production deployment with no custom infrastructure, supporting paged KV cache and continuous batching.
- The paper provides a model zoo, training recipes, and benchmark evaluations for reproducibility and deployment.
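The verify-while-generating loop behind ISD can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the stride size, the `forward_pass` function, and the even-token "verification" rule are all invented stand-ins for the real model call that scores earlier tokens and proposes new ones in a single pass.

```python
import random

random.seed(0)
VOCAB = list(range(10))
STRIDE = 4  # hypothetical number of tokens proposed per forward pass


def forward_pass(context, prev_stride):
    """One toy forward pass: propose STRIDE new draft tokens in parallel
    AND re-score the previous stride's tokens (the "introspection" step).
    A real DLM would do both with one transformer call; here proposals
    are random and verification is a stand-in rule (even tokens pass)."""
    proposals = [random.choice(VOCAB) for _ in range(STRIDE)]
    verified = [tok % 2 == 0 for tok in prev_stride]
    return proposals, verified


def introspective_strided_decode(steps=5):
    accepted, pending = [], []
    for _ in range(steps):
        proposals, verified = forward_pass(accepted, pending)
        # Commit pending tokens up to the first verification failure;
        # in this sketch, everything after a rejected token is simply
        # discarded and regenerated by later proposals.
        for tok, ok in zip(pending, verified):
            if not ok:
                break
            accepted.append(tok)
        pending = proposals
    return accepted
```

The key point the sketch mirrors is that verification of old tokens and generation of new ones share one forward pass, so the check adds no extra model calls.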