Hasty Briefs (beta)


Introspective Diffusion Language Models

9 hours ago
  • #diffusion models
  • #parallel decoding
  • #language models
  • Autoregressive (AR) models stay consistent with their own prior outputs, but diffusion language models (DLMs), which decode many tokens in parallel, often do not, causing a quality gap.
  • I-DLM introduces introspective strided decoding (ISD) to verify previously generated tokens while generating new ones in the same forward pass.
  • I-DLM-8B matches the quality of its same-scale AR counterpart and, with half the parameters, outperforms LLaDA-2.1-mini (16B) on benchmarks such as AIME-24 (+26 points) and LiveCodeBench-v6 (+15 points).
  • It achieves 2.9-4.1x higher throughput at high concurrency and, with gated LoRA, enables bit-for-bit lossless acceleration.
  • I-DLM is the first DLM to match same-scale AR quality, surpassing all prior DLMs across 15 benchmarks.
  • The method integrates directly into SGLang for production deployment, using its paged KV cache and continuous batching with no custom infrastructure.
  • The paper provides a model zoo, training recipes, and benchmark evaluations for reproducibility and deployment.
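The core idea summarized above, verifying earlier tokens while proposing new ones in the same forward pass, can be sketched as a toy decode loop. This is a hypothetical illustration, not the paper's algorithm: `toy_model`, the confidence scores, and the resampling rule are all invented for the sketch.

```python
def toy_model(context, n_new):
    """Stand-in for one DLM forward pass (hypothetical): it both proposes
    n_new draft tokens and re-scores every token already in the context.
    Here a "token" is just an int; negative ints play the role of tokens
    the model no longer agrees with on re-scoring."""
    drafts = [len(context) + i for i in range(n_new)]
    scores = [0.0 if t < 0 else 1.0 for t in context]  # toy confidence
    return drafts, scores

def introspective_strided_decode(prompt, total_len, stride=4, threshold=0.5):
    """Sketch of introspective strided decoding as described in the summary:
    each pass (a) appends up to `stride` new tokens and (b) re-verifies all
    previously generated tokens, resampling any whose re-scored confidence
    falls below `threshold` (the resample here is a toy abs())."""
    seq = list(prompt)
    while len(seq) < total_len:
        drafts, scores = toy_model(seq, stride)
        for i, s in enumerate(scores):      # introspection: fix old tokens
            if s < threshold:
                seq[i] = abs(seq[i])
        seq.extend(drafts[: total_len - len(seq)])  # strided generation
    return seq
```

Because verification rides along with generation, a disagreeing token (here, any negative int in the prompt) gets corrected without a separate verification pass.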
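The "bit-for-bit lossless" claim for gated LoRA can be illustrated with a minimal layer sketch, assuming the usual gated-adapter construction (the names and shapes below are illustrative, not taken from the paper): the low-rank update is scaled by a gate, and a zero gate skips the adapter path entirely, so the output is exactly the frozen base layer's.

```python
def matmul(X, Y):
    """Minimal dense matmul so the sketch has no external dependencies."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def gated_lora_linear(X, W, A, B, gate):
    """Hypothetical gated-LoRA layer: output = X @ W + gate * (X @ A @ B),
    where A (d_in x r) and B (r x d_out) form the low-rank adapter.
    With gate == 0.0 the adapter branch is never evaluated, so the result
    is bit-for-bit identical to the base layer, letting the same weights
    serve both an accelerated path and the exact base-model path."""
    base = matmul(X, W)
    if gate == 0.0:
        return base  # exact base-model output, no floating-point drift
    update = matmul(matmul(X, A), B)
    return [[b + gate * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]
```

Skipping the branch (rather than multiplying it by zero) is what makes the fallback lossless: no extra floating-point operations touch the base result.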