
BERT Is Just a Single Text Diffusion Step

  • #diffusion models
  • #machine learning
  • #text generation
  • Google DeepMind introduced Gemini Diffusion, a language model that, unlike traditional autoregressive GPT models, generates whole blocks of text by refining noise step by step.
  • Discrete language diffusion generalizes masked language modeling (MLM), the training objective BERT has used since 2018.
  • The original Transformer architecture (2017) was encoder-decoder, but in 2018, BERT (encoder-only) and GPT (decoder-only) models emerged, each excelling in different tasks.
  • Diffusion models, popular in image generation, were adapted for text by using masking-based noise processes, where text is gradually masked and then denoised.
  • RoBERTa, an enhanced BERT model, was fine-tuned on WikiText with HuggingFace libraries to perform text generation via diffusion, with promising results (see the training sketch after this list).
  • The fine-tuned RoBERTa model demonstrated coherent text generation, though with some quirks from the WikiText dataset formatting.
  • Comparison with GPT-2 showed GPT-2's output was more coherent and slightly faster, but the RoBERTa diffusion model was a successful proof of concept.
  • The experiment validated that BERT-style models can be repurposed for generative tasks by treating variable-rate masking as a discrete diffusion process (see the generation sketch after this list).
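
A minimal sketch of the training objective described above: instead of BERT's fixed ~15% masking, each example is corrupted at a randomly sampled mask rate, which turns MLM into a single step of a discrete diffusion process. The model name (roberta-base), the mask-rate sampling, and the single-example training step are illustrative assumptions, not the post's exact setup.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForMaskedLM.from_pretrained("roberta-base")

def add_diffusion_noise(input_ids, mask_rate):
    """Forward noise process: mask a random fraction of (non-special) tokens."""
    labels = input_ids.clone()
    special = torch.tensor(
        tokenizer.get_special_tokens_mask(input_ids.tolist(), already_has_special_tokens=True),
        dtype=torch.bool,
    )
    to_mask = (torch.rand(input_ids.shape) < mask_rate) & ~special
    if not to_mask.any():  # guarantee at least one masked position
        to_mask[torch.randint(1, len(input_ids) - 1, (1,))] = True
    corrupted = input_ids.clone()
    corrupted[to_mask] = tokenizer.mask_token_id
    labels[~to_mask] = -100  # compute the loss only on masked positions
    return corrupted, labels

text = "Diffusion treats masked language modeling as one step of iterative denoising."
ids = tokenizer(text, return_tensors="pt")["input_ids"][0]

# Sample a per-example mask rate instead of a fixed 15%.
mask_rate = torch.rand(1).item()
corrupted, labels = add_diffusion_noise(ids, mask_rate)

loss = model(input_ids=corrupted.unsqueeze(0), labels=labels.unsqueeze(0)).loss
loss.backward()  # plug into an optimizer or the HuggingFace Trainer for real fine-tuning
```

Sampling the mask rate uniformly per example is what lets the same network denoise at every noise level during generation.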
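
Generation is the reverse process: start from a fully masked block and repeatedly unmask the model's most confident predictions. The sketch below assumes a checkpoint fine-tuned as above (plain roberta-base will produce poor text); the confidence-based unmasking schedule and step count are illustrative choices, not necessarily the post's exact decoding procedure.

```python
import torch
from transformers import RobertaForMaskedLM, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")  # stand-in for the fine-tuned checkpoint
model = RobertaForMaskedLM.from_pretrained("roberta-base").eval()

@torch.no_grad()
def diffusion_generate(seq_len=32, num_steps=8):
    mask_id = tokenizer.mask_token_id
    # Start from pure "noise": every position is <mask>, wrapped in <s> ... </s>.
    ids = torch.full((1, seq_len), mask_id)
    ids[0, 0] = tokenizer.bos_token_id
    ids[0, -1] = tokenizer.eos_token_id

    for step in range(num_steps):
        still_masked = ids == mask_id
        if not still_masked.any():
            break
        probs = model(input_ids=ids).logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)                # best token and its confidence per position
        conf = conf.masked_fill(~still_masked, -1.0)  # never re-reveal finished positions
        # Reveal roughly an equal share of the remaining masks at each step.
        n_reveal = max(1, int(still_masked.sum().item() / (num_steps - step)))
        reveal = conf[0].topk(n_reveal).indices
        ids[0, reveal] = pred[0, reveal]

    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(diffusion_generate())
```

The whole block is produced in num_steps forward passes, rather than one pass per token as in GPT-style decoding.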