Hasty Briefs (beta)

Lumina-DiMOO: An open-source discrete multimodal diffusion model

8 hours ago
  • #diffusion models
  • #multimodal AI
  • #open-source
  • Lumina-DiMOO is an open-source foundational model for multimodal generation and understanding.
  • It uses discrete diffusion modeling for handling inputs and outputs across various modalities.
  • It achieves higher sampling efficiency than autoregressive (AR) or hybrid AR-diffusion paradigms.
  • It supports tasks such as text-to-image generation, image editing, inpainting, and image understanding.
  • It reaches state-of-the-art performance on multiple benchmarks, surpassing existing open-source models.
  • Code and checkpoints are released to foster advances in multimodal and discrete diffusion research.
  • It outperforms models such as SDXL, Emu3-Gen, SD3-Medium, DALL-E 3, and GPT-4o on the reported benchmarks.
  • It excels at tasks involving single objects, counting, colors, positions, and attributes.
  • It performs strongly on global, entity, attribute, relation, and other understanding tasks.
  • It posts competitive scores on the POPE, MME-P, MMB, SEED, and MMMU benchmarks.
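
The sampling-efficiency point can be illustrated with a toy sketch. It is not Lumina-DiMOO's actual sampler. It is a generic confidence-based parallel unmasking loop of the kind often used with masked discrete diffusion, and a random stand-in replaces the trained network. The names `MASK`, `toy_denoiser`, and `masked_diffusion_sample`, and the cosine unmasking schedule, are assumptions made for illustration only.

```python
import math
import random

MASK = -1  # hypothetical sentinel marking a still-masked position


def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for a trained network (assumption, not the real model).

    Returns one (predicted_token, confidence) pair per position.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def masked_diffusion_sample(seq_len, vocab_size, num_steps, seed=0):
    """Fill a fully masked sequence in at most `num_steps` parallel steps.

    Returns (tokens, model_calls).
    """
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    model_calls = 0
    for step in range(1, num_steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # Cosine schedule: how many positions should stay masked after this step.
        keep_masked = math.floor(seq_len * math.cos(math.pi / 2 * step / num_steps))
        num_to_reveal = max(1, len(masked) - keep_masked)
        # One model call predicts every position at once.
        preds = toy_denoiser(tokens, vocab_size, rng)
        model_calls += 1
        # Commit only the most confident predictions among masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:num_to_reveal]:
            tokens[i] = preds[i][0]
    return tokens, model_calls


if __name__ == "__main__":
    tokens, calls = masked_diffusion_sample(seq_len=64, vocab_size=1000, num_steps=8)
    # Token-by-token autoregressive decoding would need one call per token.
    print(f"filled {len(tokens)} tokens in {calls} model calls")
```

In this sketch the number of model calls is bounded by `num_steps` rather than by the sequence length, which is the intuition behind the efficiency comparison with autoregressive decoding. Real speedups depend on the model, the schedule, and the quality trade-off at low step counts.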