Lumina-DiMOO: An open-source discrete multimodal diffusion model
- #diffusion models
- #multimodal AI
- #open-source
- Lumina-DiMOO is an open-source foundational model for multimodal generation and understanding.
- It uses discrete diffusion to model both inputs and outputs across modalities such as text and images.
- Achieves higher sampling efficiency than autoregressive (AR) or hybrid AR-diffusion paradigms.
- Supports tasks like text-to-image generation, image editing, inpainting, and image understanding.
- State-of-the-art performance on multiple benchmarks, surpassing existing open-source models.
- Code and checkpoints released to foster advancements in multimodal and discrete diffusion research.
- Outperforms models such as SDXL, Emu3-Gen, SD3-Medium, DALL-E 3, and GPT-4o on the reported text-to-image benchmarks.
- Excels in tasks involving single objects, counting, colors, positions, and attributes.
- Strong text-to-image scores across the global, entity, attribute, relation, and other prompt-following categories.
- Competitive scores on the POPE, MME-P, MMB, SEED, and MMMU image-understanding benchmarks.
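The efficiency claim for discrete diffusion comes from parallel decoding: an autoregressive model needs one network call per generated token, while a masked discrete diffusion sampler fills in many tokens per call over a fixed number of steps. The sketch below is a generic masked-token decoding loop in the style of MaskGIT (confidence-based unmasking with a cosine schedule), assuming that style of sampler. It is not Lumina-DiMOO's released sampler, and `MASK`, `toy_denoiser`, and the schedule are hypothetical stand-ins chosen for illustration.

```python
import math
import random

MASK = -1  # hypothetical id for the [MASK] token


def toy_denoiser(tokens, vocab_size, rng):
    """Hypothetical stand-in for a trained network: returns a
    (predicted_token, confidence) pair for every position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def cosine_mask_schedule(step, num_steps):
    """Fraction of positions that should still be masked after `step`."""
    return math.cos(math.pi / 2 * step / num_steps)


def masked_diffusion_sample(seq_len, vocab_size, num_steps, seed=0):
    """Start fully masked and, at each step, commit the most confident
    predictions until no masks remain. Returns (tokens, forward_passes)."""
    rng = random.Random(seed)
    tokens = [MASK] * seq_len
    forward_passes = 0
    for step in range(1, num_steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = toy_denoiser(tokens, vocab_size, rng)  # one call predicts every position
        forward_passes += 1
        # How many positions the schedule wants left masked after this step
        # (reaches 0 on the final step, so every position gets filled).
        keep_masked = math.floor(seq_len * cosine_mask_schedule(step, num_steps))
        num_to_reveal = min(len(masked), max(1, len(masked) - keep_masked))
        # Unmask the positions whose predictions are most confident.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:num_to_reveal]:
            tokens[i] = preds[i][0]
    return tokens, forward_passes


tokens, passes = masked_diffusion_sample(seq_len=1024, vocab_size=8192, num_steps=64)
print(passes)          # 64 network calls; an AR decoder would need 1024
print(MASK in tokens)  # False: every position has been filled
```

The number of network calls is bounded by `num_steps`, not by `seq_len`, which is the source of the speed advantage. The trade-off is that tokens committed in the same step are predicted independently of one another, so fewer steps can cost quality.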