Hasty Briefs (beta)

Stanford study reveals AI vision models invent images they never see

11 hours ago
  • #multimodal AI
  • #AI hallucinations
  • #benchmark design
  • Multimodal AI models can produce detailed descriptions of images they were never given, a phenomenon termed 'mirage reasoning'.
  • Models achieve high scores on general and medical multimodal benchmarks without any image input, which calls the utility and design of those benchmarks into question.
  • Explicitly instructing models to guess answers without image access lowers their performance relative to implicit prompting.
  • The findings expose weaknesses both in how vision-language models reason and in how they are evaluated.
  • There is a need for private benchmarks, like B-Clean, that eliminate textual cues enabling non-visual inference, especially in medical contexts.
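
The image-free evaluation described in the points above can be sketched as a small ablation harness. This is a minimal illustration, not the study's code and not the B-Clean procedure: the `Item` record, the `Model` callable signature, `toy_model`, and the sample questions are all hypothetical. It scores a model with and without images, then lists the items still answered correctly when the image is withheld, which are candidates for leaking the answer through text alone.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Item:
    question: str
    choices: tuple[str, ...]
    answer: str              # gold label, one of `choices`
    image: Optional[bytes]   # raw image bytes; None means no image supplied


# Hypothetical model interface: (question, choices, image) -> chosen answer.
Model = Callable[[str, Sequence[str], Optional[bytes]], str]


def accuracy(model: Model, items: Sequence[Item], use_image: bool) -> float:
    """Fraction answered correctly, optionally withholding every image."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        image = item.image if use_image else None
        if model(item.question, item.choices, image) == item.answer:
            correct += 1
    return correct / len(items)


def text_solvable(model: Model, items: Sequence[Item]) -> list[Item]:
    """Items the model still answers correctly with the image withheld."""
    return [it for it in items if model(it.question, it.choices, None) == it.answer]


def toy_model(question: str, choices: Sequence[str], image: Optional[bytes]) -> str:
    """Stand-in model that ignores the image and picks the longest choice."""
    return max(choices, key=len)


# Made-up items for illustration; the image bytes are placeholders.
ITEMS = [
    Item("Which finding is shown on the chest X-ray?",
         ("Normal", "Right lower lobe pneumonia"),
         "Right lower lobe pneumonia", b"\x89PNG"),
    Item("What color is the car?", ("Red", "Blue"), "Blue", b"\x89PNG"),
    Item("How many dogs are visible?", ("Two", "Three"), "Two", b"\x89PNG"),
]

with_image = accuracy(toy_model, ITEMS, use_image=True)
blind = accuracy(toy_model, ITEMS, use_image=False)
leaky = text_solvable(toy_model, ITEMS)

print(f"with images: {with_image:.2f}, without images: {blind:.2f}")
print(f"{len(leaky)} of {len(ITEMS)} items answered correctly without the image")
```

A blind score well above chance suggests the benchmark rewards textual shortcuts rather than visual understanding. A single correct blind answer can also be luck, so a real audit would presumably repeat the check across several models or runs before dropping an item.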