Stanford study reveals AI vision models invent images they never see
13 hours ago
- #multimodal AI
- #AI hallucinations
- #benchmark design
- Multimodal AI models can produce detailed descriptions of images they were never given, a phenomenon termed 'mirage reasoning'.
- Models achieve high scores on general and medical multimodal benchmarks without any image input, which calls into question the usefulness and design of those benchmarks.
- Explicitly instructing models to guess answers without access to the image lowers their performance relative to implicit prompting.
- The findings reveal weaknesses both in how vision-language models reason and in how they are evaluated.
- The results point to a need for private benchmarks, such as B-Clean, that remove the textual cues that let models answer without looking at the image, especially in medical contexts.
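
To make the idea of an image-free evaluation concrete, here is a minimal sketch of how one might score a model with the images withheld and then drop the questions it still answers correctly from text alone. It is an illustration under stated assumptions, not the study's procedure and not how B-Clean is built: `Item`, `accuracy`, `drop_text_solvable`, and the toy `cue_matching_stub` are all hypothetical names, and the stub stands in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Item:
    """One multiple-choice benchmark question (hypothetical schema)."""
    question: str
    choices: Tuple[str, ...]
    answer: str
    image: Optional[bytes] = None  # raw image bytes; None means no image


# Assumption: a "model" is any callable mapping (question, choices, image) to a choice.
Model = Callable[[str, Sequence[str], Optional[bytes]], str]


def accuracy(model: Model, items: Sequence[Item], use_image: bool) -> float:
    """Fraction of items answered correctly, optionally withholding every image."""
    if not items:
        return 0.0
    correct = 0
    for item in items:
        image = item.image if use_image else None
        if model(item.question, item.choices, image) == item.answer:
            correct += 1
    return correct / len(items)


def drop_text_solvable(model: Model, items: Sequence[Item]) -> List[Item]:
    """Keep only the items the model gets wrong when the image is withheld."""
    return [
        item for item in items
        if model(item.question, item.choices, None) != item.answer
    ]


def cue_matching_stub(question: str, choices: Sequence[str],
                      image: Optional[bytes]) -> str:
    """Toy stand-in for a model: ignores the image and picks the first
    choice whose text appears in the question, else the first choice."""
    for choice in choices:
        if choice.lower() in question.lower():
            return choice
    return choices[0]


ITEMS = [
    # The question text leaks the answer, so no image is needed.
    Item("The scan shows signs of pneumonia. What is the diagnosis?",
         ("fracture", "pneumonia"), "pneumonia"),
    # No textual cue: answering correctly requires the image.
    Item("What abnormality is visible in the image?",
         ("fracture", "pneumonia"), "pneumonia"),
]

if __name__ == "__main__":
    print("accuracy without images:", accuracy(cue_matching_stub, ITEMS, use_image=False))
    print("items kept:", [i.question for i in drop_text_solvable(cue_matching_stub, ITEMS)])
```

On this toy data the stub scores 0.5 without any image, and only the second question survives the filter. A real filter would likely need several text-only models and repeated runs, since a single model can also get an item right by chance.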