Study shows vision-language models can't handle queries with negation words
a year ago
- #vision-language models
- #negation understanding
- #machine learning
- MIT researchers found that vision-language models (VLMs) struggle to understand negation words such as "no" and "doesn't".
- VLMs often fail at tasks involving negation, such as retrieving images that lack certain objects or answering questions about negated captions.
- The researchers created a dataset of negated captions and retrained VLMs on it, yielding roughly a 10% boost in image retrieval and a 30% boost in question answering.
- The researchers attribute the failure to "affirmation bias": VLMs ignore negation words and attend only to the objects present in images.
- The study highlights the risks of using VLMs in high-stakes settings without addressing their inability to understand negation.
- Future work may involve training VLMs to process text and images separately or developing specialized datasets for fields like healthcare.
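The affirmation-bias failure mode described above can be illustrated with a toy sketch: if a text encoder effectively discards negation words (here modeled as stopword removal over a bag-of-words representation, a deliberate simplification and not the study's actual architecture), a caption and its negated counterpart collapse to near-identical embeddings, so a retriever cannot tell them apart.

```python
# Toy illustration of affirmation bias: dropping negation words makes a
# caption and its negated version indistinguishable to a retriever.
# This bag-of-words model is a hypothetical sketch, not the MIT setup.
from collections import Counter
import math

ARTICLES = {"a", "an", "the", "of"}      # ordinary stopwords
NEGATIONS = {"no", "not", "without"}     # negation words a biased encoder loses

def embed(caption, drop_negation):
    """Bag-of-words 'embedding': word counts after stopword removal.
    drop_negation=True mimics an encoder with affirmation bias."""
    drop = ARTICLES | (NEGATIONS if drop_negation else set())
    return Counter(w for w in caption.lower().split() if w not in drop)

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    return dot / (norm(a) * norm(b))

pos = "a dog in the park"
neg = "no dog in the park"

biased = cosine(embed(pos, True), embed(neg, True))    # ~1.0: negation lost
aware = cosine(embed(pos, False), embed(neg, False))   # < 1.0: distinction kept
print(f"biased similarity:         {biased:.3f}")
print(f"negation-aware similarity: {aware:.3f}")
```

With negation words dropped, the two captions produce identical count vectors and maximal similarity, so an image of a dog ranks equally well for both queries; keeping the negation word restores a measurable difference.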