Hasty Briefs
Study shows vision-language models can't handle queries with negation words

a year ago
  • #vision-language models
  • #negation understanding
  • #machine learning
  • MIT researchers found vision-language models (VLMs) struggle with understanding negation words like 'no' and 'doesn't'.
  • VLMs often perform poorly on tasks involving negation, such as retrieving images that do not contain certain objects or answering questions about negated captions.
  • The researchers created a dataset of negated captions to improve VLM performance, which yielded roughly a 10% boost in image retrieval and a 30% boost in question answering.
  • An "affirmation bias" leads VLMs to ignore negation words and attend only to the objects present in an image.
  • The study highlights the risks of using VLMs in high-stakes settings without addressing their inability to understand negation.
  • Future work may involve training VLMs to process text and images separately or developing specialized datasets for fields like healthcare.
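The affirmation bias described above can be sketched with a toy retriever. This is a hypothetical illustration, not the study's actual models or data: if a retrieval scorer treats negation words as noise (as word-overlap and many embedding-based matchers effectively do), a query like "a photo with no dog" matches images that contain a dog, because only the affirmed content words carry signal.

```python
# Toy illustration of affirmation bias in retrieval (hypothetical example,
# not the MIT study's method): negation words are discarded as stopwords,
# so the negated object still drives the match.

NEGATION = {"no", "not", "without", "doesn't", "isn't"}
STOPWORDS = {"a", "an", "the", "of", "photo", "with"} | NEGATION

def content_words(text):
    """Lowercase content words, with stopwords and negation stripped."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS

def score(query, caption):
    # Fraction of the query's content words found in the caption.
    # Negation has already been dropped, so "no dog" scores like "dog".
    q, c = content_words(query), content_words(caption)
    return len(q & c) / max(len(q), 1)

images = {
    "img1": "a photo of a dog on the beach",
    "img2": "a photo of an empty beach",
}

def retrieve(query):
    """Return the image whose caption best matches the query."""
    return max(images, key=lambda k: score(query, images[k]))

# The query asks for images WITHOUT a dog, but the dog image ranks first.
print(retrieve("a photo with no dog"))  # → img1
```

The fix the study points toward is the opposite of this sketch: training on captions where negation is explicit and semantically load-bearing, so the model cannot treat "no" as a throwaway token.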