Most leading chatbots routinely exaggerate science findings
- #Science Communication
- #AI
- #Misinformation
- Most leading chatbots, including ChatGPT, exaggerate science findings in up to 73% of cases.
- A study analyzed 4,900 summaries from 10 prominent LLMs, finding that six of the ten models systematically exaggerated claims.
- LLMs often changed cautious, past-tense claims into sweeping, present-tense statements, misleading readers.
- When asked for more accuracy, chatbots exaggerated even more often, contrary to expectations.
- Newer AI models such as ChatGPT-4o and DeepSeek were less accurate than older ones.
- LLMs may inherit this tendency to overgeneralize from their training data or from user interactions that favor broad claims.
- Without proper oversight and testing, AI-generated science summaries risk spreading misinformation.
- Recommendations include using Claude, which produced more accurate summaries, and writing prompts designed to limit exaggeration (see the sketch after this list).
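
To illustrate the kind of prompt the recommendations point toward, here is a minimal Python sketch that wraps a study abstract in explicit anti-exaggeration instructions. The instruction wording and the helper name `build_summary_prompt` are illustrative assumptions, not the study's exact prompts.

```python
# Minimal sketch: building a summarization prompt that discourages the
# overgeneralizations the study describes. The instruction wording and the
# helper name are illustrative assumptions, not the study's exact prompts.

def build_summary_prompt(abstract: str) -> str:
    """Wrap a scientific abstract in instructions aimed at reducing exaggeration."""
    instructions = (
        "Summarize the following study abstract for a general audience.\n"
        "- Keep claims in the past tense and limited to the population actually studied.\n"
        "- Preserve hedging language such as 'may', 'suggests', and 'was associated with'.\n"
        "- Do not turn findings into general advice or present-tense facts.\n"
        "- Mention sample size and key limitations if they appear in the abstract.\n"
    )
    return f"{instructions}\nAbstract:\n{abstract.strip()}\n"


if __name__ == "__main__":
    example_abstract = (
        "In a randomized trial of 120 adults, the treatment group showed a modest "
        "reduction in symptoms compared with placebo over eight weeks."
    )
    # Send this prompt to whichever chat model you use. Note the study found that
    # simply asking for "more accuracy" can backfire, so spot-check outputs.
    print(build_summary_prompt(example_abstract))
```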