AI bots ignore evidence. Can we trust them with science?

24 days ago

In a demonstration involving a pen, AI chatbots such as ChatGPT, Gemini, and Grok failed to update their predictions in light of experimental evidence and stuck to incorrect assumptions.
A study found that AI agents ignored evidence in 68% of scientific reasoning tasks, made unsupported claims in 53%, and revised their output in response to contradictory evidence only 26% of the time.
AI systems lack the iterative reasoning process that human scientists rely on. They often refuse to revise hypotheses despite clear evidence, which limits their reliability in science and medicine.
Researchers developed a benchmark that evaluates AI agents' reasoning process rather than only their final outcomes, revealing gaps in their ability to incorporate new data transparently.
Reasoning models, which mimic step-by-step thinking, may not truly reason. Instead, they may imitate patterns without verifying them, which makes it hard to trust their problem-solving process.
AI is best suited to well-defined scientific tasks rather than open-ended reasoning. This contradicts claims of emergent intelligence and raises concerns about the erosion of knowledge.
Understanding AI's limitations can guide improvements toward meaningful discoveries, but current systems risk undermining trust and transparency in science.

Hasty Briefs