AI bots ignore evidence. Can we trust them with science?
13 hours ago
- #Science Trust
- #AI Reasoning
- #Evidence Ignorance
- In a pen demonstration, AI chatbots such as ChatGPT, Gemini, and Grok failed to update their predictions when shown experimental evidence and stuck to incorrect assumptions.
- A study showed that AI agents ignored evidence in 68% of scientific reasoning tasks, made unsupported claims in 53%, and revised their output in response to contradictory evidence only 26% of the time.
- AI systems lack an iterative reasoning process like that of human scientists and often refuse to revise hypotheses despite clear evidence, which limits their reliability in science and medicine.
- Researchers developed a benchmark to evaluate AI agents' reasoning process rather than just outcomes, revealing gaps in their ability to incorporate new data transparently.
- Reasoning models, which mimic step-by-step thinking, may not truly reason but instead imitate patterns without verification, making it hard to trust their problem-solving process.
- AI is best suited to well-defined tasks in science, not open-ended reasoning. This contradicts claims of emergent intelligence and raises concerns about knowledge erosion.
- Understanding AI's limitations makes it possible to improve these systems toward meaningful discoveries, though current systems risk undermining scientific trust and transparency.