AI bots ignore evidence. Can we trust them with science?

2 hours ago

AI systems, including chatbots and AI agents, often fail to update their predictions or hypotheses when presented with new experimental evidence. Examples include videos in which chatbots ignored live demonstrations and a study in which agents disregarded evidence in scientific tasks.
The study found that AI agents ignored evidence in 68% of tasks and made claims without supporting evidence in 53%. They used contradictory evidence to change their output only 26% of the time, which points to a serious flaw in their reasoning process.
Human scientists iteratively revise hypotheses based on experiments, whereas AI agents typically stick to their initial plans even when the evidence contradicts them. Because the agents lack transparent and meaningful processes, this raises concerns about their reliability in science and medicine.
Reasoning models, which are LLMs trained to follow step-by-step reasoning, may only imitate reasoning without truly thinking: they can reach correct answers despite flawed or nonsensical intermediate steps, which makes their process hard to trust.
AI is best suited to well-defined tasks in science and is not yet ready for open-ended scientific reasoning. Experts warn that overhyping AI as a new form of intelligence could erode knowledge systems, though some remain optimistic that the technology can be improved to make meaningful discoveries.
