Superhuman performance of an LLM on the reasoning tasks of a physician

a year ago

The study compared a large language model (LLM) with physicians on clinical reasoning tasks.
Five experiments measured clinical reasoning: differential diagnosis generation, presentation of diagnostic reasoning, triage differential diagnosis, probabilistic reasoning, and management reasoning.
The LLM demonstrated superhuman diagnostic and reasoning abilities in both vignettes and real-world emergency room second opinions.
The study suggests LLMs have achieved superhuman performance in medical diagnostic and management reasoning.
The findings motivate prospective trials to validate these LLM capabilities in clinical settings.
