LLM-assisted systematic review of large language models in clinical medicine - PubMed


An LLM-assisted systematic review identified 4,609 peer-reviewed studies of large language models in clinical medicine from January 2022 to September 2025.
Only 1,048 studies used real-world patient data, and just 19 were prospective randomized trials.
Most studies addressed simulated scenarios (1,857) or exam-style tasks (1,704).
ChatGPT and related OpenAI models were evaluated in 65.7% of studies, followed by Gemini/Bard at 13.1%.
Patient-facing communication and education comprised 17% of tasks, followed by knowledge retrieval and education/assessment simulation.
LLMs outperformed humans in 33% of 1,046 head-to-head comparisons, with the rate varying by task realism and training level.
At least 25% of studies had a sample size below 30.
Rigorous, patient-centered evidence remains scarce, highlighting the need for larger prospective trials before clinical adoption.
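
To put the study counts above on a common scale, the following minimal sketch converts them to percentages of the 4,609 identified studies. The three counts (1,048 + 1,857 + 1,704) sum exactly to 4,609, so the sketch assumes they are mutually exclusive categories; the summary does not state this explicitly. The names `share`, `counts`, and the other variables are illustrative only, and because the summary does not say which denominator applies to the 19 randomized trials, both candidates are shown.

```python
# Counts as reported in the summary; the variable names are illustrative.
TOTAL_STUDIES = 4609
PROSPECTIVE_RCTS = 19
counts = {
    "real-world patient data": 1048,
    "simulated scenarios": 1857,
    "exam-style tasks": 1704,
}


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: the three categories partition the full set of studies.
categories_cover_total = sum(counts.values()) == TOTAL_STUDIES

shares = {label: share(n, TOTAL_STUDIES) for label, n in counts.items()}

# The summary does not say which denominator applies, so compute both.
rct_share_of_all = share(PROSPECTIVE_RCTS, TOTAL_STUDIES)
rct_share_of_real_world = share(PROSPECTIVE_RCTS, counts["real-world patient data"])

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"prospective RCTs, share of all studies: {rct_share_of_all}%")
print(f"prospective RCTs, share of real-world studies: {rct_share_of_real_world}%")
```

Under these assumptions, real-world patient data accounts for roughly 23% of studies, and prospective randomized trials for well under 1% of the total.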
