The Rise of Deepfake Medical Imaging: Radiologists' Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs - PubMed

The study assesses how well radiologists and large language models (LLMs) distinguish ChatGPT-generated synthetic radiographs from authentic ones.
Seventeen radiologists from six countries evaluated 154 radiographs (77 synthetic, 77 authentic) in blinded and informed phases.
41% of radiologists spontaneously identified AI-generated radiographs when blinded to the study's purpose.
Radiologists' accuracy did not differ significantly between GPT-4o-generated and RoentGen-generated synthetic images (75% vs. 70%).
LLMs varied in accuracy: GPT-4o (85%) and GPT-5 (83%) outperformed Llama 4 Maverick (59%) and Gemini 2.5 Pro (56%).
Common synthetic image features included bilateral symmetry, uniform grain, unnatural textures, and overly smooth bone surfaces.
Neither radiologists nor LLMs could easily distinguish synthetic radiographs from authentic ones, highlighting the need for training to mitigate the risks.
A curated deepfake dataset is available to support training in recognizing synthetic medical images.
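
The accuracy figures above are percentages of correct real-versus-synthetic calls. As a rough illustration only, the sketch below shows how such a figure could be computed for one reader on a labeled image set. The function name `detection_metrics` and the eight toy responses are hypothetical and are not taken from the study; the summary does not describe the authors' actual analysis method.

```python
from typing import Dict, Sequence


def detection_metrics(is_synthetic: Sequence[bool],
                      flagged: Sequence[bool]) -> Dict[str, float]:
    """Score one reader's real-vs-synthetic calls (hypothetical helper).

    is_synthetic: ground truth, True if the image is synthetic.
    flagged:      reader's call, True if the reader judged it synthetic.
    """
    if len(is_synthetic) != len(flagged):
        raise ValueError("truth and calls must have the same length")
    if not is_synthetic:
        raise ValueError("at least one image is required")

    pairs = list(zip(is_synthetic, flagged))
    # Synthetic images correctly flagged as synthetic.
    true_pos = sum(1 for truth, call in pairs if truth and call)
    # Authentic images correctly left unflagged.
    true_neg = sum(1 for truth, call in pairs if not truth and not call)

    n_synthetic = sum(1 for truth in is_synthetic if truth)
    n_authentic = len(pairs) - n_synthetic

    return {
        "accuracy": (true_pos + true_neg) / len(pairs),
        "sensitivity": true_pos / n_synthetic if n_synthetic else float("nan"),
        "specificity": true_neg / n_authentic if n_authentic else float("nan"),
    }


# Made-up toy data: 4 synthetic and 4 authentic images (NOT study data).
truth = [True, True, True, True, False, False, False, False]
calls = [True, True, True, False, False, False, False, True]
metrics = detection_metrics(truth, calls)
print(metrics)
```

On a balanced set such as the study's 77 synthetic and 77 authentic images, a reader who guesses at random would be expected to score about 50%, which is a useful baseline when reading the percentages above.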