The Rise of Deepfake Medical Imaging: Radiologists' Diagnostic Accuracy in Detecting ChatGPT-generated Radiographs - PubMed
- #radiology
- #AI
- #deepfake
- Study assesses radiologists' and LLMs' ability to distinguish ChatGPT-generated synthetic radiographs from authentic images.
- 17 radiologists from six countries participated, evaluating 154 radiographs (77 synthetic, 77 authentic) in blinded and informed phases.
- 41% of radiologists spontaneously identified AI-generated radiographs when blinded to the study's purpose.
- Radiologists' accuracy did not differ significantly between GPT-4o-generated and RoentGen-generated synthetic images (75% vs. 70%).
- LLMs varied in accuracy: GPT-4o (85%) and GPT-5 (83%) outperformed Llama 4 Maverick (59%) and Gemini 2.5 Pro (56%).
- Common synthetic image features included bilateral symmetry, uniform grain, unnatural textures, and overly smooth bone surfaces.
- Neither radiologists nor LLMs could easily distinguish synthetic radiographs from authentic ones, highlighting the need for training to mitigate the risks of deepfake medical images.
- A curated deepfake dataset is available to support training in recognizing synthetic medical images.
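
The accuracy figures above come from a balanced real-versus-synthetic reading task. As a minimal sketch of how one reader's calls on such a set might be scored, the Python below computes accuracy, sensitivity, and specificity, plus a Wilson interval for the accuracy. The function names, the label strings, and the eight-image toy data are hypothetical illustrations, not the study's code or results, and the summary does not say which statistical methods the authors used.

```python
from math import sqrt


def score_reader(truth, calls, positive="synthetic"):
    """Score one reader's calls against ground truth.

    `truth` and `calls` are equal-length sequences of labels.
    `positive` is the label treated as the detection target
    (an assumption: here, a synthetic image).
    """
    if len(truth) != len(calls) or len(truth) == 0:
        raise ValueError("truth and calls must be non-empty and equal length")

    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t == positive and c == positive)
    fn = sum(1 for t, c in pairs if t == positive and c != positive)
    tn = sum(1 for t, c in pairs if t != positive and c != positive)
    fp = sum(1 for t, c in pairs if t != positive and c == positive)

    return {
        "correct": tp + tn,
        "total": len(pairs),
        "accuracy": (tp + tn) / len(pairs),
        # Share of synthetic images correctly flagged as synthetic.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of authentic images correctly accepted as authentic.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical toy data: 4 synthetic and 4 authentic images, one reader.
truth = ["synthetic"] * 4 + ["authentic"] * 4
calls = ["synthetic", "synthetic", "synthetic", "authentic",
         "authentic", "authentic", "authentic", "authentic"]

result = score_reader(truth, calls)
low, high = wilson_interval(result["correct"], result["total"])
print(result)
print(f"accuracy interval: {low:.3f} to {high:.3f}")
```

Reporting sensitivity and specificity separately matters in a balanced design like this one, because a reader who labels every image as authentic would still reach 50% accuracy while detecting no synthetic images at all.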