Large Language Model-Driven Analysis and Report Generation of Endoscopy Videos-A Pilot Study - PubMed
- #clinical validation
- #endoscopy
- #artificial intelligence
- Multimodal large language models (MLLMs) were tested for generating clinically adequate esophagogastroduodenoscopy (EGD) reports.
- The study compared clean EGD videos versus those with computer-aided detection (CAD) overlays to assess MLLM performance.
- Five blinded endoscopists rated report adequacy in completeness, visualization, and lesion characteristics.
- Completeness of the MLLM reports was rated adequate for 56.0% of clean videos versus 48.0% of videos with CAD overlays (p = 0.500).
- Visualization and lesion characteristics showed no significant difference between clean and overlay videos.
- Landmark agreement accuracy was higher for clean videos (0.55) than for overlay videos (0.33) (p = 0.029).
- Gemini 2.5 Pro demonstrated inadequate performance for clinical EGD reporting, indicating a need for further optimization.
- The study suggests larger-scale validation is required before deploying MLLMs in clinical settings.
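
The summary reports p-values for the clean versus overlay comparisons but does not name the statistical test or give the underlying counts. As a minimal sketch, assuming each video was rated in both conditions (a paired design) and that an exact McNemar test is a reasonable choice for paired adequate/inadequate ratings, such a comparison could be computed as follows. The function name `mcnemar_exact` and the discordant-pair counts are hypothetical, chosen only for illustration, and are not taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: pairs rated adequate for the clean video but not for the overlay video
    c: pairs rated adequate for the overlay video but not for the clean video
    Concordant pairs do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (assumption, NOT data from the study).
b, c = 5, 3
p_value = mcnemar_exact(b, c)
print(f"p = {p_value:.3f}")  # prints "p = 0.727"
```

With few discordant pairs the exact test has little power, which is one reason a pilot of this size can show a numerical gap in adequacy rates without reaching statistical significance.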