Large Language Model-Driven Analysis and Report Generation of Endoscopy Videos: A Pilot Study - PubMed

2 months ago

Multimodal large language models (MLLMs) were tested for generating clinically adequate esophagogastroduodenoscopy (EGD) reports.
The study compared clean EGD videos with the same type of videos carrying computer-aided detection (CAD) overlays to assess whether the overlays affect MLLM performance.
Five blinded endoscopists rated the adequacy of each report on three dimensions: completeness, visualization, and lesion characteristics.
Report completeness was rated adequate for 56.0% of clean videos versus 48.0% of videos with CAD overlays, a difference that was not statistically significant (p = 0.500).
Visualization and lesion characteristics showed no significant difference between clean and overlay videos.
Landmark agreement accuracy was significantly higher for clean videos (0.55) than for overlay videos (0.33) (p = 0.029).
Gemini 2.5 Pro demonstrated inadequate performance for clinical EGD reporting, indicating a need for further optimization.
The study suggests larger-scale validation is required before deploying MLLMs in clinical settings.
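
The summary reports p-values for the clean-versus-overlay comparisons but does not say which statistical test or sample size produced them. As a minimal sketch of how one such comparison could be run, assuming each video was rated adequate or inadequate under both conditions and assuming an exact McNemar test (a common choice for paired binary outcomes, not confirmed by the source), the calculation might look like the following. The function name `mcnemar_exact` and the ratings are hypothetical illustrations, not study data.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: pairs rated adequate only in the first condition
    c: pairs rated adequate only in the second condition
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) lower tail up to k, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired ratings (1 = adequate, 0 = inadequate); NOT study data.
clean = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
overlay = [1, 0, 0, 1, 1, 1, 0, 0, 0, 0]

# Only pairs where the two conditions disagree carry information.
b = sum(1 for x, y in zip(clean, overlay) if x == 1 and y == 0)
c = sum(1 for x, y in zip(clean, overlay) if x == 0 and y == 1)

p_value = mcnemar_exact(b, c)
print(f"discordant pairs: b={b}, c={c}, p={p_value:.3f}")
```

With few discordant pairs, an exact test of this kind yields large p-values even when the adequacy percentages differ by several points, which is one reason a pilot study of this size calls for larger-scale validation.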
