Merlin: a computed tomography vision-language foundation model and dataset - PubMed

  • #radiology
  • #artificial intelligence
  • #medical imaging
  • Merlin is a 3D vision-language model (VLM) designed for automated analysis of abdominal CT scans.
  • It learns from volumetric CT scans, electronic health records, and radiology reports without requiring additional manual annotations.
  • It was trained on a high-quality dataset comprising over 6 million images from 15,331 CT scans, 1.8 million diagnosis codes, and 6 million tokens of radiology reports.
  • It was evaluated on 6 task types and 752 individual tasks, including diagnostic, prognostic, and quality-related tasks.
  • It generalized well across institutions and anatomies, outperforming 2D VLMs and other CT foundation models.
  • The authors released the trained models, code, and a dataset of 25,494 pairs of abdominal CT scans and radiology reports.
  • Potential applications include assisting radiologists, biomarker discovery, and disease risk stratification.
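
To make the idea of learning from paired scans and reports without manual labels more concrete, here is a minimal sketch of a symmetric contrastive (CLIP-style) objective. This is an assumption for illustration only: the summary above does not state which training objective Merlin uses, and the function names, toy embeddings, and the `temperature` value are hypothetical. The pairing itself acts as the supervision: each scan embedding should be most similar to the embedding of its own report.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of floats."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(scan_embs, report_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of (scan, report) pairs.

    scan_embs[i] and report_embs[i] are assumed to come from the same
    study; every other combination in the batch is treated as a negative.
    The temperature of 0.07 is an illustrative default, not a value
    taken from the source.
    """
    scans = [l2_normalize(v) for v in scan_embs]
    reports = [l2_normalize(v) for v in report_embs]
    n = len(scans)

    # logits[i][j] = cosine similarity of scan i and report j, scaled.
    logits = [
        [sum(a * b for a, b in zip(s, r)) / temperature for r in reports]
        for s in scans
    ]

    # Scan-to-report direction: the correct report for scan i is column i.
    loss_s2r = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Report-to-scan direction: same computation on the transposed matrix.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_r2s = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (loss_s2r + loss_r2s)


# Toy 2-D embeddings standing in for encoder outputs.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this toy case the correctly paired batch yields a much lower loss than the batch with swapped reports, which is the signal that would push real encoders to align scans with their own reports.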