Merlin: a computed tomography vision-language foundation model and dataset - PubMed
- #radiology
- #artificial intelligence
- #medical imaging
- Merlin is a 3D vision-language model (VLM) designed for automated analysis of abdominal CT scans.
- It learns from volumetric CT scans, electronic health records, and radiology reports without requiring additional manual annotations.
- It was trained on a high-quality dataset comprising paired CT scans (over 6 million images from 15,331 scans), 1.8 million diagnosis codes, and 6 million tokens of radiology reports.
- It was evaluated on 6 task types spanning 752 individual tasks, including diagnostic, prognostic, and quality-related tasks.
- It generalized well across institutions and anatomies and outperformed both 2D VLMs and CT foundation models.
- The trained models, code, and a dataset of 25,494 paired abdominal CT scans and radiology reports have been released.
- Potential applications include assisting radiologists, biomarker discovery, and disease risk stratification.
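
To illustrate how a model can learn from paired scans and reports without extra manual labels, here is a minimal sketch of a symmetric contrastive (CLIP-style) alignment loss. This is one common approach to vision-language pretraining, shown as an assumption for illustration only. The summary above does not state which objective Merlin uses, and the function names, embeddings, and temperature value are all hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(scan_embs, report_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of (scan, report) pairs.

    scan_embs[i] and report_embs[i] are assumed to come from the same study,
    so the pairing itself is the only supervision: no manual labels needed.
    The embeddings here are stand-ins for encoder outputs (hypothetical).
    """
    scans = [l2_normalize(v) for v in scan_embs]
    reports = [l2_normalize(v) for v in report_embs]
    n = len(scans)

    # Cosine similarity between every scan and every report, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(s, r)) / temperature for r in reports]
        for s in scans
    ]
    logits_t = [list(col) for col in zip(*logits)]

    # Each scan should pick out its own report, and each report its own scan.
    scan_to_report = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    report_to_scan = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (scan_to_report + report_to_scan)

# Toy usage: correctly paired embeddings give a lower loss than swapped ones.
scan_embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[1.0, 0.0], [0.0, 1.0]]
swapped_reports = [[0.0, 1.0], [1.0, 0.0]]
matched_loss = contrastive_loss(scan_embeddings, matched_reports)
swapped_loss = contrastive_loss(scan_embeddings, swapped_reports)
```

In this sketch the loss is small when each scan embedding is closest to its own report embedding and large when the pairs are mismatched, which is what lets existing clinical records act as the training signal.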