Hasty Briefsbeta

Bilingual

RareArena: a comprehensive benchmark dataset unveiling the potential of large language models in rare disease diagnosis - PubMed

3 hours ago
  • #rare diseases
  • #medical diagnosis
  • #large language models
  • RareArena is a benchmark dataset designed to evaluate large language models (LLMs) in rare disease diagnosis.
  • The dataset addresses gaps in existing evaluations by offering high sample sizes, broad disease coverage, and clinical relevance.
  • Two tasks were created: Rare Disease Screening (RDS) with 49,760 cases and Rare Disease Confirmation (RDC) with 22,901 cases.
  • Human evaluations confirmed the dataset's high quality in terms of leakage, fidelity, and complexity.
  • Ten state-of-the-art LLMs were benchmarked, with GPT-4o achieving the best performance in both RDS and RDC tasks.
  • GPT-4o performed better on genetically inherited diseases and excelled in systemic or rheumatologic diseases.
  • RareArena is the largest rare disease diagnostic benchmark to date, supporting improved global care for rare diseases.