Evaluation of validity, reliability, and readability of AI chatbots for gestational diabetes mellitus: a multi-model comparative study - PubMed
- The study evaluates the validity, reliability, and readability of six AI chatbots for gestational diabetes mellitus (GDM) information.
- ChatGPT-5 achieved the highest accuracy (92.17%) in answering GDM-related multiple-choice questions.
- Newer AI models consistently outperformed their predecessors across all domains of GDM knowledge.
- ChatGPT-5 also received the highest reliability scores on public-education questions but scored poorly on transparency.
- All six models produced responses written above the recommended sixth-grade reading level, making them unsuitable as stand-alone patient-education resources.
- The study concludes that AI chatbots should be used as adjuncts to clinician counseling, not as primary resources.
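The summary does not say which readability instrument the study used. To illustrate what "above a sixth-grade reading level" means in practice, here is a minimal sketch using the Flesch-Kincaid Grade Level, one widely used formula that maps word, sentence, and syllable counts to a US school grade. Treat this as an assumption about the metric, not the study's method. The function names `flesch_kincaid_grade` and `exceeds_target` are hypothetical, and the counts in the example are illustrative rather than data from the study.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level computed from raw text counts.

    Assumption: FKGL is used here only as an example readability metric;
    the study's actual instrument is not stated in the summary.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    # 0.39 * (average sentence length) + 11.8 * (average syllables per word) - 15.59
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def exceeds_target(grade: float, target: float = 6.0) -> bool:
    """Return True if the text reads above the target grade (sixth grade by default)."""
    return grade > target


# Hypothetical counts for a single chatbot answer (illustrative, not study data).
grade = flesch_kincaid_grade(words=100, sentences=5, syllables=150)
print(round(grade, 2))        # 9.91
print(exceeds_target(grade))  # True
```

Under these made-up counts, long sentences (20 words each) and polysyllabic vocabulary push the answer to roughly a tenth-grade level, which is the kind of gap the study reports between chatbot output and patient-education guidelines.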