Evaluation of validity, reliability, and readability of AI chatbots for gestational diabetes mellitus: a multi-model comparative study - PubMed


The study evaluates the validity, reliability, and readability of six AI chatbots for gestational diabetes mellitus (GDM) information.
ChatGPT-5 achieved the highest accuracy (92.17%) in answering GDM-related multiple-choice questions.
Newer AI models consistently outperformed their predecessors across all domains of GDM knowledge.
ChatGPT-5 also scored highest in reliability for public-education questions but had poor transparency scores.
All AI models produced text above the recommended sixth-grade reading level, making their output unsuitable as a stand-alone patient education resource.
The study concludes that AI chatbots should be used as adjuncts to clinician counseling, not as primary resources.
