Hasty Briefs (beta)

ChatGPT Health performance in a structured test of triage recommendations

4 hours ago
Tags: #AI in healthcare, #triage systems, #patient safety

  • ChatGPT Health, launched in January 2026, is OpenAI’s consumer health tool with millions of users.
  • A structured stress test evaluated triage recommendations using 60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions (960 total responses).
  • Performance followed an inverted U-shaped pattern: the system did best on mid-acuity cases, and the most dangerous failures clustered at the clinical extremes, namely non-urgent presentations (35%) and emergency conditions (48%).
  • The system under-triaged 52% of gold-standard emergencies, directing cases such as diabetic ketoacidosis and impending respiratory failure to delayed evaluation rather than emergency care.
  • Anchoring bias (family or friends minimizing symptoms) significantly shifted triage recommendations in edge cases (odds ratio 11.7), mostly toward less urgent care.
  • Crisis intervention for suicidal ideation activated unpredictably, and more frequently when no specific method was described than when one was.
  • Patient demographics (race, gender, barriers to care) showed no significant effects, though the confidence intervals did not rule out meaningful differences.
  • Findings highlight missed high-risk emergencies and inconsistent crisis safeguards, raising safety concerns for AI triage systems at scale.
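
The response count in the study design above follows from simple arithmetic: 60 vignettes, each run under 16 factorial conditions, gives 960 responses. The sketch below illustrates one way such a grid could be enumerated. The summary does not say how the 16 conditions were built, so the four binary factors, their names, and their level labels are assumptions chosen only because 2 × 2 × 2 × 2 = 16 and these variables are mentioned in the brief; the study's actual factors may differ.

```python
from itertools import product

# ASSUMPTION: four binary factors. The brief mentions these variables but does
# not state how the 16 conditions were constructed; names and levels here are
# placeholders for illustration only.
FACTORS = {
    "race": ("level_a", "level_b"),
    "gender": ("level_a", "level_b"),
    "anchoring": ("absent", "present"),
    "barriers_to_care": ("absent", "present"),
}

N_VIGNETTES = 60  # stated in the brief


def build_conditions(factors):
    """Return every combination of factor levels (a full factorial grid)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = build_conditions(FACTORS)
total_responses = N_VIGNETTES * len(conditions)

print(len(conditions))   # number of factorial conditions
print(total_responses)   # vignettes x conditions
```

Under these assumptions the grid has 16 cells, and crossing it with the 60 vignettes reproduces the 960 total responses reported in the brief.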