Hasty Briefsbeta

Bilingual

ChatGPT Health performance in a structured test of triage recommendations - PubMed

4 hours ago
  • #Medical Safety
  • #ChatGPT Health
  • #AI Triage
  • ChatGPT Health, launched in January 2026, was tested for triage recommendations using 60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions (960 total responses).
  • Performance showed an inverted U-shaped pattern, with the most dangerous failures at clinical extremes: non-urgent presentations (35%) and emergency conditions (48%).
  • The system under-triaged 52% of gold-standard emergencies, directing patients with diabetic ketoacidosis and impending respiratory failure to 24-48-hour evaluation instead of emergency departments.
  • Classical emergencies like stroke and anaphylaxis were correctly triaged.
  • Anchoring bias (when family or friends minimized symptoms) significantly shifted triage recommendations in edge cases (OR 11.7, 95% CI 3.7-36.6), mostly toward less urgent care.
  • Crisis intervention messages activated unpredictably in suicidal ideation cases, firing more when no specific method was described.
  • Patient race, gender, and barriers to care showed no significant effects, though confidence intervals did not exclude clinically meaningful differences.
  • Findings highlight missed high-risk emergencies and inconsistent crisis safeguard activation, raising safety concerns for AI triage systems before large-scale deployment.