highplainsdem

5. Yes, it's the AI-assisted triage. The sort of thing the study I mentioned tested.

Sat Jul 4, 2026, 08:47 PM

From my OP a few months ago and the study I linked to:

https://www.democraticunderground.com/100221066192

https://www.nature.com/articles/s41591-026-04297-7

Brief Communication
Published: 23 February 2026
ChatGPT Health performance in a structured test of triage recommendations
Ashwin Ramaswamy, Alvira Tyagi, Hannah Hugo, Joy Jiang, Pushkala Jayaraman, Mateen Jangda, Alexis E. Te, Steven A. Kaplan, Joshua Lampert, Robert Freeman, Nicholas Gavin, Ashutosh K. Tewari, Ankit Sakhuja, Bilal Naved, Alexander W. Charney, Mahmud Omar, Michael A. Gorin, Eyal Klang & Girish N. Nadkarni

Abstract
ChatGPT Health launched in January 2026 as OpenAI’s consumer health tool, reaching millions of users. Here, we conducted a structured stress test of triage recommendations using 60 clinician-authored vignettes across 21 clinical domains under 16 factorial conditions (960 total responses). Performance followed an inverted U-shaped pattern, with the most dangerous failures concentrated at clinical extremes: non-urgent presentations (35%) and emergency conditions (48%). Among gold-standard emergencies, the system under-triaged 52% of cases, directing patients with diabetic ketoacidosis and impending respiratory failure to 24–48-hour evaluation rather than the emergency department, while correctly triaging classical emergencies such as stroke and anaphylaxis. When family or friends minimized symptoms (anchoring bias), triage recommendations shifted significantly in edge cases (OR 11.7, 95% CI 3.7-36.6), with the majority of shifts toward less urgent care. Crisis intervention messages activated unpredictably across suicidal ideation presentations, firing more when patients described no specific method than when they did. Patient race, gender, and barriers to care showed no significant effects, though confidence intervals did not exclude clinically meaningful differences. Our findings reveal missed high-risk emergencies and inconsistent activation of crisis safeguards, raising safety concerns that warrant prospective validation before consumer-scale deployment of artificial intelligence triage systems.