Hasty Briefs

New benchmark shows top LLMs struggle in real mental health care

2 days ago
  • #Mental Health
  • #AI in Healthcare
  • #LLM Evaluation
  • Introduces MindEval, a new framework for measuring the clinical competence of LLMs in mental health support.
  • Developed by Sword Health, MindEval is open-source and expert-validated.
  • Addresses the global demand for mental health support, with over one billion people affected by mental health conditions.
  • Current AI evaluations fall short: they fail to measure clinical competence, capture dynamic interactions, or incorporate expert validation.
  • The MindEval framework pairs a Patient LLM (PLM), a Clinician LLM (CLM), and a Judge LLM (JLM) for dynamic, multi-turn evaluation (see the sketch after this list).
  • Evaluates conversations against five core criteria: Clinical Accuracy & Competence, Ethical & Professional Conduct, Assessment & Response, Therapeutic Relationship & Alliance, and AI-Specific Communication Quality.
  • Validation shows that MindEval's Patient Realism and Judge Quality measures are reliable and correlate with human expert judgments.
  • Benchmark results reveal significant gaps in current AI capabilities, with average scores falling below 4 on a 6-point scale.
  • Models struggle with patients presenting severe symptoms and with longer interactions, pointing to a need for improved alignment and evaluation.
  • Sword Health is open-sourcing MindEval to encourage transparency and industry-wide improvements in AI mental health support.
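To make the three-role loop concrete, here is a minimal Python sketch of how a PLM/CLM/JLM pipeline like MindEval's could be wired together. Everything here is an illustrative assumption, not the framework's actual API: the `call_llm` stub, the prompts, the eight-turn session length, and the placeholder judge score are all hypothetical; only the five criteria and the 1-6 scale come from the article.

```python
# Hypothetical sketch of a patient/clinician/judge evaluation loop.
# Not MindEval's real implementation; see Sword Health's open-source release.

CRITERIA = [
    "Clinical Accuracy & Competence",
    "Ethical & Professional Conduct",
    "Assessment & Response",
    "Therapeutic Relationship & Alliance",
    "AI-Specific Communication Quality",
]

def call_llm(role_prompt: str, transcript: list[str]) -> str:
    """Hypothetical stand-in for any chat-completion API call."""
    # Replace with a real client (OpenAI, vLLM, etc.) in practice.
    return f"[{role_prompt[:24]}...] reply at turn {len(transcript)}"

def run_session(patient_profile: str, num_turns: int = 8) -> list[str]:
    """Simulate a multi-turn session between a Patient LLM and a Clinician LLM."""
    transcript: list[str] = []
    for _ in range(num_turns):
        patient_msg = call_llm(f"You are a patient: {patient_profile}", transcript)
        transcript.append(f"PATIENT: {patient_msg}")
        clinician_msg = call_llm("You are a supportive clinician.", transcript)
        transcript.append(f"CLINICIAN: {clinician_msg}")
    return transcript

def judge_session(transcript: list[str]) -> dict[str, float]:
    """Judge LLM rates the clinician's turns on each criterion (1-6 scale)."""
    scores: dict[str, float] = {}
    for criterion in CRITERIA:
        judgment = call_llm(
            f"Rate the clinician on '{criterion}' from 1 to 6.", transcript
        )
        # A real judge would emit a rating to parse out of `judgment`;
        # the stub returns no number, so a fixed placeholder keeps this runnable.
        scores[criterion] = 3.5
    return scores

if __name__ == "__main__":
    session = run_session("moderate depressive symptoms, first session")
    scores = judge_session(session)
    avg = sum(scores.values()) / len(scores)
    print(f"Average score: {avg:.1f} / 6")  # the paper reports averages below 4/6
```

In a full benchmark run, this loop would be repeated across patient profiles that vary symptom severity and session length, which is how the reported weaknesses on severe symptoms and longer interactions would surface.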