New benchmark shows top LLMs struggle in real mental health care
- #Mental Health
- #AI in Healthcare
- #LLM Evaluation
- Introduces MindEval, a new framework for measuring the clinical competence of LLMs in mental health support.
- Developed by Sword Health, MindEval is open-source and expert-validated.
- Addresses the global demand for mental health support, with over one billion people affected by mental health conditions.
- Existing AI evaluations fall short on measuring clinical competence, capturing dynamic interactions, and incorporating expert validation.
- The MindEval framework pairs a Patient LLM (PLM), a Clinician LLM (CLM), and a Judge LLM (JLM) for dynamic, multi-turn evaluation (see the sketch after this list).
- Evaluates models on five core criteria: Clinical Accuracy & Competence, Ethical & Professional Conduct, Assessment & Response, Therapeutic Relationship & Alliance, and AI-Specific Communication Quality.
- Validation shows MindEval's Patient Realism and Judge Quality are reliable and correlate with human expert judgments.
- Benchmark results reveal significant gaps in current AI capabilities, with average scores below 4 out of 6.
- Models struggle most with patients presenting severe symptoms and with longer interactions, indicating a need for improved alignment and evaluation.
- Sword Health is open-sourcing MindEval to encourage transparency and industry-wide improvements in AI mental health support.
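
Below is a minimal sketch of the three-LLM setup described above: a Patient LLM (PLM) role-plays a help-seeker, the Clinician LLM (CLM) under test responds, and a Judge LLM (JLM) scores the transcript on the five criteria from 1 to 6. The `chat` callable, prompt wording, and turn budget are illustrative assumptions, not the published MindEval implementation.

```python
# Illustrative sketch of a simulated-patient evaluation loop in the spirit of
# MindEval. The Chat callable, prompts, and 10-turn budget are assumptions.
from statistics import mean
from typing import Callable

Chat = Callable[[str, list[dict]], str]  # (system_prompt, messages) -> reply

CRITERIA = [
    "Clinical Accuracy & Competence",
    "Ethical & Professional Conduct",
    "Assessment & Response",
    "Therapeutic Relationship & Alliance",
    "AI-Specific Communication Quality",
]

def run_session(patient_chat: Chat, clinician_chat: Chat,
                patient_profile: str, n_turns: int = 10) -> list[dict]:
    """Simulate a multi-turn conversation between the PLM and the CLM under test."""
    transcript: list[dict] = []
    for _ in range(n_turns):
        patient_msg = patient_chat(
            f"Role-play a therapy patient with this profile: {patient_profile}",
            transcript)
        transcript.append({"role": "user", "content": patient_msg})
        clinician_msg = clinician_chat(
            "You are an AI mental-health support assistant.", transcript)
        transcript.append({"role": "assistant", "content": clinician_msg})
    return transcript

def judge_session(judge_chat: Chat, transcript: list[dict]) -> dict[str, float]:
    """Have the JLM rate the clinician's turns on each criterion (1 = poor, 6 = excellent)."""
    scores: dict[str, float] = {}
    for criterion in CRITERIA:
        reply = judge_chat(
            f"Rate the assistant's '{criterion}' in the following session on a "
            "1-6 scale. Answer with a single number.", transcript)
        scores[criterion] = float(reply.strip())
    scores["average"] = mean(scores[c] for c in CRITERIA)
    return scores
```

In this reading, the dynamic element comes from the PLM generating each new patient turn in response to the clinician model, rather than replaying fixed prompts; the judge then scores the full transcript per criterion, which is how an average below 4 out of 6 would be computed.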