Comparison of the performance of large language models in answering patient questions related to cataract - PubMed
- #ophthalmology
- #large language models
- #health informatics
- The study evaluated four large language models (ChatGPT o3-mini, Gemini 2.0 pro experimental, DeepSeek Thinking R1, Kimi Thinking K1.5) on answering cataract-related patient questions in Chinese.
- DeepSeek Thinking R1 matched Gemini 2.0 pro experimental in accuracy and outperformed ChatGPT o3-mini and Kimi Thinking K1.5.
- DeepSeek Thinking R1 excelled in completeness and consistency compared to the other models.
- Legibility and safety were comparable among DeepSeek Thinking R1, Gemini 2.0 pro experimental, and ChatGPT o3-mini, all better than Kimi Thinking K1.5.
- DeepSeek Thinking R1 showed the strongest overall performance in the evaluation.
- Modern LLMs show promise for public education in ophthalmology but still require human oversight.