Comparison of the performance of large language models in answering patient questions related to cataract - PubMed

The study evaluated four large language models (ChatGPT o3-mini, Gemini 2.0 pro experimental, DeepSeek Thinking R1, and Kimi Thinking K1.5) on answering cataract-related patient questions in Chinese.
DeepSeek Thinking R1 matched Gemini 2.0 pro experimental in accuracy and outperformed ChatGPT o3-mini and Kimi Thinking K1.5.
DeepSeek Thinking R1 outperformed the other models in completeness and consistency.
Legibility and safety were comparable among DeepSeek Thinking R1, Gemini 2.0 pro experimental, and ChatGPT o3-mini, all of which scored better than Kimi Thinking K1.5.
DeepSeek Thinking R1 showed the strongest overall performance in the evaluation.
Modern LLMs are promising for ophthalmology public education but require human oversight.
