Comparative performance of LLMs and machine learning in predicting complications after percutaneous kyphoplasty for osteoporotic vertebral compression fractures - PubMed

10 hours ago

The study assesses LLMs (GPT-5 and DeepSeek R1) against traditional machine learning (TML) models and spine surgeons in predicting two complications, bone cement leakage (BCL) and new vertebral fractures (NVF), after percutaneous kyphoplasty (PKP) for osteoporotic vertebral compression fractures.
For BCL prediction, zero-shot LLMs showed acceptable performance (F1-score of roughly 0.857 to 0.871), comparable to the TML models and slightly better than surgeons alone. Few-shot prompting improved specificity, but the overall gains were uncertain.
For NVF prediction, zero-shot LLM performance was poor but improved with few-shot learning. The RBF-SVM TML model performed best for NVF prediction (F1-score 0.536). LLM explanations enhanced surgeon performance only for BCL prediction, not NVF.
LLMs performed poorly in predicting complication subtypes. The study concludes that the predictive performance of current LLMs varies by complication and that they are still too immature for real clinical application and require further improvement.
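
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how an F1-score and a specificity value are computed from confusion-matrix counts. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall.

    Algebraically this equals 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no positive predictions or labels at all.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def specificity(tn: int, fp: int) -> float:
    """Share of true negatives among all actual negatives."""
    denom = tn + fp
    return tn / denom if denom else 0.0


# Hypothetical counts, for illustration only (not data from the study).
example_f1 = f1_score(tp=8, fp=2, fn=2)
example_spec = specificity(tn=18, fp=2)
print(f"F1 = {example_f1:.3f}, specificity = {example_spec:.3f}")
```

Because F1 ignores true negatives, a model can raise its specificity without a matching rise in F1, which is consistent with the summary's note that few-shot prompting improved specificity while the overall gains remained uncertain.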
