The effects of multitype prompt engineering for large language models in hypertension treatment decisions - PubMed


Multitype prompt engineering significantly affects large language model (LLM) performance in hypertension treatment decision-making.
In a study of 300 simulated hypertension cases, ChatGPT-4.1 with Guidance-Self-Consistency prompting achieved the highest accuracy (91.3%), approaching expert level.
Assistance from the best-performing LLM configuration improved physician accuracy across hospital levels (e.g., from 73.4% to 82.5% at community hospitals) and reduced inappropriate regimen rates.
Poorly performing LLM configurations, such as zero-shot prompting, reduced physician performance and raised the inappropriate regimen rate from 26.6% to 35.2%.
Effectively designed prompt strategies enable LLMs to provide reliable hypertension treatment recommendations, supporting clinical decisions.
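
For readers unfamiliar with the term, the generic idea behind self-consistency prompting can be sketched as sampling several answers to the same prompt and keeping the most frequent one. The sketch below is an illustration under stated assumptions, not the study's Guidance-Self-Consistency method, whose details are not given in this summary. The names `self_consistent_answer` and `make_canned_sampler` are hypothetical, the "model" is a stub that replays canned strings, and the regimen labels are placeholders rather than clinical recommendations.

```python
from collections import Counter
from typing import Callable, Iterable, List


def self_consistent_answer(
    sample_fn: Callable[[str], str], prompt: str, n_samples: int = 5
) -> str:
    """Call sample_fn n_samples times and return the most frequent answer.

    sample_fn is a hypothetical stand-in for one stochastic model call.
    Answers are normalized (trimmed, lowercased) before voting.
    """
    answers: List[str] = [
        sample_fn(prompt).strip().lower() for _ in range(n_samples)
    ]
    # most_common(1) yields [(answer, count)]; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_canned_sampler(responses: Iterable[str]) -> Callable[[str], str]:
    """Build a stub sampler that replays fixed responses (illustration only)."""
    it = iter(responses)
    return lambda _prompt: next(it)


# Placeholder outputs; a real setup would call an LLM with sampling enabled.
sampler = make_canned_sampler(
    ["Regimen A", "regimen b", "regimen a ", "Regimen A", "regimen b"]
)
choice = self_consistent_answer(sampler, "simulated case description", n_samples=5)
print(choice)
```

In practice the vote is only meaningful if answers are mapped to a fixed set of options before counting; free-text outputs would need a stricter normalization step than the one assumed here.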
