He asked AI to count carbs 27,000 times. It couldn't give the same answer twice


AI models gave highly variable carbohydrate estimates for the same food photo, a risk for people with diabetes who rely on carb counts to dose insulin.
Four models (OpenAI GPT-5.4, Anthropic Claude Sonnet 4.6, Google Gemini 2.5 Pro and Gemini 3.1 Pro) were queried repeatedly with the same photos, and their estimates were inconsistent and sometimes dangerously wrong.
Claude had the lowest variation (median coefficient of variation, or CV, of 2.4%) but was still systematically biased, while the Gemini models were far more variable (median CV of up to 11%).
Models often misidentified foods (for example, calling a Bakewell tart a "Linzer torte"), and their self-reported confidence scores had near-zero correlation with accuracy.
The study advises against trusting AI carb counts blindly and recommends querying more than once and cross-checking the results before such estimates are used in diabetes apps.
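The variability figures above are coefficients of variation: the spread of repeated estimates for one photo relative to their average. A minimal Python sketch follows. It assumes CV is the sample standard deviation divided by the mean (as a percentage) and that the median CV is taken across photos. The study's exact formula is not given here. The helper `consensus_estimate` and its 5% threshold are hypothetical illustrations of the "query more than once and cross-check" advice, not the study's method.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(estimates):
    """CV in percent: sample standard deviation / mean of repeated estimates.

    Assumption: sample (n - 1) standard deviation; the study may use another.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two repeated estimates")
    mu = mean(estimates)
    if mu <= 0:
        raise ValueError("carb estimates must have a positive mean")
    return 100.0 * stdev(estimates) / mu


def median_cv(estimates_per_photo):
    """Median of the per-photo CVs; keys are photo IDs, values are estimate lists."""
    return median(coefficient_of_variation(e) for e in estimates_per_photo.values())


def consensus_estimate(estimates, max_cv_percent=5.0):
    """Hypothetical helper: summarize repeated queries of one photo.

    Returns the median estimate and whether the spread is low enough to use it.
    The 5% default threshold is an illustrative assumption, not a clinical rule.
    """
    cv = coefficient_of_variation(estimates)
    return median(estimates), cv <= max_cv_percent


# Illustrative, made-up grams of carbohydrate for repeated queries of two photos.
runs = {"photo_a": [42, 45, 40, 44], "photo_b": [60, 75, 52, 68]}
print(median_cv(runs))
print(consensus_estimate(runs["photo_b"]))
```

Taking the median rather than the mean of repeated answers keeps one wild estimate from dragging the result, and the CV check turns "the model disagreed with itself" into a signal to fall back on a manual count.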
