Hasty Briefs (beta)


I asked my local LLM to add 23 numbers and got seven wrong answers

6 hours ago
  • #local AI setup
  • #code execution
  • #LLM limitations
  • The author attempted to use a local LLM to sum 23 stock transactions and encountered repeated failures.
  • Smaller models omitted data and computed incorrectly, while larger models still got the arithmetic wrong because they predict tokens rather than calculate.
  • Tool-calling setups such as Open Interpreter failed because of format mismatches and never actually executed the code.
  • Enabling a harness with code execution (Open WebUI's Code Interpreter) finally produced the correct answer, given clear prompts.
  • Success requires four layers: model, inference engine, orchestrator, and harness, with the harness supplying reliability through tools such as code execution.
  • Local LLMs need a proper harness with code execution for computational tasks; a chat interface alone is insufficient.
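The harness layer described above can be sketched as a small loop that extracts code from the model's reply and hands it to a real interpreter, so the arithmetic comes from execution rather than token prediction. This is a minimal illustration only: the `<execute>` tag format, the `run_tool_call` helper, and the sample transaction amounts are assumptions for the sketch, not Open WebUI's or Open Interpreter's actual protocol.

```python
import re

def run_tool_call(model_reply: str) -> float:
    """Hypothetical harness step: find the code the model asked to run,
    execute it, and return the computed total. The interpreter does the
    arithmetic, not the model's next-token prediction."""
    match = re.search(r"<execute>(.*?)</execute>", model_reply, re.DOTALL)
    if match is None:
        # This is the failure mode the summary mentions: a format
        # mismatch means no code ever runs.
        raise ValueError("reply contained no <execute> block to run")
    namespace: dict = {}
    exec(match.group(1), namespace)  # no sandboxing in this sketch
    return namespace["total"]

# Simulated model reply: rather than emitting digits, the model emits code.
reply = """I'll compute the sum with code:
<execute>
transactions = [120.50, -34.20, 87.00, 15.75, -9.30]
total = round(sum(transactions), 2)
</execute>
"""

print(run_tool_call(reply))  # 179.75
```

The point of the sketch is the division of labor: the model only has to produce well-formed code in the agreed format, and the harness guarantees the number that comes back was actually computed.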