I asked my local LLM to add 23 numbers and got seven wrong answers
- #local AI setup
- #code execution
- #LLM limitations
- The author tried to use a local LLM to sum 23 stock transactions and ran into repeated failures.
- Smaller models dropped entries and miscomputed; larger models still got the arithmetic wrong because they predict tokens rather than calculate.
- Tool-calling setups like Open Interpreter also failed: tool-call format mismatches meant the code was never actually executed.
- Enabling a harness with code execution (Open WebUI's Code Interpreter) eventually produced the correct answer with clear prompts.
- Success requires four layers: model, inference engine, orchestrator, and harness, with the harness providing reliability through tools like code execution.
- Local LLMs need proper harnesses with code execution for computational tasks; chat interfaces alone are insufficient.
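The core fix described above is delegating arithmetic to executed code rather than trusting token prediction. A minimal sketch of what a code-execution harness would run, using hypothetical transaction amounts (not the author's actual data) and Python's `Decimal` for exact monetary arithmetic:

```python
from decimal import Decimal

# Hypothetical transaction amounts -- illustrative only, not the article's data.
transactions = ["120.50", "-35.20", "410.00", "-12.75", "88.99"]

def sum_transactions(amounts):
    """Sum monetary amounts exactly with Decimal, avoiding float drift.

    A deterministic computation like this is what a harness's code
    interpreter executes on the model's behalf, instead of the model
    guessing digits token by token.
    """
    return sum(Decimal(a) for a in amounts)

total = sum_transactions(transactions)
print(total)  # -> 571.54
```

Strings are parsed with `Decimal` rather than `float` so amounts like `0.10` are represented exactly, which matters when summing many financial entries.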