Keep Deterministic Work Deterministic
6 hours ago
- #LLM pipelines
- #deterministic programming
- #AI reliability
- The article discusses the challenges of reliability in LLM-based systems, using a blackjack simulation as an example.
- Early runs of the simulation passed only 37% of the time, because miscounts and rule violations compounded from one step to the next.
- The 'March of Nines' concept is introduced, illustrating the increasing effort required to improve system reliability from 90% to 99% and beyond.
- An exercise demonstrates cascading failures in LLMs, showing how small errors in early steps can lead to significant deviations in the final result.
- The article highlights the importance of keeping deterministic work deterministic: tasks such as arithmetic and rule validation should be handled by code rather than by an LLM.
- Iterative improvements to the blackjack pipeline, including restructuring data, using chain of thought, and replacing LLM validators with code, increased the pass rate to 94%.
- The key takeaway is to identify and remove deterministic tasks from LLM pipelines, using code for such tasks to achieve higher reliability.
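
To make the summary's point concrete, here is a minimal Python sketch of what moving arithmetic and validation out of the LLM might look like for a blackjack hand. It is not the article's code: the function names `hand_value` and `check_reported_total`, the rank strings, and the idea that the LLM step reports a total to be checked are all assumptions made for illustration.

```python
from typing import List

FACE_CARDS = {"K", "Q", "J"}
NUMBER_CARDS = {str(n) for n in range(2, 11)}


def hand_value(cards: List[str]) -> int:
    """Return the best blackjack total for a list of rank strings.

    Aces count as 11 unless that would bust the hand, in which case
    they are downgraded to 1, one ace at a time.
    """
    total = 0
    soft_aces = 0  # aces currently counted as 11
    for rank in cards:
        if rank == "A":
            total += 11
            soft_aces += 1
        elif rank in FACE_CARDS:
            total += 10
        elif rank in NUMBER_CARDS:
            total += int(rank)
        else:
            raise ValueError(f"unknown rank: {rank!r}")
    while total > 21 and soft_aces > 0:
        total -= 10  # count one ace as 1 instead of 11
        soft_aces -= 1
    return total


def check_reported_total(cards: List[str], reported_total: int) -> List[str]:
    """Compare a total reported by an upstream step (hypothetically an
    LLM) against the total computed in code. Returns a list of error
    messages; an empty list means the reported total is correct."""
    actual = hand_value(cards)
    if reported_total != actual:
        return [f"reported total {reported_total} != computed total {actual}"]
    return []
```

Under these assumptions, a pipeline could call `check_reported_total(["10", "7"], 18)` after an LLM step and get back an error message instead of asking a second LLM to judge the count. The check gives the same answer on every run, which is the property the article argues for.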