The Unreliability of LLMs and What Lies Ahead
- #AI
- #LLMs
- #Startups
- Large Language Models (LLMs) are fundamentally unreliable, which limits their real-world utility.
- LLM reliability issues persist even in well-defined tasks and worsen with multi-step actions or autonomy.
- Hallucination rates reported for even top models can approach 50% in some evaluations, making unverified LLM output unsuitable for high-stakes applications.
- Code generation is a mature LLM use case, but achieving 99% correctness remains challenging.
- LLMs are highly input-sensitive, with minor prompt changes leading to vastly different outputs.
- Alignment failures highlight how opaque LLM behavior remains and the risks that opacity poses in agentic applications.
- Short-to-medium-term improvements in LLM reliability are unlikely, because per-step errors compound across multi-step tasks (see the worked example after this list).
- Developers can work around LLM variance by focusing on autonomy or human-in-the-loop strategies.
- Autonomy strategies aim for determinism or 'accurate enough' outputs without user verification.
- Human-in-the-loop approaches involve end-user verification or provider-level quality control (a minimal sketch combining automated validation with a human fallback follows this list).
- Successful AI products must anticipate LLM failures and design systems that work despite them.
- Verissimo Ventures invests in enterprise software, focusing on AI and tech startups.
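To see why errors compound, consider a back-of-the-envelope calculation (the 95% per-step success rate below is an illustrative assumption, not a measured figure): a multi-step agentic task succeeds only if every step does, so end-to-end reliability decays exponentially with chain length.

```python
# Illustrative arithmetic: the per-step success rate is an assumption,
# not a benchmark result. A chain succeeds only if every step succeeds.
per_step_success = 0.95

for steps in (1, 5, 10, 20):
    end_to_end = per_step_success ** steps
    print(f"{steps:>2} steps -> {end_to_end:.0%} end-to-end success")

# Output:
#  1 steps -> 95% end-to-end success
#  5 steps -> 77% end-to-end success
# 10 steps -> 60% end-to-end success
# 20 steps -> 36% end-to-end success
```

Even a model that is right 95% of the time per step completes a twenty-step task correctly barely a third of the time, which is why longer autonomous chains magnify rather than average out unreliability.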
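And as a sketch of what combining the two strategies might look like in code (everything here is hypothetical: `generate` stands in for any LLM call, `validate` for any deterministic check such as a JSON-schema match or a passing unit test): validate model output automatically, retry a bounded number of times, and fall back to human review rather than shipping unverified results.

```python
from typing import Callable, Optional

def generate_with_verification(
    prompt: str,
    generate: Callable[[str], str],   # hypothetical LLM call
    validate: Callable[[str], bool],  # deterministic check (schema, tests, etc.)
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry generation until the output passes a deterministic check.

    Returns a validated output, or None to signal that a human
    should review the case (the human-in-the-loop fallback).
    """
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if validate(candidate):
            return candidate
    return None  # escalate to human review instead of shipping unverified output
```

The design choice worth noting is the `None` return: the system is built to expect model failure and route around it, which is the point above about designing products that work despite LLM errors rather than assuming they won't occur.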