The Unreliability of LLMs and What Lies Ahead
- #AI
- #LLMs
- #Startups
- Large Language Models (LLMs) are fundamentally unreliable, which limits their real-world utility.
- LLM reliability issues persist even in well-defined tasks and worsen with multi-step actions or autonomy.
- Hallucination rates reported for even top models can approach 50% in some evaluations, making unverified LLM output unsuitable for high-stakes applications.
- Code generation is a mature LLM use case, but achieving 99% correctness remains challenging.
- LLMs are highly input-sensitive, with minor prompt changes leading to vastly different outputs.
- Alignment failures highlight how opaque LLM behavior remains and the risks that opacity poses in agentic applications.
- Short-to-medium-term improvements in LLM reliability are unlikely, because per-step errors compound across multi-step tasks (see the worked example after this list).
- Developers can work around LLM variance by focusing on autonomy or human-in-the-loop strategies.
- Autonomy strategies aim for determinism or 'accurate enough' outputs without user verification.
- Human-in-the-loop approaches involve end-user verification or provider-level quality control (a minimal sketch combining automated validation with a human fallback follows this list).
- Successful AI products must anticipate LLM failures and design systems that work despite them.
- Verissimo Ventures invests in enterprise software, focusing on AI and tech startups.
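To see why errors compound, consider a back-of-the-envelope calculation (the 95% per-step success rate below is an illustrative assumption, not a measured figure): a multi-step agentic task succeeds only if every step does, so end-to-end reliability decays exponentially with chain length.

```python
# Illustrative arithmetic: the per-step success rate is an assumption,
# not a benchmark result. A chain succeeds only if every step succeeds.
per_step_success = 0.95

for steps in (1, 5, 10, 20):
    end_to_end = per_step_success ** steps
    print(f"{steps:>2} steps -> {end_to_end:.0%} end-to-end success")

# Output:
#  1 steps -> 95% end-to-end success
#  5 steps -> 77% end-to-end success
# 10 steps -> 60% end-to-end success
# 20 steps -> 36% end-to-end success
```

Even a model that is right 95% of the time per step completes a twenty-step task correctly barely a third of the time, which is why longer autonomous chains magnify rather than average out unreliability.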
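And as a sketch of what combining the two strategies might look like in code (everything here is hypothetical: `generate` stands in for any LLM call, `validate` for any deterministic check such as a JSON-schema match or a passing unit test): validate model output automatically, retry a bounded number of times, and fall back to human review rather than shipping unverified results.

```python
from typing import Callable, Optional

def generate_with_verification(
    prompt: str,
    generate: Callable[[str], str],   # hypothetical LLM call
    validate: Callable[[str], bool],  # deterministic check (schema, tests, etc.)
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry generation until the output passes a deterministic check.

    Returns a validated output, or None to signal that a human
    should review the case (the human-in-the-loop fallback).
    """
    for _ in range(max_attempts):
        candidate = generate(prompt)
        if validate(candidate):
            return candidate
    return None  # escalate to human review instead of shipping unverified output
```

The design choice worth noting is the `None` return: the system is built to expect model failure and route around it, which is the point above about designing products that work despite LLM errors rather than assuming they won't occur.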