A/B Tests over Evals

9 days ago

Evals are crucial for AI product development, but their definitions and applications vary widely.
Raindrop focuses on monitoring real-world AI performance, in contrast to traditional eval-driven development.
The debate between evals and monitoring highlights the need for tools that adapt to AI's unpredictability.
Personalized AI undermines the practicality of evals, which points to monitoring as a more scalable solution.
LLMs as judges in evals face issues with calibration, cost, and pattern discovery.