A/B Tests over Evals
9 days ago
- #Evals vs Monitoring
- #Raindrop
- #AI Development
- Evals are crucial for AI product development, but their definitions and applications vary widely.
- Raindrop focuses on monitoring how AI performs in the real world, in contrast to traditional eval-driven development.
- The debate between evals and monitoring highlights the need for tools that adapt to AI's unpredictability.
- Personalized AI makes evals harder to apply in practice, which points to monitoring as the more scalable approach.
- Using LLMs as judges in evals runs into problems with calibration, cost, and pattern discovery.