Hasty Briefs

  • #AI evaluations
  • #agent testing
  • #LLM optimization
  • Models change and improve, but evaluations (evals) remain essential.
  • Always look at the data; evals can't replace this step.
  • Start with end-to-end (e2e) evals that define success as a binary yes/no outcome (a minimal sketch follows this list).
  • E2E evals help surface edge cases, refine prompts, and compare performance across models.
  • Move to 'N-1' evals, which seed the first N-1 turns of a conversation and judge only the agent's next response, for targeted improvements (second sketch below).
  • Keep 'N-1' fixtures updated as the agent's behavior changes, or the seeded history will drift from what the agent actually says.
  • Use 'checkpoints' in prompts (exact marker strings the agent must emit) to validate complex conversation patterns (third sketch below).
  • External tools simplify setup but don't replace custom evals tailored to your use case.
  • Build your own evals instead of relying solely on standard ones.
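As a rough illustration of the e2e bullet above, here is a minimal sketch of a binary pass/fail eval. `run_agent` is a hypothetical stand-in for your agent, and the refund scenario is invented for illustration:

```python
# Minimal sketch of an end-to-end eval with a binary (yes/no) outcome.

def run_agent(user_message: str) -> str:
    # Placeholder: replace with a real call to your agent/LLM.
    return "I'm sorry about that. I've issued a full refund to your card."

def eval_refund_flow() -> bool:
    """E2E eval: does the agent offer a refund when the user asks for one?"""
    reply = run_agent("My order arrived broken. I want my money back.")
    # Binary success criterion: the final reply must mention a refund.
    return "refund" in reply.lower()

if __name__ == "__main__":
    print("PASS" if eval_refund_flow() else "FAIL")
```

The point of the yes/no framing is that each eval becomes a countable pass/fail, so you can track a pass rate across prompt changes and model swaps.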
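The 'N-1' idea might look like the following sketch, assuming a chat-style agent that accepts a list of role/content messages; `run_agent_turn` and the cancellation dialogue are illustrative assumptions:

```python
# Minimal sketch of an 'N-1' eval: the first N-1 turns are fixed fixtures,
# and only the agent's next response is evaluated.

from typing import Dict, List

def run_agent_turn(history: List[Dict[str, str]]) -> str:
    # Placeholder: replace with a real call that continues the conversation.
    return "Sure, I can cancel that subscription for you right away."

history = [
    {"role": "user", "content": "Hi, I'd like to cancel my subscription."},
    {"role": "assistant", "content": "I can help. Which plan are you on?"},
    {"role": "user", "content": "The monthly plan. Please cancel it."},
]

reply = run_agent_turn(history)
# Targeted check on just this one turn. If the agent's earlier behavior
# changes, update the seeded history so the fixture stays realistic.
assert "cancel" in reply.lower(), f"Unexpected reply: {reply!r}"
print("PASS")
```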
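Finally, a sketch of checkpoint validation: the prompt instructs the agent to emit exact marker strings, and the eval asserts they appear in order in the transcript. The marker format and the transcript below are assumptions for illustration:

```python
# Minimal sketch of 'checkpoint' validation via exact string matching.

CHECKPOINTS = [
    "[CHECKPOINT:IDENTITY_VERIFIED]",
    "[CHECKPOINT:REFUND_APPROVED]",
]

transcript = (
    "Thanks, I've confirmed your account. [CHECKPOINT:IDENTITY_VERIFIED] "
    "Your refund has been approved. [CHECKPOINT:REFUND_APPROVED]"
)

def checkpoints_in_order(transcript: str, checkpoints: list[str]) -> bool:
    """Return True if every checkpoint appears in the transcript, in order."""
    pos = 0
    for marker in checkpoints:
        pos = transcript.find(marker, pos)
        if pos == -1:
            return False
        pos += len(marker)
    return True

print("PASS" if checkpoints_in_order(transcript, CHECKPOINTS) else "FAIL")
```

Exact strings make the check cheap and deterministic, which is useful for multi-step flows where an LLM judge would be slower and noisier.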