Testing LLM Responses: A Fast, Cost-Effective Alternative to LLM-as-Judge
9 months ago
- #evaluation
- #cosine-similarity
- #LLM
- The 'LLM-as-judge' approach is thorough but expensive and slow for personal projects.
- Proposed solution: Length-adjusted cosine similarity for fast, budget-friendly monitoring.
- Implementation involves TF-IDF vectorization and cosine similarity with length adjustment.
- Benefits include speed, cost-effectiveness, automation-friendliness, and good-enough accuracy.
- Real-world testing shows it catches major regressions while tolerating natural variation in wording.
- Best used as a first line of defense with threshold monitoring.
- Limitations: it is not semantically perfect, and its performance varies by domain.
- Ideal for regression testing, continuous monitoring, and budget-conscious evaluation.
- Bottom line: A practical middle ground for personal projects needing fast, cost-effective monitoring.
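
The summary does not spell out the exact formula, so the following is only a minimal sketch of how length-adjusted cosine similarity with threshold monitoring might look. It assumes a hand-rolled TF-IDF fitted on just the two texts being compared, and it assumes the length adjustment is the ratio of the shorter token count to the longer one. The function names (`length_adjusted_similarity`, `passes`) and the default threshold of 0.5 are hypothetical choices for illustration, not values from the original post.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a small list of token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF so terms that appear in every document keep a non-zero weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def length_adjusted_similarity(reference, candidate):
    """Cosine similarity scaled by a length ratio (an assumed adjustment)."""
    ref_tokens, cand_tokens = tokenize(reference), tokenize(candidate)
    if not ref_tokens or not cand_tokens:
        return 0.0
    ref_vec, cand_vec = tfidf_vectors([ref_tokens, cand_tokens])
    # Assumption: penalize responses that are much shorter or longer than the reference.
    shorter = min(len(ref_tokens), len(cand_tokens))
    longer = max(len(ref_tokens), len(cand_tokens))
    return cosine(ref_vec, cand_vec) * (shorter / longer)


def passes(reference, candidate, threshold=0.5):
    """Threshold check; the 0.5 default is a placeholder to tune per project."""
    return length_adjusted_similarity(reference, candidate) >= threshold
```

In a regression suite, each prompt would have a stored reference answer, and a new response would be flagged for closer review (for example by an LLM judge) only when `passes` returns `False`. A library vectorizer such as scikit-learn's `TfidfVectorizer` could replace the hand-rolled TF-IDF here; the pure-Python version is used only to keep the sketch dependency-free.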