Arena AI Model ELO History
- #LMSYS-Arena
- #Performance-Monitoring
- #AI-Model-Updates
- AI labs frequently update models after launch, sometimes quietly degrading them through added censorship, quantization, or other serving changes.
- LMSYS Arena tests models through their raw API endpoints; the corresponding web interfaces may behave differently because of added system prompts, safety filters, or quantized serving variants.
- Data is sourced daily from the official LM Arena Leaderboard Dataset on Hugging Face, which is based on pairwise human evaluations and so gives a robust capability metric.
- The chart shows each major AI lab's flagship lineage curve, tracking the highest-rated eligible model over time, not just the latest release.
- Flagship models (e.g., Opus) remain on the curve even if mid-tier models (e.g., Sonnet) are released, with inference variants collapsed to avoid fluctuations.
- New releases appear as labeled markers, often with score jumps, and degradation trends between releases are highlighted for visibility.
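The curve logic described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the function name, the tuple layout of `records`, and the `variant_of` mapping are all assumptions. It assumes mid-tier models (e.g., Sonnet) have already been filtered out, collapses inference variants onto one canonical name, and at each date reports the highest-rated model seen so far, so that score jumps at releases and degradation between them both show up.

```python
def flagship_curve(records, variant_of=None):
    """Hypothetical helper: build a lab's flagship lineage curve.

    records:    list of (date, model, score) tuples for one lab's
                flagship-tier models, sorted by date.
    variant_of: dict mapping inference variants to a canonical model
                name, so variants collapse onto one curve instead of
                causing spurious fluctuations.
    Returns a list of (date, flagship_model, flagship_score) points.
    """
    variant_of = variant_of or {}
    best = {}    # canonical model name -> latest observed score
    curve = []
    for date, model, score in records:
        canonical = variant_of.get(model, model)
        best[canonical] = score  # update with the latest daily score
        # The flagship is the highest-rated model currently tracked,
        # not necessarily the most recent release.
        flagship = max(best, key=best.get)
        curve.append((date, flagship, best[flagship]))
    return curve


# Illustrative (made-up) data: a variant collapse, a degradation
# between releases, and a score jump at a new release.
records = [
    ("2024-03-01", "opus-1", 1250),
    ("2024-05-01", "opus-1-fast", 1248),  # inference variant of opus-1
    ("2024-09-01", "opus-1", 1235),       # degradation between releases
    ("2024-12-01", "opus-2", 1300),       # new release: score jump
]
curve = flagship_curve(records, variant_of={"opus-1-fast": "opus-1"})
```

Note that `opus-1` stays on the curve after its variant is folded in, its decline to 1235 remains visible, and `opus-2` takes over as flagship only when its rating appears.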