Hasty Briefs

Arena AI Model ELO History

4 hours ago
  • #LMSYS-Arena
  • #Performance-Monitoring
  • #AI-Model-Updates
  • AI labs frequently update models after launch, potentially nerfing them through added censorship, quantization, or other forms of performance degradation.
  • LMSYS Arena tests models through their raw API endpoints, whereas consumer web interfaces may behave differently because they add system prompts or filters, or serve quantized versions.
  • Data is pulled daily from the official LM Arena Leaderboard Dataset on Hugging Face; its scores come from human evaluations, which makes them a robust measure of model capability.
  • The chart shows each major AI lab's flagship lineage curve, tracking the highest-rated eligible model over time, not just the latest release.
  • Flagship models (e.g., Opus) stay on the curve even after newer mid-tier models (e.g., Sonnet) are released, and inference variants of the same model are collapsed into one entry to avoid spurious fluctuations.
  • New releases appear as labeled markers, often accompanied by score jumps, and degradation trends between releases are highlighted.
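The flagship-lineage rule described above can be sketched in Python. This is a minimal sketch, not the site's actual implementation: the `Rating` record, the `base_name` rule that treats "name (variant)" as an inference variant of "name", and the choice to keep the best-scoring variant are all hypothetical assumptions. Each daily snapshot is assumed to list every model's current score, so taking the per-day maximum keeps an older flagship on the curve until something outscores it.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rating:
    """One row of a daily leaderboard snapshot (hypothetical schema)."""
    day: date
    lab: str
    model: str
    score: float


def base_name(model: str) -> str:
    # Hypothetical convention: inference variants are named "name (variant)".
    return model.split(" (")[0]


def lineage_curve(ratings: list[Rating], lab: str) -> list[tuple[date, str, float]]:
    """Return (day, model, score) for the lab's highest-rated model on each day."""
    # Collapse inference variants: keep the best score per (day, base model).
    best: dict[tuple[date, str], float] = {}
    for r in ratings:
        if r.lab != lab:
            continue
        key = (r.day, base_name(r.model))
        best[key] = max(best.get(key, float("-inf")), r.score)

    # Group the collapsed entries by day.
    by_day: dict[date, list[tuple[float, str]]] = defaultdict(list)
    for (day, model), score in best.items():
        by_day[day].append((score, model))

    # The curve follows the top scorer each day, not the newest release,
    # so an older flagship stays on it until a newer model outscores it.
    curve = []
    for day in sorted(by_day):
        score, model = max(by_day[day])
        curve.append((day, model, score))
    return curve


# Hypothetical data: a mid-tier release ("mid-2") does not displace the
# flagship ("big-1") until one of its variants scores higher.
ratings = [
    Rating(date(2025, 1, 1), "LabA", "big-1", 1280.0),
    Rating(date(2025, 1, 1), "LabA", "mid-2", 1250.0),
    Rating(date(2025, 1, 2), "LabA", "big-1", 1275.0),
    Rating(date(2025, 1, 2), "LabA", "mid-2", 1255.0),
    Rating(date(2025, 1, 2), "LabA", "mid-2 (thinking)", 1290.0),
]
for day, model, score in lineage_curve(ratings, "LabA"):
    print(day, model, score)
# 2025-01-01 big-1 1280.0
# 2025-01-02 mid-2 1290.0
```

Collapsing variants by their maximum is only one option; averaging them or picking a fixed default mode would also smooth the curve, with different trade-offs.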
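Highlighting degradation between releases could work along these lines, again as a hypothetical sketch rather than the chart's documented rule: given one lineage's daily `(date, score)` series and its release dates, it reports each inter-release span whose latest score has fallen at least `min_drop` points below that span's peak. The function name, the 10-point default threshold, and the peak-to-latest measure are all assumptions for illustration.

```python
from datetime import date


def degradation_spans(
    series: list[tuple[date, float]],
    releases: list[date],
    min_drop: float = 10.0,  # hypothetical threshold, in rating points
) -> list[tuple[date, date, float]]:
    """Return (release_day, last_day, drop) for each inter-release span whose
    score ended at least `min_drop` points below the span's peak."""
    series = sorted(series)
    bounds = sorted(releases)
    spans = []
    for i, start in enumerate(bounds):
        end = bounds[i + 1] if i + 1 < len(bounds) else None
        # Observations from this release up to, but excluding, the next one.
        window = [(d, s) for d, s in series if d >= start and (end is None or d < end)]
        if len(window) < 2:
            continue
        peak = max(s for _, s in window)
        last_day, last_score = window[-1]
        drop = peak - last_score
        if drop >= min_drop:
            spans.append((start, last_day, drop))
    return spans


# Hypothetical series: a 20-point slide after the first release,
# then only a 2-point dip after the second.
series = [
    (date(2025, 1, 1), 1300.0),
    (date(2025, 1, 5), 1295.0),
    (date(2025, 1, 10), 1280.0),
    (date(2025, 2, 1), 1310.0),
    (date(2025, 2, 5), 1308.0),
]
print(degradation_spans(series, [date(2025, 1, 1), date(2025, 2, 1)]))
# [(datetime.date(2025, 1, 1), datetime.date(2025, 1, 10), 20.0)]
```

Measuring from the span's peak rather than from the release day means a model that first improves and then slips is still flagged.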