Hasty Briefs (beta)

CursorBench 3.1

16 hours ago
  • #Code Agents
  • #AI Benchmark
  • #Performance Evaluation
  • CursorBench 3.1 evaluates agents on ambiguous, multi-file tasks from real Cursor sessions, with higher scores indicating better performance.
  • The leaderboard includes models such as Fable 5 Max (72.9%) and Fable 5 Extra High (72.0%), with scores ranging down to 31.9% for Kimi 2.5.
  • Average cost per task is calculated by applying each model's published per-million-token pricing to the tokens it used on each CursorBench 3.1 task, then averaging across tasks.
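
The cost calculation described above can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual code: the summary does not say whether input and output tokens are priced separately, so the split used here is an assumption, and the names TaskUsage and avg_cost_per_task, along with all token counts and prices, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskUsage:
    """Token counts for one task (hypothetical structure)."""
    input_tokens: int
    output_tokens: int


def avg_cost_per_task(tasks, input_price_per_m, output_price_per_m):
    """Average dollar cost per task.

    Assumes separate per-million-token prices for input and output
    tokens; the source only says "published per-million-token pricing".
    """
    if not tasks:
        raise ValueError("need at least one task")
    costs = [
        t.input_tokens / 1_000_000 * input_price_per_m
        + t.output_tokens / 1_000_000 * output_price_per_m
        for t in tasks
    ]
    # Average the per-task costs, rather than pricing the pooled tokens.
    return sum(costs) / len(costs)


# Made-up usage numbers and prices, for illustration only.
example_tasks = [TaskUsage(1_000_000, 0), TaskUsage(0, 1_000_000)]
example_avg = avg_cost_per_task(example_tasks, 2.0, 10.0)
```

With a single fixed price list, averaging per-task costs gives the same result as pricing the total tokens and dividing by the number of tasks. The per-task form is kept here because it mirrors the wording of the description.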