Cline-Bench: A Real-World, Open-Source Benchmark for Agentic Coding

6 months ago

Introducing cline-bench, a real-world, open-source benchmark for agentic coding derived from actual open-source development scenarios.
It aims to close the gap left by current coding benchmarks, which often resemble LeetCode-style puzzles rather than real engineering challenges.
Cline-bench environments include repository snapshots, authentic problem definitions, and automated verification criteria for reproducibility.
Tasks are sourced from real open-source projects where models fail or require manual intervention, ensuring relevance and difficulty.
Open call for contributions: engineers can opt in via the Cline Provider or manually submit tasks from open-source repositories.
Benchmark goals include reliable evaluation, open scientific progress, and training data for fine-tuning and reinforcement learning.
Privacy and security are prioritized, with user control over participation and enterprise data excluded by default.
$1M sponsorship program launched to support open-source maintainers contributing high-value tasks to cline-bench.
Cline-bench remains fully open-source and freely accessible to foster communal progress in AI agentic coding.
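The brief does not describe the actual task format, but the three ingredients it lists (a repository snapshot, a problem definition, and automated verification criteria) can be pictured with a minimal sketch. It assumes that a snapshot is a repository URL pinned to a commit and that verification means "every check command exits with status 0". `TaskEnvironment`, `verify`, and all field names are hypothetical illustrations, not cline-bench's real schema or tooling.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskEnvironment:
    """Hypothetical benchmark task record (field names are illustrative only)."""
    repo_url: str           # source repository of the snapshot
    commit: str             # pinned commit, making the snapshot reproducible
    problem_statement: str  # the real-world problem definition given to the agent
    # Each check is an argv list run without a shell; all must pass.
    verify_cmds: List[List[str]] = field(default_factory=list)


def verify(env: TaskEnvironment, workdir: Optional[str] = None) -> bool:
    """Return True only if every verification command exits with status 0."""
    for cmd in env.verify_cmds:
        result = subprocess.run(cmd, cwd=workdir, capture_output=True)
        if result.returncode != 0:
            return False  # first failing check fails the whole task
    return True


if __name__ == "__main__":
    # Demo with a placeholder repository; the check just simulates a passing test run.
    demo = TaskEnvironment(
        repo_url="https://example.com/project.git",
        commit="abc123",
        problem_statement="Fix the crash when the config file is empty.",
        verify_cmds=[[sys.executable, "-c", "print('tests pass')"]],
    )
    print(verify(demo))
```

Treating verification as a list of exit-code checks keeps grading deterministic and independent of the model, which is one plausible way to get the reproducibility the brief mentions.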
