Hasty Briefs

Study: Self-generated Agent Skills are useless

8 days ago
  • #Benchmarking
  • #Agent Skills
  • #Artificial Intelligence
  • Agent Skills are structured packages of procedural knowledge that enhance LLM agents during inference.
  • SkillsBench is introduced as a benchmark with 86 tasks across 11 domains, paired with curated Skills and deterministic verifiers.
  • Tasks are evaluated under three conditions: no Skills, curated Skills, and self-generated Skills.
  • Curated Skills improve average pass rates by 16.2 percentage points, with significant variation across domains.
  • Self-generated Skills show no average benefit, indicating that models cannot reliably author the procedural knowledge they benefit from consuming.
  • Focused Skills with 2–3 modules outperform comprehensive documentation.
  • Smaller models with Skills can match the performance of larger models without them.
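The three-condition comparison above can be sketched as a tiny evaluation harness. This is a toy illustration of the protocol shape (deterministic verifiers, per-condition pass rates), not the SkillsBench implementation; the task set, the `agent` stand-in, and all answer strings are invented for the example.

```python
from typing import Callable, List, Optional

# Deterministic verifier: checks an agent's output against an expected answer.
def make_task(expected: str) -> Callable[[str], bool]:
    return lambda output: output == expected

# Toy task set (real SkillsBench has 86 tasks across 11 domains).
tasks: List[Callable[[str], bool]] = [make_task("42"), make_task("ok"), make_task("done")]

# Stand-in for an LLM agent run; hard-coded answers so the harness executes.
# In the study, "curated" helps while "self" (self-generated) does not.
def agent(task_idx: int, skill: Optional[str]) -> str:
    answers = {
        None: ["41", "ok", "nope"],       # no Skills
        "curated": ["42", "ok", "done"],  # curated Skills
        "self": ["41", "ok", "nope"],     # self-generated Skills
    }
    return answers[skill][task_idx]

# Pass rate (%) for one condition: run every task, apply its verifier.
def pass_rate(skill: Optional[str]) -> float:
    results = [tasks[i](agent(i, skill)) for i in range(len(tasks))]
    return 100.0 * sum(results) / len(results)

lift = pass_rate("curated") - pass_rate(None)
print(f"lift from curated Skills: {lift:.1f} pp")
```

The reported 16.2-point average lift corresponds to the `lift` quantity here, computed over the full benchmark rather than this three-task toy.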