Study: Self-generated Agent Skills are useless
8 days ago
- #Benchmarking
- #Agent Skills
- #Artificial Intelligence
- Agent Skills are structured packages of procedural knowledge that enhance LLM agents during inference.
- SkillsBench is introduced as a benchmark with 86 tasks across 11 domains, paired with curated Skills and deterministic verifiers.
- Tasks are evaluated under three conditions: no Skills, curated Skills, and self-generated Skills.
- Curated Skills improve average pass rates by 16.2 percentage points, with substantial variation across domains.
- Self-generated Skills show no average benefit, indicating models cannot reliably author the same procedural knowledge they benefit from when it is curated by humans.
- Focused Skills with 2–3 modules outperform comprehensive documentation.
- Smaller models with Skills can match the performance of larger models without them.
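The three-condition evaluation described above can be sketched as a simple harness: run each task with no Skill, with the curated Skill, or with a Skill the agent writes for itself, then score outputs with the task's deterministic verifier. This is an illustrative sketch only — the names (`Task`, `run_agent`, `pass_rate`) are assumptions, not the actual SkillsBench API.

```python
# Hypothetical sketch of a SkillsBench-style evaluation loop.
# Task, run_agent, and pass_rate are illustrative names, not the real harness.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Task:
    prompt: str
    curated_skill: str                # human-written procedural guidance
    verify: Callable[[str], bool]     # deterministic pass/fail check

def run_agent(prompt: str, skill: Optional[str]) -> str:
    # Placeholder for an LLM agent call; stubbed for illustration.
    return "output"

def pass_rate(tasks: list[Task], condition: str) -> float:
    """Fraction of tasks passed under one of the three conditions."""
    passed = 0
    for task in tasks:
        if condition == "none":
            skill = None
        elif condition == "curated":
            skill = task.curated_skill
        else:  # "self-generated": the agent first writes its own Skill
            skill = run_agent(f"Write a Skill for: {task.prompt}", None)
        output = run_agent(task.prompt, skill)
        passed += task.verify(output)
    return passed / len(tasks)
```

Comparing `pass_rate(tasks, "curated")` against `pass_rate(tasks, "none")` yields the reported 16.2-point gap; comparing `"self-generated"` against `"none"` is what shows no average benefit.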