Study: Self-generated Agent Skills are useless
8 days ago
- #Benchmarking
- #Agent Skills
- #Artificial Intelligence
- Agent Skills are structured packages of procedural knowledge that enhance LLM agents during inference.
- SkillsBench is introduced as a benchmark with 86 tasks across 11 domains, paired with curated Skills and deterministic verifiers.
- Tasks are evaluated under three conditions: no Skills, curated Skills, and self-generated Skills.
- Curated Skills improve average pass rates by 16.2 percentage points, with substantial variation across domains.
- Self-generated Skills show no average benefit, indicating models cannot reliably author the same procedural knowledge they benefit from when it is curated by humans.
- Focused Skills with 2–3 modules outperform comprehensive documentation.
- Smaller models with Skills can match the performance of larger models without them.
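The three-condition evaluation described above can be sketched as a simple harness: run each task with no Skill, with the curated Skill, or with a Skill the agent writes for itself, then score outputs with the task's deterministic verifier. This is an illustrative sketch only — the names (`Task`, `run_agent`, `pass_rate`) are assumptions, not the actual SkillsBench API.

```python
# Hypothetical sketch of a SkillsBench-style evaluation loop.
# Task, run_agent, and pass_rate are illustrative names, not the real harness.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Task:
    prompt: str
    curated_skill: str                # human-written procedural guidance
    verify: Callable[[str], bool]     # deterministic pass/fail check

def run_agent(prompt: str, skill: Optional[str]) -> str:
    # Placeholder for an LLM agent call; stubbed for illustration.
    return "output"

def pass_rate(tasks: list[Task], condition: str) -> float:
    """Fraction of tasks passed under one of the three conditions."""
    passed = 0
    for task in tasks:
        if condition == "none":
            skill = None
        elif condition == "curated":
            skill = task.curated_skill
        else:  # "self-generated": the agent first writes its own Skill
            skill = run_agent(f"Write a Skill for: {task.prompt}", None)
        output = run_agent(task.prompt, skill)
        passed += task.verify(output)
    return passed / len(tasks)
```

Comparing `pass_rate(tasks, "curated")` against `pass_rate(tasks, "none")` yields the reported 16.2-point gap; comparing `"self-generated"` against `"none"` is what shows no average benefit.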