I benchmarked Claude Code's caveman plugin against "be brief."
5 hours ago
- #AI Benchmarking
- #Claude Plugins
- #Prompt Engineering
- The Caveman compression plugin was benchmarked against the simple prompt 'be brief.' across 24 prompts in six categories.
- 'be brief.' and Caveman performed comparably in token reduction and quality, with no significant difference in correctness scores.
- Caveman's 'Ultra' mode unexpectedly produced longer responses in some categories due to its Auto-Clarity rule, which disables compression for safety warnings and multi-step sequences.
- Caveman's main value lies in providing consistent output structure, intensity adjustments, and persistence across sessions, not just token savings.
- The benchmark surfaced minor issues: one missed required term in 'Lite' mode, and unexpected tool-use behavior triggered in 'Ultra' mode.
- For basic compression, 'be brief.' is sufficient; Caveman is worth it when structured, consistent output across extended interactions matters.
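The token-reduction metric at the heart of this comparison can be sketched roughly as follows. This is a hypothetical illustration, not the benchmark's actual harness: the `token_reduction` helper, whitespace-split "tokens" (a stand-in for a real tokenizer), and the sample responses are all assumptions.

```python
# Rough sketch of a token-reduction comparison like the one the post describes.
# Whitespace splitting stands in for a real tokenizer; the responses below are
# invented placeholders, not outputs from the actual benchmark.

def token_reduction(baseline: str, compressed: str) -> float:
    """Percent fewer tokens in `compressed` relative to `baseline`."""
    base = len(baseline.split())
    comp = len(compressed.split())
    return 100.0 * (base - comp) / base

baseline = "The capital of France is Paris, a city known for the Eiffel Tower."
brief    = "Paris is the capital of France."
caveman  = "Capital of France: Paris."

print(f"'be brief.': {token_reduction(baseline, brief):.0f}% fewer tokens")
print(f"Caveman:     {token_reduction(baseline, caveman):.0f}% fewer tokens")
```

On a toy pair like this the two styles land close together, which mirrors the post's finding that raw token savings alone don't separate them; the differences show up in structure and consistency, not the reduction percentage.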