The ways we contain Claude across products
- #AI Security
- #Agent Containment
- #Risk Management
- Evolution of Claude's security approach: A year ago, granting Claude high-level access was unthinkable; now it is routine. The change improves developer productivity but enlarges the blast radius when something goes wrong.
- Two main strategies for capping blast radius: Human-in-the-loop supervision (e.g., permission prompts) and containment (e.g., sandboxes, VMs, egress controls).
- Security risks fall into three categories: User misuse, model misbehavior, and external attacks.
- Defenses applied to three components: The agent's environment (e.g., sandboxes), the model (e.g., system prompts), and external content (e.g., tool outputs).
- Claude.ai uses server-side ephemeral containers, which keep the blast radius minimal but limit functionality.
- Claude Code uses OS-level sandboxes to reduce approval fatigue, but it has faced vulnerabilities such as pre-consent execution and phishing via prompt injection.
- Claude Cowork uses sealed VMs for strong isolation, balancing access control with user transparency and addressing challenges like exfiltration through allowed domains.
- Key lessons: Prioritize environmental containment over probabilistic model defenses; tailor isolation to user expertise; rely on battle-tested components over custom ones.
- Future challenges include persistent memory poisoning, multi-agent trust escalation, and agent identity management.
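
As an illustration of how the two blast-radius strategies above can combine, here is a minimal sketch in Python: a deterministic egress allowlist (containment) decides first, and a permission prompt (human-in-the-loop) is the fallback for anything the allowlist does not cover. The names `ALLOWED_HOSTS`, `egress_allowed`, `decide`, and `ask_user` are hypothetical and are not taken from any Claude product; the allowlist contents are placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration only; a real deployment defines its own policy.
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True only for https URLs whose host exactly matches an allowlisted host."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Exact match, so "pypi.org.evil.example" does not pass as "pypi.org".
    return parts.scheme == "https" and host in allowed_hosts


def decide(url: str, ask_user) -> str:
    """Containment first; fall back to a human permission prompt otherwise.

    `ask_user` is an assumed callback that takes the URL and returns True or False.
    """
    if egress_allowed(url):
        return "allow"
    return "allow" if ask_user(url) else "deny"


if __name__ == "__main__":
    deny_all = lambda url: False  # stand-in for a user who declines every prompt
    print(decide("https://pypi.org/simple/", deny_all))
    print(decide("https://pypi.org.evil.example/x", deny_all))
```

The allowlist check is deterministic, which matches the lesson of preferring environmental containment over probabilistic model defenses. It does not solve exfiltration through allowed domains: a host on the list that accepts uploads can still carry data out, so a check like this is only one layer.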