The ways we contain Claude across products

3 hours ago

Evolution of Claude's security approach: A year ago, granting Claude high-level access was unthinkable; now it's routine, improving developer productivity despite increased blast radius risks.
Two main strategies for capping blast radius: Human-in-the-loop supervision (e.g., permission prompts) and containment (e.g., sandboxes, VMs, egress controls).
Security risks fall into three categories: User misuse, model misbehavior, and external attacks.
Defenses applied to three components: The agent's environment (e.g., sandboxes), the model (e.g., system prompts), and external content (e.g., tool outputs).
Claude.ai uses server-side ephemeral containers with minimal blast radius but limited functionality.
Claude Code employs OS-level sandboxes to reduce approval fatigue but faced vulnerabilities like pre-consent execution and phishing via prompt injection.
Claude Cowork uses sealed VMs for strong isolation, balancing access control with user transparency and addressing challenges like exfiltration through allowed domains.
Key lessons: Prioritize environmental containment over probabilistic model defenses; tailor isolation to user expertise; rely on battle-tested components over custom ones.
Future challenges include persistent memory poisoning, multi-agent trust escalation, and agent identity management.

Hasty Briefsbeta