Hasty Briefsbeta

Bilingual

The ways we contain Claude across products

3 hours ago
  • #AI Security
  • #Agent Containment
  • #Risk Management
  • Evolution of Claude's security approach: A year ago, granting Claude high-level access was unthinkable; now it's routine, improving developer productivity despite increased blast radius risks.
  • Two main strategies for capping blast radius: Human-in-the-loop supervision (e.g., permission prompts) and containment (e.g., sandboxes, VMs, egress controls).
  • Security risks fall into three categories: User misuse, model misbehavior, and external attacks.
  • Defenses applied to three components: The agent's environment (e.g., sandboxes), the model (e.g., system prompts), and external content (e.g., tool outputs).
  • Claude.ai uses server-side ephemeral containers with minimal blast radius but limited functionality.
  • Claude Code employs OS-level sandboxes to reduce approval fatigue but faced vulnerabilities like pre-consent execution and phishing via prompt injection.
  • Claude Cowork uses sealed VMs for strong isolation, balancing access control with user transparency and addressing challenges like exfiltration through allowed domains.
  • Key lessons: Prioritize environmental containment over probabilistic model defenses; tailor isolation to user expertise; rely on battle-tested components over custom ones.
  • Future challenges include persistent memory poisoning, multi-agent trust escalation, and agent identity management.