Hasty Briefsbeta

Bilingual

The Webpage Has Instructions. The Agent Has Your Credentials

8 hours ago
  • #Prompt Injection
  • #Agent Security
  • #AI Risks
  • A poisoned GitHub issue led a coding agent to access a private repository and leak its contents in a public pull request.
  • Operator browser-agent had a 23% prompt-injection success rate post-mitigation in 31 test scenarios.
  • Agent Security Bench reported an 84.30% attack success rate across mixed attacks.
  • Untrusted content reaching tool calls, repository writes, memory updates, or agent handoffs poses significant risks.
  • OpenAI's safeguards for browser agents included confirmation prompts, watch mode, and a prompt-injection detector, yet attackers succeeded 23% of the time.
  • Deep Research highlighted risks of prompt injections, privacy breaches, and code execution in a single workflow.
  • Prompt injection became a standard engineering problem by March 2025, with OpenAI bundling web search, file search, and guardrails into developer toolkits.
  • Anthropic emphasized that even a 1% attack success rate is meaningful for agents handling sensitive tasks.
  • Microsoft and OpenAI described specific attack mechanics, such as HTML image tags leaking data and hidden channels.
  • Invariant Labs disclosed MCP tool-poisoning attacks where malicious instructions were hidden in tool descriptions.
  • Memory poisoning attacks can corrupt long-term memory and influence future agent responses.
  • Google's A2A protocol introduced risks of contaminated context flowing between agents with different permissions.
  • By early 2026, vendors like Google, OpenAI, and Anthropic adopted layered defenses, including classifiers, sandboxing, and confirmation steps.
  • Key defenses include labeling untrusted inputs, scoping permissions, limiting outbound connections, and treating memory as part of the security surface.
  • The first major prompt-injection incident with financial damage is predicted to involve multi-agent workflows.
  • Agent security is expected to converge with application security, focusing on trust boundaries and scoped credentials.