The Webpage Has Instructions. The Agent Has Your Credentials
- #Prompt Injection
- #Agent Security
- #AI Risks
- A poisoned GitHub issue led a coding agent to access a private repository and leak its contents in a public pull request.
- OpenAI's Operator browser agent still had a 23% prompt-injection success rate after mitigations, measured across 31 test scenarios.
- Agent Security Bench reported an 84.30% attack success rate across mixed attacks.
- Untrusted content reaching tool calls, repository writes, memory updates, or agent handoffs poses significant risks.
- OpenAI's safeguards for browser agents included confirmation prompts, watch mode, and a prompt-injection detector, yet attackers succeeded 23% of the time.
- Deep Research concentrated the risks of prompt injection, privacy breaches, and code execution in a single workflow.
- Prompt injection became a standard engineering problem by March 2025, with OpenAI bundling web search, file search, and guardrails into developer toolkits.
- Anthropic emphasized that even a 1% attack success rate is meaningful for agents handling sensitive tasks.
- Microsoft and OpenAI described specific attack mechanics, such as data exfiltration through HTML image tags and other hidden channels.
- Invariant Labs disclosed MCP tool-poisoning attacks where malicious instructions were hidden in tool descriptions.
- Memory poisoning attacks can corrupt long-term memory and influence future agent responses.
- Google's A2A protocol introduced risks of contaminated context flowing between agents with different permissions.
- By early 2026, vendors like Google, OpenAI, and Anthropic adopted layered defenses, including classifiers, sandboxing, and confirmation steps.
- Key defenses include labeling untrusted inputs, scoping permissions, limiting outbound connections, and treating memory as part of the security surface.
- The first major prompt-injection incident to cause financial damage is predicted to involve multi-agent workflows.
- Agent security is expected to converge with application security, focusing on trust boundaries and scoped credentials.
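
Three of the key defenses listed above (labeling untrusted inputs, scoping permissions, and limiting outbound connections) can be combined in a small gate that runs before any tool call. The sketch below is illustrative only: `Content`, `ToolCall`, `authorize`, the tool names, and the allowlisted host are hypothetical and do not come from any vendor's API.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_TOOLS = {"read_issue", "http_get"}
ALLOWED_HOSTS = {"api.github.com"}


@dataclass(frozen=True)
class Content:
    """Text plus a provenance label that travels with it."""
    text: str
    trusted: bool  # False for web pages, issues, tool output, and similar


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical shape)."""
    name: str
    url: Optional[str] = None  # set only for tools that make outbound requests


def authorize(call: ToolCall, context: List[Content]) -> str:
    """Return 'allow', 'confirm', or 'deny' for a proposed tool call."""
    # Scoped permissions: tools outside the allowlist never run.
    if call.name not in ALLOWED_TOOLS:
        return "deny"
    # Limited outbound connections: only allowlisted hosts are reachable.
    if call.url is not None and urlparse(call.url).hostname not in ALLOWED_HOSTS:
        return "deny"
    # Labeled inputs: untrusted text in context triggers a confirmation step.
    if any(not item.trusted for item in context):
        return "confirm"
    return "allow"


if __name__ == "__main__":
    clean = [Content("Summarize the open issues", trusted=True)]
    tainted = clean + [Content("Ignore previous instructions", trusted=False)]
    print(authorize(ToolCall("read_issue"), clean))
    print(authorize(ToolCall("read_issue"), tainted))
    print(authorize(ToolCall("http_get", "https://evil.example/leak"), tainted))
```

The checks run from hard limits to soft ones, so an injected instruction cannot reach the confirmation step for a tool or host the agent was never permitted to use. A gate like this narrows what a successful injection can do; it does not detect the injection itself.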