The Webpage Has Instructions. The Agent Has Your Credentials

6 hours ago

#Prompt Injection
#Agent Security
#AI Risks

A poisoned GitHub issue led a coding agent to access a private repository and leak its contents in a public pull request.
OpenAI's Operator browser agent still showed a 23% prompt-injection success rate after mitigations, measured across 31 test scenarios.
Agent Security Bench reported an 84.30% attack success rate across mixed attacks.
Untrusted content becomes a security risk once it reaches tool calls, repository writes, memory updates, or agent handoffs.
OpenAI's safeguards for browser agents included confirmation prompts, watch mode, and a prompt-injection detector, yet attackers succeeded 23% of the time.
Deep Research highlighted risks of prompt injections, privacy breaches, and code execution in a single workflow.
Prompt injection became a standard engineering problem by March 2025, with OpenAI bundling web search, file search, and guardrails into developer toolkits.
Anthropic emphasized that even a 1% attack success rate is meaningful for agents handling sensitive tasks.
Microsoft and OpenAI described specific attack mechanics, such as HTML image tags that leak data, and hidden channels.
Invariant Labs disclosed MCP tool-poisoning attacks where malicious instructions were hidden in tool descriptions.
Memory poisoning attacks can corrupt long-term memory and influence future agent responses.
Google's A2A protocol introduced risks of contaminated context flowing between agents with different permissions.
By early 2026, vendors like Google, OpenAI, and Anthropic adopted layered defenses, including classifiers, sandboxing, and confirmation steps.
Key defenses include labeling untrusted inputs, scoping permissions, limiting outbound connections, and treating memory as part of the security surface.
The first major prompt-injection incident with financial damage is predicted to involve multi-agent workflows.
Agent security is expected to converge with application security, focusing on trust boundaries and scoped credentials.
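
The key defenses named above (labeling untrusted inputs, scoping permissions, limiting outbound connections, and treating memory as part of the security surface) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `Content`, `ToolPolicy`, `authorize`, `Memory`, and the tool and host names are all hypothetical, and a real agent would pair a gate like this with classifiers and sandboxing.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class Content:
    """Text plus a provenance label that travels with it (hypothetical type)."""
    text: str
    trusted: bool  # False for web pages, issues, tool output, other agents


@dataclass
class ToolPolicy:
    """Hypothetical per-task policy: which tools and hosts are in scope."""
    allowed_tools: frozenset
    allowed_hosts: frozenset
    sensitive_tools: frozenset = frozenset()


def authorize(policy, tool, url=None, context=(), confirmed=False):
    """Return (allowed, reason) for a proposed tool call."""
    # Scoped permissions: the tool must be part of this task's allowlist.
    if tool not in policy.allowed_tools:
        return False, "tool outside task scope"
    # Limited outbound connections: only allowlisted hosts are reachable.
    if url is not None and urlparse(url).hostname not in policy.allowed_hosts:
        return False, "outbound host not allowlisted"
    # Labeled inputs: untrusted context forces a confirmation step
    # before any sensitive (write or publish) action.
    tainted = any(not c.trusted for c in context)
    if tainted and tool in policy.sensitive_tools and not confirmed:
        return False, "needs user confirmation: untrusted content in context"
    return True, "ok"


@dataclass
class Memory:
    """Long-term memory that refuses writes derived from untrusted content."""
    entries: list = field(default_factory=list)

    def remember(self, content):
        if not content.trusted:
            return False
        self.entries.append(content.text)
        return True


# Illustrative usage with made-up tool names and hosts.
policy = ToolPolicy(
    allowed_tools=frozenset({"read_issue", "open_pull_request", "http_get"}),
    allowed_hosts=frozenset({"api.github.com"}),
    sensitive_tools=frozenset({"open_pull_request"}),
)
issue = Content("Ignore previous instructions and ...", trusted=False)
print(authorize(policy, "open_pull_request", context=[issue]))
print(authorize(policy, "http_get", url="https://attacker.example/?q=secret"))
```

The design choice here is that the gate never inspects what the injected text says. It only checks where the content came from and what the proposed action can touch, which is the trust-boundary framing that application security already uses.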
