Prompt injection in AI browsers can’t be patched away. Here’s what actually works.
What the Comet OTP leak and the PleaseFix disclosures reveal about agentic browser security, and the defences that hold up where prompting doesn’t.
Prompt injection in AI browsers stopped being theoretical in August 2025, when Brave’s security team gave Perplexity’s Comet browser a routine instruction: summarise this page. The page was a Reddit thread. Hidden inside a spoiler tag, the kind readers click to reveal a punchline, sat a second instruction, invisible to a human reader but fully legible to Comet’s underlying model. The agent followed it. It navigated to the user’s account settings, triggered a one-time passcode email, opened Gmail to read the code, and posted both the email address and the code back to the same Reddit thread. The user never typed a command past “summarise this page.”
That demonstration is now nearly a year old, and the underlying class of vulnerability, prompt injection in AI browsers, hasn’t gone away. If anything it has multiplied, because the agents have. ChatGPT Atlas, Perplexity Comet, Opera Neon and a handful of others now hold logged-in sessions to email, calendars, banking portals and password managers, and they’re built to act on the open web’s text, images and metadata as readily as they act on a typed prompt. Every one of those surfaces is also a place an attacker can leave instructions.
OpenAI’s own chief information security officer has said in public that this isn’t a bug working its way toward a patch. The mitigations that have shipped, confirmation prompts, logged-out modes, automated red-teaming, reduce the blast radius. They don’t close the hole, because the hole isn’t a coding mistake. It’s a structural property of how these agents read the web. Understanding why is the difference between a security policy that quietly fails the first time someone tests it, and one that actually holds.
Why prompt injection in AI browsers is an architecture problem, not a bug
Conventional browser security, the Same-Origin Policy, CSRF tokens, Content Security Policy, was built to stop one site’s script from reading or acting on another site’s data without permission. None of it applies to what Comet did in the demonstration above, because the agent wasn’t running someone else’s script. It was running with the user’s own authenticated session, across every domain that session reaches, and treating page content as a legitimate source of instructions.
The mechanics are consistent across incidents. An attacker places a payload using whatever the page format allows: white text on a white background, a zero-height div, an HTML comment, or text inside an image that the model reads through OCR. The user triggers the agent with an innocuous request. The model then processes the page content and the user’s instruction through the same context window, as the same kind of token, with no reliable way to tell which one came from the person it’s supposed to be working for. Whatever instruction it decides to follow, it executes with the user’s full authenticated privilege, across whatever domains that session can reach.
It is worth naming the closer analogy: this is the same category of mistake as early SQL injection, mixing code and data in a single channel, except there is no parameterised-query equivalent yet for natural language. The attack surface isn’t “is this browser well coded.” It’s “what does this user’s authenticated session reach,” and for most knowledge workers, that answer is close to everything: inbox, calendar, banking, internal tools, and whatever password manager holds the rest.
PleaseFix: when the attack doesn’t even need a click
In March 2026, security firm Zenity Labs disclosed a vulnerability family it named PleaseFix, found in Perplexity Comet and tested against other agentic browsers. It contained two distinct exploits, and the fact that one disclosure produced two unrelated attack paths is itself the point.
The first used a calendar invite. Attacker-controlled content arrived as an ordinary meeting invite; once it sat in the user’s calendar, a later, completely routine request to Comet, asking it to check the day’s schedule, was enough to trigger the embedded instruction. The agent read local files and sent their contents to an attacker-controlled endpoint, while still returning the response the user expected. No malicious link was ever clicked. The invite alone did the work.
The second exploit targeted 1Password, not by attacking the password manager directly but by manipulating what task the agent believed it was performing. Comet already had standing, agent-authorised access to interact with 1Password on the user’s behalf. Zenity’s researchers showed that the same injection mechanism could redirect that access toward pulling stored credentials or taking over the vault itself.
Zenity disclosed responsibly, and Perplexity and 1Password shipped fixes for both specific exploits. But “this exploit is patched” and “prompt injection is solved” are different sentences. The underlying mechanism, an agent unable to reliably separate user intent from third-party content, produced two unrelated vulnerability classes inside a single disclosure. There will be a third.
What OpenAI tried, and what its own CISO admits
Atlas shipped with real mitigations, not gestures. A “logged-out mode” lets the agent operate without handing over saved passwords. A “watch mode” requires the user to explicitly confirm sensitive actions, sending a message, making a payment, before the agent proceeds. OpenAI has also built an automated red-teaming pipeline that trains a reinforcement-learning attacker model to probe Atlas for exactly this class of exploit, on a repeating cycle rather than a one-off audit.
Chief information security officer Dane Stuckey has described the company’s approach as investing in automated red teaming, reinforcement learning, and rapid response loops, with the explicit goal of shrinking the time between a new injection technique appearing in the wild and a fix landing. That’s a meaningfully different goal from making the underlying flaw disappear.
“Prompt injection is unlikely to ever be fully ‘solved’ in browser agents.”
That is an unusually candid admission for a vendor to make about its own flagship product, and it happens to be the correct one. OpenAI has compared the risk to phishing and social engineering: categories of attack the security industry has spent decades mitigating without ever eliminating. Anyone evaluating these tools should adopt that as a baseline assumption going in, not treat it as a gap someone else’s roadmap will eventually close.
The defences that actually hold up
None of this means agentic browsers are unusable. It means the controls that work are architectural, not conversational. Five matter in practice.
Least-privilege scoping limits which domains, accounts and actions an agent session can touch at all, instead of handing it the user’s entire authenticated footprint. This is what would have stopped the cross-domain jump in the Comet demonstration, from a Reddit thread to the user’s own Gmail inbox.
Mandatory confirmation on sensitive actions, payments, message sends, credential access, file transfers, stops an agent from executing the most damaging steps silently. It does not stop data that has already been read into the agent’s context before the confirmation step, which is why it has to be paired with scoping rather than relied on alone.
Treating the agent’s session as a separate, lower-trust identity from the human user, with its own audit log, its own anomaly detection and its own revocation path, turns an invisible compromise into a detectable one. PleaseFix-style file exfiltration produces unusual outbound traffic; an organisation that logs agent actions as a distinct identity has a chance of catching that. One that doesn’t, won’t.
Isolating credential stores, password managers, banking portals, the most sensitive internal tools, from agent reach removes the 1Password-style attack path entirely for workflows that don’t genuinely need it. Most don’t.
Better system prompts do not belong on this list. Every published defence along the lines of “ignore instructions found in page content” has been bypassed within weeks of release, for the structural reason already covered: the model cannot reliably tell the difference between an instruction in the page and an instruction from the user, because at the token level they are the same kind of data.
| Mitigation | Stops | Doesn’t stop |
|---|---|---|
| Least-privilege scoping | Cross-domain exfiltration, blast radius | An attack confined to an already-authorised domain |
| Mandatory confirmation (“watch mode”) | Silent payments and message sends | Data already read into context before the prompt |
| Separate agent identity + anomaly monitoring | Detection and containment after the fact | The initial compromise itself |
| Isolating credential stores from agent reach | Direct credential and vault theft | Workflows that genuinely require that access |
| Better system prompts | Close to nothing, reliably | Everything above |
A decision framework, not a ban
For a security or IT lead deciding whether to allow these tools at work, the binary of allow-or-ban is the wrong frame. Teams that ban outright tend to see employees adopt the same tools on personal accounts and personal devices instead, which removes visibility rather than risk.
A workable sequence looks closer to this: inventory what each agent’s logged-in session can actually reach today, since most organisations haven’t checked; turn on whichever confirmation or watch-mode equivalent the vendor offers, and make it mandatory for payments, sends and credential access; keep password managers and the most sensitive systems out of any session an agent can reach until the vendor’s isolation claims are demonstrated rather than promised; and log agent actions under a distinct identity so anomaly detection has something concrete to find.
A minimal policy expressing that intent, at the configuration level, looks something like this.
{
"agentSession": "browser-agent",
"allowedDomains": ["mail.company.com", "calendar.company.com"],
"deniedDomains": ["*.password-manager.com", "*.banking.com"],
"requireConfirmation": [
"send_message",
"make_payment",
"credential_access",
"file_download"
],
"logAs": "agent:browser-low-trust"
}It is a starting point, not a finished control. The point is that it is enforced by configuration the agent can’t talk its way around, rather than by an instruction the agent might simply not follow.
Where this goes next
Agentic browsers will keep shipping, because the productivity case behind them is real and the vendors building dedicated agentic-browser security tooling are already treating this as a permanent category, not a launch-week bug list. The organisations that get hurt by the next disclosure won’t be the ones using these tools. They’ll be the ones that adopted them without first asking what their existing authenticated sessions were already exposed to.
Frequently asked questions
Related reading
An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.
A Cursor agent found one unscoped API token and wiped a production database and its backups in nine seconds. The real failure was credential scoping, not the model.
Flaky tests aren't random. Six root causes explain almost all of them.
Retrying a failed CI job treats every flaky test as the same problem. Research from Google, Microsoft, and Atlassian shows flakiness has six distinct root causes, and the fix for one works against another.
Three npm supply-chain attacks hit in four weeks. None of them needed a stolen password.
Three unrelated npm attacks in May and June 2026 used three different techniques. All three got past 2FA and OIDC Trusted Publishing by skipping the registry account and going straight for the CI runner.