Can prompt injection in AI browsers be fixed with better instructions or system prompts?

Not reliably. The model receives page content and user instructions through the same channel, so an instruction embedded in a web page looks identical, at the token level, to an instruction from the user. Every defence built on telling the model to ignore untrusted content has been bypassed within weeks of release.

Should a company just ban AI browser agents outright?

Banning often pushes the behaviour underground rather than removing the risk, as employees switch to personal accounts on personal devices with no logging and no scoping at all. Scoping what each agent session can reach, requiring confirmation for sensitive actions, and keeping the most sensitive credential stores out of reach is a more durable approach than a blanket ban.

Is this the same vulnerability as the prompt injection risks in MCP-based agents?

It’s the same underlying mechanism, an agent that can’t separate trusted instructions from untrusted content, applied to a different surface. MCP-based agents face it through tool definitions and server responses; browser agents face it through web pages, calendar invites and any text the agent reads. The mitigation principles, least privilege, confirmation and isolation, are the same in both cases.

What’s the single highest-leverage step a small security team can take this week?

Turn on whatever confirmation step the vendor offers for sensitive actions, and move password managers and banking access out of any session an AI browser agent can reach. That alone would have stopped both the Comet one-time-passcode leak and the PleaseFix credential exploit.

Best PracticesJun 22, 20267 min readReviewed Jun 22, 2026

Prompt injection in AI browsers can’t be patched away. Here’s what actually works.

What the Comet OTP leak and the PleaseFix disclosures reveal about agentic browser security, and the defences that hold up where prompting doesn’t.

By FlowVerify Editorial Team

Prompt injection in AI browsers stopped being theoretical in August 2025, when Brave’s security team gave Perplexity’s Comet browser a routine instruction: summarise this page. The page was a Reddit thread. Hidden inside a spoiler tag, the kind readers click to reveal a punchline, sat a second instruction, invisible to a human reader but fully legible to Comet’s underlying model. The agent followed it. It navigated to the user’s account settings, triggered a one-time passcode email, opened Gmail to read the code, and posted both the email address and the code back to the same Reddit thread. The user never typed a command past “summarise this page.”

That demonstration is now nearly a year old, and the underlying class of vulnerability, prompt injection in AI browsers, hasn’t gone away. If anything it has multiplied, because the agents have. ChatGPT Atlas, Perplexity Comet, Opera Neon and a handful of others now hold logged-in sessions to email, calendars, banking portals and password managers, and they’re built to act on the open web’s text, images and metadata as readily as they act on a typed prompt. Every one of those surfaces is also a place an attacker can leave instructions.

OpenAI’s own chief information security officer has said in public that this isn’t a bug working its way toward a patch. The mitigations that have shipped, confirmation prompts, logged-out modes, automated red-teaming, reduce the blast radius. They don’t close the hole, because the hole isn’t a coding mistake. It’s a structural property of how these agents read the web. Understanding why is the difference between a security policy that quietly fails the first time someone tests it, and one that actually holds.

Why prompt injection in AI browsers is an architecture problem, not a bug

Conventional browser security, the Same-Origin Policy, CSRF tokens, Content Security Policy, was built to stop one site’s script from reading or acting on another site’s data without permission. None of it applies to what Comet did in the demonstration above, because the agent wasn’t running someone else’s script. It was running with the user’s own authenticated session, across every domain that session reaches, and treating page content as a legitimate source of instructions.

The mechanics are consistent across incidents. An attacker places a payload using whatever the page format allows: white text on a white background, a zero-height div, an HTML comment, or text inside an image that the model reads through OCR. The user triggers the agent with an innocuous request. The model then processes the page content and the user’s instruction through the same context window, as the same kind of token, with no reliable way to tell which one came from the person it’s supposed to be working for. Whatever instruction it decides to follow, it executes with the user’s full authenticated privilege, across whatever domains that session can reach.

It is worth naming the closer analogy: this is the same category of mistake as early SQL injection, mixing code and data in a single channel, except there is no parameterised-query equivalent yet for natural language. The attack surface isn’t “is this browser well coded.” It’s “what does this user’s authenticated session reach,” and for most knowledge workers, that answer is close to everything: inbox, calendar, banking, internal tools, and whatever password manager holds the rest.

PleaseFix: when the attack doesn’t even need a click

In March 2026, security firm Zenity Labs disclosed a vulnerability family it named PleaseFix, found in Perplexity Comet and tested against other agentic browsers. It contained two distinct exploits, and the fact that one disclosure produced two unrelated attack paths is itself the point.

The first used a calendar invite. Attacker-controlled content arrived as an ordinary meeting invite; once it sat in the user’s calendar, a later, completely routine request to Comet, asking it to check the day’s schedule, was enough to trigger the embedded instruction. The agent read local files and sent their contents to an attacker-controlled endpoint, while still returning the response the user expected. No malicious link was ever clicked. The invite alone did the work.

The second exploit targeted 1Password, not by attacking the password manager directly but by manipulating what task the agent believed it was performing. Comet already had standing, agent-authorised access to interact with 1Password on the user’s behalf. Zenity’s researchers showed that the same injection mechanism could redirect that access toward pulling stored credentials or taking over the vault itself.

Zenity disclosed responsibly, and Perplexity and 1Password shipped fixes for both specific exploits. But “this exploit is patched” and “prompt injection is solved” are different sentences. The underlying mechanism, an agent unable to reliably separate user intent from third-party content, produced two unrelated vulnerability classes inside a single disclosure. There will be a third.

What OpenAI tried, and what its own CISO admits

Atlas shipped with real mitigations, not gestures. A “logged-out mode” lets the agent operate without handing over saved passwords. A “watch mode” requires the user to explicitly confirm sensitive actions, sending a message, making a payment, before the agent proceeds. OpenAI has also built an automated red-teaming pipeline that trains a reinforcement-learning attacker model to probe Atlas for exactly this class of exploit, on a repeating cycle rather than a one-off audit.

Chief information security officer Dane Stuckey has described the company’s approach as investing in automated red teaming, reinforcement learning, and rapid response loops, with the explicit goal of shrinking the time between a new injection technique appearing in the wild and a fix landing. That’s a meaningfully different goal from making the underlying flaw disappear.

“Prompt injection is unlikely to ever be fully ‘solved’ in browser agents.”

— OpenAI, on ChatGPT Atlas

That is an unusually candid admission for a vendor to make about its own flagship product, and it happens to be the correct one. OpenAI has compared the risk to phishing and social engineering: categories of attack the security industry has spent decades mitigating without ever eliminating. Anyone evaluating these tools should adopt that as a baseline assumption going in, not treat it as a gap someone else’s roadmap will eventually close.

The defences that actually hold up

None of this means agentic browsers are unusable. It means the controls that work are architectural, not conversational. Five matter in practice.

Least-privilege scoping limits which domains, accounts and actions an agent session can touch at all, instead of handing it the user’s entire authenticated footprint. This is what would have stopped the cross-domain jump in the Comet demonstration, from a Reddit thread to the user’s own Gmail inbox.

Mandatory confirmation on sensitive actions, payments, message sends, credential access, file transfers, stops an agent from executing the most damaging steps silently. It does not stop data that has already been read into the agent’s context before the confirmation step, which is why it has to be paired with scoping rather than relied on alone.

Treating the agent’s session as a separate, lower-trust identity from the human user, with its own audit log, its own anomaly detection and its own revocation path, turns an invisible compromise into a detectable one. PleaseFix-style file exfiltration produces unusual outbound traffic; an organisation that logs agent actions as a distinct identity has a chance of catching that. One that doesn’t, won’t.

Isolating credential stores, password managers, banking portals, the most sensitive internal tools, from agent reach removes the 1Password-style attack path entirely for workflows that don’t genuinely need it. Most don’t.

Better system prompts do not belong on this list. Every published defence along the lines of “ignore instructions found in page content” has been bypassed within weeks of release, for the structural reason already covered: the model cannot reliably tell the difference between an instruction in the page and an instruction from the user, because at the token level they are the same kind of data.

Mitigation	Stops	Doesn’t stop
Least-privilege scoping	Cross-domain exfiltration, blast radius	An attack confined to an already-authorised domain
Mandatory confirmation (“watch mode”)	Silent payments and message sends	Data already read into context before the prompt
Separate agent identity + anomaly monitoring	Detection and containment after the fact	The initial compromise itself
Isolating credential stores from agent reach	Direct credential and vault theft	Workflows that genuinely require that access
Better system prompts	Close to nothing, reliably	Everything above

What each mitigation actually stops, and what it doesn’t

A decision framework, not a ban

For a security or IT lead deciding whether to allow these tools at work, the binary of allow-or-ban is the wrong frame. Teams that ban outright tend to see employees adopt the same tools on personal accounts and personal devices instead, which removes visibility rather than risk.

A workable sequence looks closer to this: inventory what each agent’s logged-in session can actually reach today, since most organisations haven’t checked; turn on whichever confirmation or watch-mode equivalent the vendor offers, and make it mandatory for payments, sends and credential access; keep password managers and the most sensitive systems out of any session an agent can reach until the vendor’s isolation claims are demonstrated rather than promised; and log agent actions under a distinct identity so anomaly detection has something concrete to find.

A minimal policy expressing that intent, at the configuration level, looks something like this.

agent-session-policy.json

{
  "agentSession": "browser-agent",
  "allowedDomains": ["mail.company.com", "calendar.company.com"],
  "deniedDomains": ["*.password-manager.com", "*.banking.com"],
  "requireConfirmation": [
    "send_message",
    "make_payment",
    "credential_access",
    "file_download"
  ],
  "logAs": "agent:browser-low-trust"
}

It is a starting point, not a finished control. The point is that it is enforced by configuration the agent can’t talk its way around, rather than by an instruction the agent might simply not follow.

Where this goes next

Agentic browsers will keep shipping, because the productivity case behind them is real and the vendors building dedicated agentic-browser security tooling are already treating this as a permanent category, not a launch-week bug list. The organisations that get hurt by the next disclosure won’t be the ones using these tools. They’ll be the ones that adopted them without first asking what their existing authenticated sessions were already exposed to.

Frequently asked questions

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

A Cursor agent found one unscoped API token and wiped a production database and its backups in nine seconds. The real failure was credential scoping, not the model.

Jul 1, 2026Read full article →

Best PracticesJun 22, 20267 min readReviewed Jun 22, 2026

Prompt injection in AI browsers can’t be patched away. Here’s what actually works.

What the Comet OTP leak and the PleaseFix disclosures reveal about agentic browser security, and the defences that hold up where prompting doesn’t.

By FlowVerify Editorial Team

Why prompt injection in AI browsers is an architecture problem, not a bug

PleaseFix: when the attack doesn’t even need a click

What OpenAI tried, and what its own CISO admits

“Prompt injection is unlikely to ever be fully ‘solved’ in browser agents.”

— OpenAI, on ChatGPT Atlas

The defences that actually hold up

None of this means agentic browsers are unusable. It means the controls that work are architectural, not conversational. Five matter in practice.

Mitigation	Stops	Doesn’t stop
Least-privilege scoping	Cross-domain exfiltration, blast radius	An attack confined to an already-authorised domain
Mandatory confirmation (“watch mode”)	Silent payments and message sends	Data already read into context before the prompt
Separate agent identity + anomaly monitoring	Detection and containment after the fact	The initial compromise itself
Isolating credential stores from agent reach	Direct credential and vault theft	Workflows that genuinely require that access
Better system prompts	Close to nothing, reliably	Everything above

What each mitigation actually stops, and what it doesn’t

A decision framework, not a ban

A minimal policy expressing that intent, at the configuration level, looks something like this.

agent-session-policy.json

{
  "agentSession": "browser-agent",
  "allowedDomains": ["mail.company.com", "calendar.company.com"],
  "deniedDomains": ["*.password-manager.com", "*.banking.com"],
  "requireConfirmation": [
    "send_message",
    "make_payment",
    "credential_access",
    "file_download"
  ],
  "logAs": "agent:browser-low-trust"
}

Prompt injection in AI browsers can’t be patched away. Here’s what actually works.

Why prompt injection in AI browsers is an architecture problem, not a bug

PleaseFix: when the attack doesn’t even need a click

What OpenAI tried, and what its own CISO admits

The defences that actually hold up

A decision framework, not a ban

Where this goes next

Frequently asked questions

Related reading

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

Flaky tests aren't random. Six root causes explain almost all of them.

Three npm supply-chain attacks hit in four weeks. None of them needed a stolen password.

Stay ahead on eSignatures, compliance, and document workflows

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

Prompt injection in AI browsers can’t be patched away. Here’s what actually works.

Why prompt injection in AI browsers is an architecture problem, not a bug

PleaseFix: when the attack doesn’t even need a click

What OpenAI tried, and what its own CISO admits

The defences that actually hold up

A decision framework, not a ban

Where this goes next

Frequently asked questions

Related reading

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.

Flaky tests aren't random. Six root causes explain almost all of them.

Three npm supply-chain attacks hit in four weeks. None of them needed a stolen password.

Stay ahead on eSignatures, compliance, and document workflows

An AI agent deleted PocketOS's production database in 9 seconds. Credential scoping was the real failure.