Zero-Click Prompt Injection Is Proof That Guardrails Were Always the Wrong Solution
Tags: prompt-injection, mcp, agentic-security, rsac


Zenity's Michael Bargury demoed live exploits against six major AI agents at RSAC 2026 — all zero-click, all using attacker-controlled content. The lesson isn't about better filters.

Ofir Stein · March 28, 2026

Zenity's CTO just walked into RSAC 2026 and showed live exploits against Cursor, Salesforce, ChatGPT, Gemini, Copilot, and Einstein. Six agents. Zero clicks required from the victim. Every exploit followed the same pattern: attacker-controlled content enters the agent's context — a calendar invite, a document, an email — and the agent obeys it. Silently. Without asking anyone.

One demo leaked developer secrets out of Cursor. Another redirected Salesforce's customer-facing agent to an attacker-controlled server mid-conversation.

This is not a new vulnerability class. Prompt injection has been documented for years. What's new is that it's happening against production agents, at scale, in front of the security industry's biggest annual audience.

The wrong lesson to take from this: "We need better prompt injection detection."

The right lesson: Agents cannot distinguish between a legitimate instruction from their principal and an injected instruction from an attacker. Not reliably. Not at the model level. The model processes text — all text, from all sources — and follows it. That's what it was trained to do.

Guardrails try to intercept bad instructions before the model acts on them. But they're playing defense on the model's input surface, which is enormous. Every document the agent reads is potential attack surface. Every email. Every calendar entry. Every API response. You cannot filter your way out of that.

The only structural fix is to limit what the agent can do, not what it can read. An agent that can only read has no channel through which to leak what it reads. An agent with least-privilege permissions — no silent exfiltration, no arbitrary API calls, no credential access beyond what the immediate task requires — is still injectable, but the blast radius shrinks to near zero.
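As a minimal sketch of what "limit what the agent can do" looks like in practice, consider routing every tool call through a deny-by-default dispatcher scoped to the current task. All names here (`TOOL_POLICY`, `dispatch`, the task and tool names) are hypothetical illustrations, not any real agent framework's API — the point is that an injected instruction can still steer the model's text output, but it cannot reach a capability the policy never granted.

```python
# Hypothetical least-privilege tool gate: every tool call goes through one
# dispatcher, and each task carries an explicit allowlist of tools.

READ_ONLY = {"read_file", "search_docs"}

TOOL_POLICY = {
    # task name -> tools the agent may invoke while performing that task
    "triage_email": READ_ONLY,
    "draft_reply": READ_ONLY | {"create_draft"},  # still no send, no HTTP
}

class PolicyViolation(Exception):
    pass

# Toy tool implementations for the sketch
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "http_post": lambda url, body: "sent",  # unreachable under the policies above
}

def dispatch(task: str, tool: str, args: dict):
    """Deny by default: a tool call succeeds only if the current task's
    policy explicitly lists that tool. An injected "POST this to evil.com"
    fails here regardless of how convincingly it was phrased."""
    if tool not in TOOL_POLICY.get(task, set()):
        raise PolicyViolation(f"{tool!r} not permitted for task {task!r}")
    return TOOLS[tool](**args)
```

Under this shape, a calendar-invite injection against a `triage_email` agent can at worst make it read more; `dispatch("triage_email", "http_post", ...)` raises `PolicyViolation` before any request leaves the box. The enforcement lives outside the model, which is the whole point: it holds even when the model is fully fooled.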

Bargury's demos weren't about clever jailbreaks. They were about production agents with production capabilities operating exactly as designed — except the instructions came from an attacker instead of a user. That's the threat model. It's not a bug in the model. It's a design failure in how we deploy agents.

The industry needed a live demonstration of this at RSAC. It got one. The question now is whether it changes how anyone builds.