IDPI Is Not a Research Problem Anymore

Unit 42 dropped a report this week that deserves to be read carefully, not skimmed. They analyzed large-scale telemetry and confirmed what many of us in the agentic security space have been saying for the past year: indirect prompt injection is not theoretical. It's in production. It's working.

The numbers are real: 22 distinct attacker techniques observed in the wild. Not in a lab. Not in a red team exercise. On actual websites, targeting actual AI agents running in browsers, pipelines, and automated workflows.

What caught my attention most wasn't the volume — it was the specificity of the intents. Ad review evasion. SEO manipulation to push phishing sites up search rankings. Data destruction commands. Unauthorized transaction execution. These aren't researchers poking at edge cases. These are adversaries who understand exactly how AI agents consume web content and are deliberately weaponizing that behavior.

Here's what that means in practice: the web is now an attack surface for your AI stack. Every page your agent browses, every document it summarizes, every API response it processes — any of that content could carry instructions masquerading as data. The agent doesn't know the difference. Neither does your LLM, unless you've built explicit defenses around it.

The most underappreciated detail in the Unit 42 findings is the ad review evasion case — the first confirmed instance of an attacker using IDPI to manipulate an AI system reviewing ad content. That's not a prompt injection attack against a chatbot. That's an attack against a business process. The agent was doing its job and got weaponized against the system it was supposed to protect.

This is the shift I keep talking about: we're past the "can this happen?" phase. We're into "how often is this happening, and to whom?" And the honest answer is: we don't fully know yet, because most organizations aren't instrumented to detect it.

If your agents touch the web — and most do — you need input validation that treats external content as adversarial by default, sandboxed execution environments that limit what a manipulated agent can actually do, and behavioral monitoring that catches anomalous action sequences before they complete. Trust at the content layer is the same mistake as trusting user input in SQL queries. We solved SQLi with parameterization and strict typing. We'll solve IDPI the same way — by never letting untrusted content become executable instructions.

Unit 42 did the field a service with this research. Now it's on the builders to act on it.