GlossaryIndirect Prompt Injection

Indirect Prompt Injection

attack-vectorsprompt-injectionagent-securityfundamentals
Ofir Stein·Updated March 12, 2026

Indirect prompt injection is an attack where malicious instructions are embedded in external content — documents, emails, web pages, database records — that an AI agent retrieves and processes, hijacking the agent's behavior without ever touching the model directly.


What Is Indirect Prompt Injection?

The attack doesn't target the model. It targets the data the model reads.

In a direct prompt injection, the attacker controls user input — they type malicious instructions into a chat box. That's a limited surface: you can filter it, rate-limit it, monitor it. Indirect prompt injection is different. The attacker plants instructions in the environment — in a PDF the agent will summarize, a web page it will browse, a Slack message it will read, an email in the inbox it processes. When the agent retrieves that content, the injected instructions enter the context window and the agent executes them.

The agent never knows the difference between your system prompt and an instruction it just read from an attacker-controlled document. To the model, it's all text.

Why It's Dangerous

Every external content source your agent touches is a potential attack vector. Not a theoretical one — a structural one. The moment you give an agent access to external data and tools, indirect injection becomes a viable exploitation path.

The impact scales with the agent's capabilities. An agent that can only read and summarize? The blast radius is limited. An agent that reads emails and sends emails and can query your CRM? One injected instruction in one email and the attacker has a proxy inside your infrastructure.

Defense

Behavioral defenses — "ignore instructions in documents" — are insufficient. LLMs are probabilistic; a sufficiently crafted injection will eventually bypass any instruction-based filter.

Structural defenses are the right answer:

  • Separate retrieval from execution. Agents that read untrusted content should have reduced tool access.
  • Content-aware sandboxing. Tag and isolate content by trust level before it enters the context window.
  • Minimal tool grants. If the agent's job doesn't require sending emails, it shouldn't have that tool. Full stop.

Indirect prompt injection is solved structurally — or it isn't solved.