Blast radius is the maximum scope of harm a compromised or misbehaving AI agent can cause — determined entirely by the structural constraints placed on that agent at design time, not by how well-behaved it is at runtime.
What Is Blast Radius?
It's the single most important question in agentic security. Not "will this agent be compromised?" — eventually, it will. The question is: if it is, how bad can it get?
Blast radius is a structural property. It's set when you design the agent's permissions, tool access, and data scope. It cannot be reduced at runtime by better prompts, guardrail models, or output filters. Those controls operate after the agent has the capability. Blast radius containment operates before.
An agent with access to all company data, all APIs, and no approval gates has a blast radius that encompasses the entire organization. An agent scoped to one customer's data, two specific tools, and a human approval gate for any write operation has a blast radius you can define precisely in a sentence.
Why Behavioral Controls Don't Contain It
The dominant approach to agent security is behavioral: write a better system prompt, add a safety classifier, train the model to refuse bad instructions. These controls assume the agent makes correct decisions at inference time.
Blast radius doesn't care about the agent's decisions. Blast radius is about what the agent can do — not what it will do. A jailbroken agent with minimal permissions is a contained incident. A well-behaved agent with excessive permissions is a disaster waiting for the right injection.
The math: expected harm scales with compromise probability times blast radius. If the compromise probability is nonzero (it always is) and the blast radius is unlimited, expected harm is unbounded. Behavioral controls can only push the probability down; blast radius is the only lever that bounds the harm structurally.
Designing for Small Blast Radius
- Scope data access per task. An agent processing invoices doesn't need access to HR records.
- Grant minimal tool sets. If the agent's job doesn't require it, don't add the tool.
- Treat lateral movement as a design smell. If a compromised agent can reach adjacent systems, your blast radius is larger than you think.
- Hard gates for irreversible actions. Deletions, outbound messages, financial transactions — require human confirmation. The action is recoverable or it isn't; design accordingly.
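The four design rules above as one structural check that runs before any tool is dispatched, as an added example that is not code from the original page; the policy class, tool names and customer ids are assumptions for illustration:

```python
from dataclasses import dataclass

# Hypothetical policy model (added example): every tool call is checked against
# a fixed policy before it reaches the tool, regardless of what the agent "decided".
@dataclass(frozen=True)
class AgentPolicy:
    task: str
    allowed_tools: frozenset   # minimal tool set for this task
    data_scope: frozenset      # customer ids this agent may touch
    irreversible: frozenset    # tools behind a human approval gate

class PolicyViolation(Exception):
    """Call is outside the agent's tool set or data scope."""

class ApprovalRequired(Exception):
    """Call is irreversible and has not been confirmed by a human."""

def authorize_call(policy, tool, customer_id, approved=False):
    """Return True if the call may be dispatched; raise otherwise."""
    if tool not in policy.allowed_tools:
        raise PolicyViolation(f"{tool!r} not in tool set for {policy.task!r}")
    if customer_id not in policy.data_scope:
        raise PolicyViolation(f"customer {customer_id!r} outside data scope")
    if tool in policy.irreversible and not approved:
        raise ApprovalRequired(f"{tool!r} is irreversible; needs confirmation")
    return True

def blast_radius(policy):
    """The worst case, stated in one sentence as the text recommends."""
    return (f"task {policy.task!r}: tools {sorted(policy.allowed_tools)}, "
            f"data {sorted(policy.data_scope)}, gated {sorted(policy.irreversible)}")

def run_agent_requests(policy, requests):
    """Push (tool, customer_id, approved) requests through the policy and
    record each outcome. The requests are constructed; they stand in for an
    agent whose reasoning may have been hijacked by injected instructions."""
    outcomes = []
    for tool, customer_id, approved in requests:
        try:
            authorize_call(policy, tool, customer_id, approved)
            outcomes.append((tool, "dispatched"))
        except ApprovalRequired:
            outcomes.append((tool, "held_for_approval"))
        except PolicyViolation:
            outcomes.append((tool, "denied"))
    return outcomes

# Constructed sample: an invoice-processing agent scoped to a single customer.
INVOICE_POLICY = AgentPolicy(
    task="process_invoices",
    allowed_tools=frozenset({"read_invoice", "send_payment"}),
    data_scope=frozenset({"cust-42"}),
    irreversible=frozenset({"send_payment"}),
)
```

Whatever the model outputs, a request for HR records, another customer's data, or an unconfirmed payment never reaches a tool; the policy object, not the prompt, is what defines the blast radius.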
Blast radius is the architecture question that every other security control is downstream of. Get this wrong and nothing else you do is sufficient.