MCP poisoning is an attack that compromises an MCP (Model Context Protocol) server to inject malicious tool behavior, manipulated outputs, or unauthorized instructions into an AI agent's reasoning pipeline — turning the connector layer into an attack layer.
What Is MCP Poisoning?
MCP servers are the new browser extensions: trusted by default, invisible to the end user, running with access to everything the agent can reach. Poisoning the connector poisons everything downstream.
When an AI agent calls an MCP tool — reads a file, queries a database, sends a message — it trusts the response. There's no cryptographic verification that the MCP server is the legitimate one. There's no signature on the returned data. The agent receives a response and reasons over it. If that response contains instructions, it executes them.
MCP poisoning can happen in multiple ways: a malicious MCP server installed via a compromised package registry, a legitimate server hijacked through a dependency vulnerability, or a typosquat server that gets installed instead of the real one. The mechanism varies. The result is the same — the agent now has an adversary in the loop.
The Trust Problem
The MCP protocol establishes what tools are available and what they do. Nothing in the protocol establishes whether to trust the server providing those tools. That trust is implicit — it's granted at configuration time when someone wires the MCP server into the agent's environment.
This is structurally identical to how browser extensions work, and browser extensions have a decades-long history of being weaponized at scale.
Defense
- Treat MCP servers as third-party dependencies — not as infrastructure. They warrant the same scrutiny as an NPM package you're adding to production.
- Pin versions and verify hashes. An MCP server that auto-updates is an MCP server that can change behavior under you.
- Least-privilege scoping. An MCP server that retrieves calendar data shouldn't have write access to your filesystem.
- Audit tool call outputs. Log what MCP servers return, not just what the agent calls. The injection lives in the response, not the request.
MCP is a powerful abstraction. It's also a trust boundary with no native enforcement — that's the attack surface.