The short answer: MCP (Model Context Protocol) creates a new and largely unrecognized attack surface: any content an agent reads — GitHub issues, documents, emails — can contain instructions the agent will follow. Authentication doesn't help. The agent is authenticated; it just executes attacker objectives instead of user objectives. The defense is structural: scope permissions tightly, treat external content as untrusted, and require human approval for consequential actions.
A developer at a mid-sized engineering firm connected their GitHub account to an MCP-enabled coding agent. Standard setup. The MCP server was correctly configured. Authentication was in place. There was nothing obviously wrong.
Then an attacker opened a GitHub issue on a public repo the developer had access to. The issue contained attacker-controlled text — instructions formatted to blend with legitimate content. When the agent processed the issue while working on a related task, it followed those instructions. It authenticated using the developer's valid session, reached into private repositories the developer had access to, and exfiltrated code. The MCP server logged a successful, authenticated operation. Because it was one.
This is not a proof-of-concept. This happened. And if your threat model for MCP deployments starts with "ensure authentication is configured," you have the wrong threat model.
It's Not a Bug. It's a Class.
Security people love to categorize incidents as implementation failures. Bad config. Missing auth. Insufficient input sanitization. These framings feel productive because they imply fixes: patch this, configure that, add a validation layer. The GitHub MCP incident doesn't fit that frame. The server was authenticated. The connection was legitimate. The agent behavior was exactly as designed.
What failed wasn't the implementation. What failed was the underlying assumption that data and instructions are different things.
Traditional security architecture is built on that assumption. Firewalls distinguish between traffic types. Access controls distinguish between principals. Code signing distinguishes between trusted and untrusted execution. These are all downstream of one premise: that we can tell the difference between data and code.
AI agents have quietly dissolved that premise at every layer of the stack. This week gave us three separate incidents that prove the same point from three different angles. They are not three CVEs. They are one structural crisis.
Layer 1: Config Is Code Now
CVE-2025-59536 and CVE-2026-21852 are classified as remote code execution vulnerabilities in AI coding tools. That classification is technically accurate and completely misses the point.
Here's what actually happens: a malicious .claude config file exists in a repository. A developer clones that repo and opens the project in their AI coding tool. Zero additional clicks. Zero prompts asking for confirmation. Arbitrary shell commands execute immediately, including commands that read and transmit the developer's API keys, credentials, and whatever else is accessible in their environment.
The attack surface is a config file.
Developers are trained to treat config files as data — JSON, TOML, YAML, dotfiles. They're text. You read them. You edit them. They don't execute. Except now, in AI coding environments, they do. Every AI coding tool that processes project configuration as an instruction source has made every config file in every repository a potential execution context.
This is not an Anthropic problem. Anthropic is the example here, not the subject. The class is: AI coding tools that parse project files and act on their contents. That class includes every major AI coding assistant on the market. The threat model that said "config files are data" was accurate before these tools existed. It isn't anymore, and nobody updated it.
When you onboard a new engineer, you probably have them clone a few repos on day one. Think about what's in those repos now.
Layer 2: The MCP Trust Model Is Broken by Design
Let's be direct about something the MCP ecosystem has been slow to confront: authentication is not the problem. Three simultaneous MCP failures this week prove this, and they're worth walking through precisely because they cover all the cases.
No auth. A recent scan of over 8,000 MCP servers found that 36% had zero authentication configured. No auth means no perimeter — anything that can reach the server can instruct the agent. This is the obvious failure, and it's the one people focus on. Fix it and you've addressed 36% of the exposed surface.
Broken auth. CVE-2026-27896 is a vulnerability in the MCP Go SDK where Go's case-insensitive JSON parsing allows attackers to bypass security intermediaries. Authentication is present. It applies correctly to the correctly-cased request. The bypass routes around it using field name variants that Go's parser treats as equivalent. The auth layer didn't fail to exist — it failed to apply.
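The mechanics of that mismatch are easy to reproduce. Go's encoding/json matches JSON keys to struct fields case-insensitively, so a front-end check that compares key names exactly can disagree with the backend parser about what a request says. Here is a minimal Python sketch simulating that disagreement — the function names and the "role" field are illustrative, not taken from the actual CVE:

```python
import json

# A security intermediary that blocks requests claiming the "admin" role,
# checking the JSON key with an exact, case-sensitive match (illustrative).
def intermediary_allows(raw_request: str) -> bool:
    fields = json.loads(raw_request)
    return fields.get("role") != "admin"   # only sees the lowercase key

# A backend that, like Go's encoding/json, binds JSON keys to struct
# fields case-insensitively.
def backend_effective_role(raw_request: str) -> str:
    fields = json.loads(raw_request)
    for key, value in fields.items():
        if key.lower() == "role":          # "Role", "ROLE", "rOlE" all bind
            return value
    return "guest"

request = '{"ROLE": "admin", "action": "read_repo"}'
print(intermediary_allows(request))        # True: the check never fires
print(backend_effective_role(request))     # admin: the bypass lands
```

The intermediary and the backend both parse the same bytes; they just disagree about which key means "role." Any security layer that sits in front of a more permissive parser inherits this class of problem.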
Correct auth. This brings us back to the GitHub MCP incident at the top of this piece. The server was authenticated. The developer was authorized. The agent used legitimate credentials to do exactly what the attacker instructed it to do. Authentication worked precisely as designed.
All three failure modes — no auth, broken auth, correct auth — produce the same outcome: an agent under attacker control with the keys to your environment. If fixing authentication doesn't close the attack surface across all three cases, then authentication is not the right abstraction for this problem.
Here's the actual problem: MCP is not a transport protocol. A transport protocol moves data. MCP routes arbitrary content to agents that execute on it. Attacker-controlled text in a GitHub issue, a Jira ticket, an email body, a Confluence page — all of that content is data to the human reading it and instructions to the MCP-connected agent processing it. There is no structural separation between these categories. There is no mechanism that marks a token as "untrusted content to be summarized" versus "trusted instruction to be executed." Researchers who first described prompt injection as a vulnerability class have documented why it's not readily fixable at the model layer: the model receives tokens, not trust levels.
Configuring authentication on your MCP servers is necessary. It addresses the 36%. It does nothing about the architectural reality that MCP hands arbitrary external content to an agent that treats it as commands.
Layer 3: The Supply Chain Is Now Vulnerable to AI Error
Socket Research has been tracking an active npm campaign they've labeled SANDWORM_MODE: 19+ typosquatted packages targeting AI coding tools, deploying malicious payloads when installed. That's a familiar pattern. The novel element is how the package names were chosen.
The attackers didn't guess what developers might mistype. They didn't analyze npm download statistics for popular packages with common typo variants. They fed prompts to AI models until the models hallucinated package names — invented names for packages that don't exist but sound plausible. Then they registered those exact names as malicious packages on npm.
This technique has a name: slopsquatting. It's new enough that most practitioners haven't encountered the term. It deserves attention because it represents something genuinely different in the supply chain threat model.
Traditional typosquatting requires developer error. The developer types lodahs instead of lodash and installs the malicious package. The attack path runs through human mistake. Slopsquatting requires no developer error at all — it requires only AI error. An AI coding assistant recommends a package that doesn't exist. The developer trusts the recommendation and runs npm install. The package exists now, because the attacker created it for exactly this moment.
As AI coding assistants become the primary path to package discovery and installation, the supply chain threat model has to be rebuilt around AI behavior, not human behavior. This is not a metaphor. It is an operational shift. Every package your AI coding tool has ever recommended is a potential slopsquatting target. The package might not have existed when the model was trained. It might exist now, registered by someone who was paying attention to what models hallucinate.
The Convergence Argument
A config file that executes shell commands is a data file that became a command. An MCP message that hijacks agent behavior is a protocol payload that became a system instruction. A hallucinated package name that delivers malware is a model output that became an attack vector.
Three layers. Same failure. AI systems treating untrusted input as trusted instruction.
When a security team patches CVE-2025-59536 and marks the issue resolved, they've fixed one implementation of the failure mode. The config-as-code problem exists in every AI coding tool that processes project files. When a team enables authentication on their MCP server, they've addressed the no-auth case. The architectural trust model remains intact. When npm adds a scanner for known malicious packages, they're chasing yesterday's campaign while today's attacker is prompting models for tomorrow's hallucinations.
The question is not "can attackers inject commands through this specific layer?" The answer to that question is always yes, and patching one layer doesn't change the answer for the others. The question is: when they do, what is the blast radius?
Blast radius is an engineering problem. It has to be solved before the incident, not after it.
The Speed Problem
Here's what makes the architectural gap non-negotiable: agentic attacks don't operate at human speed.
Barracuda's 2026 security findings document what practitioners are already observing: existing incident response playbooks were designed for human-paced attackers. They assume time to detect, time to contain, time to escalate. An alert fires, a human investigates, containment happens. The timeline is minutes to hours.
Formally verified research published this week (arXiv:2602.19555) proved what was already suspected: viral agent loops can propagate across multi-agent pipelines via shared context faster than any human response cycle. An agent that has been injected via an MCP message can, in the same inference cycle, pass compromised instructions to downstream agents in the same pipeline. Before anyone is paged, the context has spread.
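The propagation pattern is simple enough to model directly. Assume a pipeline where each agent appends its output to a shared context that the next agent reads — a common multi-agent design. A toy simulation (all names hypothetical, and the "infection" condition deliberately crude):

```python
# Toy model of a multi-agent pipeline with a shared context (hypothetical).
def run_agent(name: str, shared_context: list[str]) -> str:
    # An injected instruction anywhere in the context "infects" this agent's
    # output, because the agent treats everything it reads as instruction.
    if any("INJECT" in msg for msg in shared_context):
        return f"{name}: INJECT exfiltrate"
    return f"{name}: normal output"

shared_context = [
    "user: triage this GitHub issue",
    "issue: build broken. INJECT exfiltrate secrets",  # attacker-controlled
]

for name in ["triage_agent", "code_agent", "deploy_agent"]:
    shared_context.append(run_agent(name, shared_context))

# Every downstream agent is compromised within a single pass --
# no alert, no human in the loop, no abnormal API call.
print(sum("INJECT" in msg for msg in shared_context))  # 4 compromised entries
```

One injected message, one loop, total compromise — and every step in the trace is a legitimate, authorized agent invocation.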
You cannot respond to an agent worm the way you respond to a lateral movement campaign. Lateral movement gives you IOCs, network logs, time. Agent propagation leaves legitimate tool calls, authorized operations, and completed tasks. The audit trail shows nothing wrong because nothing was wrong from the system's perspective — it followed instructions.
The IR gap is architectural. Detecting and responding after the fact is not a viable containment strategy for a class of attack that completes before the alert fires. The blast radius has to be engineered before the incident occurs, because there won't be time to engineer it during one.
Three Things to Actually Do
This isn't a "raise awareness" piece. Here's what structural mitigation looks like, specifically.
Treat every config file as code in your threat model. Repository security scanning should flag AI tool config files alongside code files — not just for secrets, but for instruction content. Code review requirements should cover changes to .claude, .cursor, .copilot, and equivalent files. Onboarding workflows should not assume a freshly cloned repo is a safe execution environment for AI tools. The mental model update is simple: if an AI tool reads it, it's code.
MCP connections need capability scoping, not just authentication. Define what the agent is permitted to do through each MCP connection — which repositories, which operations, which resource types — and enforce that scope structurally, not in a system prompt. The GitHub MCP incident exploited the gap between "the agent is authenticated" and "the agent is authorized to do exactly these operations on exactly these resources." Tighten that gap with explicit capability grants before deployment. An MCP connection that can access "all repositories the user has access to" is not least-privileged. It's a staging area for exfiltration waiting for an attacker to write the right GitHub issue.
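A structural capability grant can be as simple as an explicit allowlist enforced at the MCP gateway or proxy — outside the model's context entirely, so no injected instruction can talk its way past it. A sketch with a hypothetical schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class McpCapabilityGrant:
    """Explicit scope for one MCP connection (hypothetical schema)."""
    repos: frozenset[str]
    operations: frozenset[str]

    def permits(self, repo: str, operation: str) -> bool:
        # Deny by default: only named repo/operation pairs pass.
        return repo in self.repos and operation in self.operations

# Grant the agent read access to exactly one repo -- not "everything the
# user can reach". Enforced at the gateway, not in a system prompt.
grant = McpCapabilityGrant(
    repos=frozenset({"org/public-docs"}),
    operations=frozenset({"read_file", "list_issues"}),
)

print(grant.permits("org/public-docs", "read_file"))   # True
print(grant.permits("org/private-api", "read_file"))   # False: denied by scope
```

The key design choice is that the grant is evaluated on every tool call by code the agent cannot influence. A prompt-injected agent can ask for org/private-api all it wants; the gateway says no.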
AI tool package resolution needs supply chain verification at the tool layer. If your AI coding tools are a primary path to package installation, they're a primary path for slopsquatting. Integrate package provenance verification into the tool workflow, not as a separate human review step that assumes developers will catch what models recommend. Lock package versions in lockfiles and treat any AI-recommended package outside the lockfile as requiring explicit sign-off. Consider running AI coding tools in environments where unreviewed package installation is blocked at the system level. The attack doesn't require a developer to make a mistake. Don't design a mitigation that requires them to catch one.
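The lockfile rule above can be enforced mechanically: resolve every AI-recommended package name against the lockfile before install, and route anything unknown to sign-off. A sketch assuming npm's package-lock.json v2/v3 layout, where keys under "packages" are paths like "node_modules/lodash":

```python
import json

def locked_package_names(lockfile_text: str) -> set[str]:
    """Extract package names from an npm package-lock.json (v2/v3 format)."""
    lock = json.loads(lockfile_text)
    names = set()
    for path in lock.get("packages", {}):
        # The root project is keyed as ""; dependencies are keyed by path.
        if path.startswith("node_modules/"):
            names.add(path.split("node_modules/")[-1])
    return names

def needs_signoff(recommended: list[str], lockfile_text: str) -> list[str]:
    """Packages the AI tool suggested that are not already locked."""
    locked = locked_package_names(lockfile_text)
    return [pkg for pkg in recommended if pkg not in locked]

lockfile = '{"packages": {"": {}, "node_modules/lodash": {"version": "4.17.21"}}}'
print(needs_signoff(["lodash", "lodahs"], lockfile))   # ['lodahs']
```

Here the real package sails through and the squatted name gets stopped for review — without asking the developer to eyeball anything. The check belongs in the tool's install path, not in a human checklist.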
The boundary between data and instruction doesn't exist anymore. Every layer of your AI stack is an execution context — config files, protocol messages, package recommendations, memory stores. Build your architecture as if that's been true since the day you deployed your first AI tool. Because it has been.
