MCP Tool Poisoning and the Protocol That Speed-Ran 25 Years of Security Mistakes

A malicious MCP tool doesn't need to be executed to compromise your system. It just needs to be loaded.

That's the finding from Invariant Labs' tool poisoning disclosure: hidden instructions embedded in MCP tool descriptions are invisible to users but fully visible to AI models. When a poisoned tool enters an agent's context window, the model follows the hidden instructions on its next request, regardless of whether the tool itself is ever called. In a proof of concept, researchers demonstrated silent exfiltration of SSH keys and MCP configuration files from Cursor simply by loading a compromised tool alongside legitimate ones.

The MCPTox benchmark, which tested 20 prominent LLM agents against real-world tool poisoning attacks, found attack success rates exceeding 72%. Even Claude 3.7 Sonnet, one of the more safety-aligned models, refused these attacks less than 3% of the time.

This is the state of MCP security in 2026: the protocol designed to give AI agents access to the world has reproduced every security mistake we've spent 25 years learning to avoid, and it did it in under 18 months.

The Year Everything Went Wrong

The Model Context Protocol launched in late 2024 as an open standard for connecting AI agents to external tools and data sources. By mid-2025, it had become the default integration layer for enterprise AI. By the end of 2025, the breach timeline read like a greatest hits album of security failures.

AuthZed's documentation of MCP breaches catalogs the damage:

April 2025: Invariant Labs demonstrated that a malicious MCP server could silently exfiltrate a user's entire WhatsApp chat history by combining tool poisoning with a legitimate WhatsApp MCP server in the same agent.

May 2025: Malicious GitHub issues hijacked AI assistants through prompt injection, leaking private repository contents, project details, and financial data to public repositories. The root cause: over-privileged Personal Access Tokens combined with untrusted content in the LLM context.

June 2025: An access control logic flaw in Asana's MCP integration exposed one organization's projects and tasks to entirely different customers. Separately, Anthropic's own MCP Inspector had a remote code execution vulnerability (CVE-2025-49596) that exposed filesystems and API keys through an unauthenticated localhost listener.

July 2025: CVE-2025-6514, a command injection vulnerability in the mcp-remote npm package, affected over 437,000 downloads across implementations by Cloudflare, Hugging Face, and Auth0. A malicious authorization endpoint could execute arbitrary commands, stealing API keys, credentials, SSH keys, and Git contents.

September 2025: A supply chain attack through a fake Postmark MCP server silently BCC'd copies of all email communications to attacker-controlled servers.

October 2025: A path traversal vulnerability in Smithery's build configuration compromised 3,000+ hosted MCP server applications, leaking deployment tokens and downstream API keys.

Ten major breaches in ten months. Each one exploiting a different class of vulnerability. Each one familiar to anyone who has spent time in enterprise security.

Why Tool Poisoning Breaks the Mental Model

Traditional supply chain attacks compromise executable code: a malicious package, a tampered build artifact, a backdoored dependency. Tool poisoning is different. The "code" being poisoned is a natural language description.

The attack works because MCP's security model assumes tool descriptions are trustworthy. Clients display simplified summaries to users while passing the full description, including any hidden instructions, to the AI model. The gap between what the user sees and what the model reads is the attack surface.

Invariant Labs demonstrated two variants. In the first, a poisoned add tool included hidden <IMPORTANT> tags instructing the AI to read and exfiltrate sensitive files. In the second, a malicious server's tool description modified how a completely separate, trusted send_email tool behaved, redirecting all messages to an attacker's address while concealing the redirection from the user.

That second variant, tool shadowing, is particularly concerning. It means a single malicious MCP server can alter the behavior of every other tool in the agent's context. The poisoned tool doesn't just attack directly; it corrupts the model's understanding of the entire toolkit.

This is the pattern I wrote about in The YOLO Problem: developers are granting AI agents broad system access without understanding the blast radius. MCP tool poisoning takes that risk and multiplies it. An agent connected to ten MCP servers where one is compromised doesn't have a 10% security problem. It has a 100% security problem, because the poisoned descriptions propagate through the model's context to affect all tool interactions.

And there's a "rug pull" variant: servers can modify tool descriptions after initial approval, enabling post-installation attacks that bypass any upfront review.

Cloudflare's Answer: We Need Firewalls Again

In August 2025, Cloudflare launched MCP Server Portals in open beta. The pitch: a centralized zero-trust gateway through which every MCP connection must pass before it can reach an AI agent.

The architecture is straightforward. Instead of configuring agents with individual MCP server URLs, administrators configure a single Portal URL. The portal authenticates users through corporate identity providers, enforces access policies (multi-factor authentication, device posture checks, geographic restrictions), curates which MCP servers and tools are available, and logs every request for audit.

If this sounds familiar, it should. Cloudflare has built a Cloud Access Security Broker for AI agents. It's the same architectural pattern the industry adopted for SaaS applications a decade ago: when you can't secure every endpoint individually, you put a policy enforcement gateway in the middle.

The Cloudflare blog post framing was telling. They described the current state of MCP connections as "the Wild West of unmanaged connections, impossible to secure." That's the exact language network security vendors used about internet connectivity in the late 1990s, right before firewalls became mandatory.

We're rebuilding the firewall. For AI agents. In 2026.

The Pattern Recognition Problem

Here's what bothers me about the MCP security crisis: none of it is new.

Tool poisoning is supply chain poisoning. We learned this lesson with CodeBreach and CI/CD trust models, with npm typosquatting, with SolarWinds. The attack surface changed from executable code to natural language descriptions, but the principle is identical: if you trust upstream inputs without verification, you inherit their compromises.

Unrestricted network access from MCP servers is the missing security perimeter problem. We saw this with VS Code extensions, with browser plugins, with mobile app permissions. The pattern: new integration points ship without network isolation, and attackers exploit the gap.

Cross-tenant data exposure in Asana's MCP integration is the same class of access control failure that plagues every multi-tenant system. Localhost services treated as unauthenticated APIs is the same mistake we made with Docker, Kubernetes dashboards, and Redis instances.

MCP didn't invent new vulnerability classes. It provided a new protocol through which every existing vulnerability class could manifest simultaneously. In the time it takes most protocols to develop one or two known attack patterns, MCP accumulated them all.

The Moltbook crisis showed what happens when AI agents interact without security controls: network effects amplify network vulnerabilities. MCP tool poisoning is the protocol-level version of the same problem. Every connected tool increases the attack surface for every other tool.

What This Actually Means for Enterprises

Cloudflare's MCP Server Portals are a reasonable response to an unreasonable situation. Centralized policy enforcement, identity-aware access control, and comprehensive logging are the right architectural primitives. But they're a perimeter solution to a problem that's already inside the perimeter.

Tool poisoning works because models follow instructions embedded in their context. Putting a gateway in front of MCP connections helps with authentication, authorization, and visibility. It doesn't help when a legitimately authorized MCP server has been compromised, or when a tool description contains instructions that look benign to a gateway but manipulate the model's behavior.

The Practical DevSecOps analysis of MCP vulnerabilities identifies eight distinct threat categories, and centralized gateways address roughly three of them. Tool poisoning, the confused deputy problem, and over-privileged tool access require defenses at the model and agent level, not just the network level.

Three things need to happen beyond gateway solutions:

Tool description verification needs to become standard. Every MCP tool description should be cryptographically signed, versioned, and auditable. Changes to descriptions after initial registration should trigger re-review. The "rug pull" attack vector exists because there's no mechanism to detect unauthorized description modifications.

Models need to treat tool descriptions as untrusted input. The current architecture passes tool descriptions directly into the model's context as trusted instructions. This is the MCP equivalent of SQL injection: mixing data with instructions in the same channel. Tool descriptions should be isolated, sanitized, or processed through a separate verification layer before entering the model's context.

Agents need least-privilege tool access with runtime monitoring. Connecting an agent to ten MCP servers when it needs two creates unnecessary blast radius. Tools should be loaded on-demand, not pre-loaded into context. Runtime monitoring should flag when an agent accesses tools or data unrelated to the user's request.

The Firewall Moment

In Navy EOD, we had a saying about watching someone approach an uncleared area without proper equipment: "They're not brave, they're uninformed." The AI agent ecosystem spent 2025 running through an uncleared field, and the breach timeline is the predictable result.

Cloudflare building a firewall for MCP connections is the industry's admission that we've been uninformed, that we deployed a protocol designed for unlimited tool access into environments that require zero-trust controls. The fact that we need to learn this lesson again, for a protocol that's barely two years old, suggests the problem isn't technical. It's institutional.

We don't lack the security patterns. We lack the discipline to apply them before the breaches start.

MCP tool poisoning isn't a new class of attack. It's every old class of attack, delivered through natural language instead of code, targeting AI models instead of servers, and exploiting trust assumptions instead of buffer overflows. The vulnerabilities are different in their mechanism and identical in their root cause: we trusted by default when we should have verified by default.

The protocol that was supposed to connect AI agents to the world connected them to every security failure we've already seen. Cloudflare's response is the right first step. But if the last 25 years of security have taught us anything, it's that the firewall is where defense starts, not where it ends.

A malicious MCP tool doesn't need to be executed to compromise your system. It just needs to be loaded.

The Year Everything Went Wrong

AuthZed's documentation of MCP breaches catalogs the damage:

September 2025: A supply chain attack through a fake Postmark MCP server silently BCC'd copies of all email communications to attacker-controlled servers.

October 2025: A path traversal vulnerability in Smithery's build configuration compromised 3,000+ hosted MCP server applications, leaking deployment tokens and downstream API keys.

Ten major breaches in ten months. Each one exploiting a different class of vulnerability. Each one familiar to anyone who has spent time in enterprise security.

Why Tool Poisoning Breaks the Mental Model

And there's a "rug pull" variant: servers can modify tool descriptions after initial approval, enabling post-installation attacks that bypass any upfront review.

Cloudflare's Answer: We Need Firewalls Again

In August 2025, Cloudflare launched MCP Server Portals in open beta. The pitch: a centralized zero-trust gateway through which every MCP connection must pass before it can reach an AI agent.

We're rebuilding the firewall. For AI agents. In 2026.

The Pattern Recognition Problem

Here's what bothers me about the MCP security crisis: none of it is new.

What This Actually Means for Enterprises

Three things need to happen beyond gateway solutions:

The Firewall Moment

We don't lack the security patterns. We lack the discipline to apply them before the breaches start.

MCP Tool Poisoning and the Protocol That Speed-Ran 25 Years of Security Mistakes

The Year Everything Went Wrong

Why Tool Poisoning Breaks the Mental Model

Cloudflare's Answer: We Need Firewalls Again

The Pattern Recognition Problem

What This Actually Means for Enterprises

The Firewall Moment

Related Posts

Apple Built an AI That Sees Your Screen. Attackers Already Know How to Exploit That.

IDMerit Exposed 1 Billion Identity Records. The Real Story Is Why They Had Them.

Google Proved It Can't Fix Prompt Injection. 27 Million Enterprise Users Are Deploying Anyway.

The Year Everything Went Wrong

Why Tool Poisoning Breaks the Mental Model

Cloudflare's Answer: We Need Firewalls Again

The Pattern Recognition Problem

What This Actually Means for Enterprises

The Firewall Moment