An AI Agent Breached McKinsey in Two Hours. The 46 Million Messages Aren't the Scary Part.

The headlines are about the data. 46.5 million plaintext chat messages. 728,000 confidential files. 57,000 user accounts. Those numbers from CodeWall's autonomous breach of McKinsey's internal AI platform Lilli are staggering, and every outlet is fixated on them.

They're fixated on the wrong thing.

The real catastrophe isn't what the agent could read. It's what the agent could write.

What Happened in Two Hours

CodeWall, an autonomous offensive security startup, pointed its AI agent at McKinsey's publicly disclosed HackerOne bug bounty program. The agent, operating with "zero human input" according to CEO Paul Price, autonomously selected McKinsey as a target, mapped its attack surface, and achieved full read-write access to Lilli's production database.

The vulnerability chain was almost embarrassingly basic. Of Lilli's 200+ API endpoints, 22 required no authentication. The agent discovered that while user input values were properly parameterized in SQL queries, the JSON field names were concatenated directly into SQL without sanitization. When error messages reflected live production data, the agent recognized a classic error-based SQL injection vector that OWASP ZAP, the industry-standard scanner, completely missed.

SQL injection. In 2026. On a production AI platform used by 40,000+ consultants processing 500,000+ prompts per month.

Everyone Is Covering the Data. They're Missing the Point.

Yes, the numbers are dramatic. CodeWall's report documented access to:

46.5 million plaintext chat messages covering strategy, M&A, and client engagements
728,000 files including 192,000 PDFs, 93,000 Excel spreadsheets, and 93,000 PowerPoint decks
57,000 user accounts
266,000+ OpenAI vector stores
3.68 million RAG document chunks

Those figures deserve a caveat. As security analyst Edward Kiledjian noted, CodeWall "conflates three categories: what was theoretically reachable, what was actually accessed, and what was verified as exfiltrated." The numbers reflect the blast radius of the vulnerability, not confirmed data theft.

But here's what almost no one is discussing: among those findings were 95 system prompts across 12 model types. All of them writable.

95 Writable System Prompts. One HTTP Call.

CodeWall put it plainly: "No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call."

That single HTTP call could rewrite how Lilli responds to every prompt from every one of McKinsey's 40,000+ consultants. Not after a code review cycle. Not pending a deployment window. Immediately.

Think about what Lilli is used for. McKinsey consultants rely on it for strategy research, competitive analysis, M&A evaluation, and client recommendations. 72% of the firm's workforce uses it. If a threat actor rewrote Lilli's system prompts to subtly bias how the AI frames competitive landscapes, cites sources, or evaluates acquisition targets, the poisoned output would flow directly into deliverables for Fortune 500 clients. No one receiving the advice would know it had been tampered with.

This isn't a data breach. It's a supply-chain attack vector for corporate strategy itself.

I've written before about how AI agents can become insider threats when their permissions aren't properly scoped. The McKinsey breach demonstrates something worse: an AI platform where the system prompts, the instructions governing every response, were stored in a writable database accessible via SQL injection. A threat actor wouldn't need to steal data. They could change how every AI in the organization thinks.

CodeWall was right about one thing: "AI prompts are the new Crown Jewel assets." McKinsey stored them like they were disposable.

The Firm That Sells AI Strategy Shipped SQL Injection

There's an irony the coverage has been too polite to name directly. McKinsey is the firm that Fortune 500 CEOs pay premium rates for advice on digital transformation, cybersecurity posture, and AI adoption. Their consultants walk into boardrooms and recommend security architectures to the world's largest enterprises.

Their own AI platform shipped with OWASP Top 10 vulnerabilities. As CodeWall put it: "SQL injection is one of the oldest bug classes in the book."

The specific failure is worth understanding because it's so common. McKinsey's developers parameterized user input values, the standard defense against SQL injection. But they concatenated JSON field names directly into queries without sanitization. They secured the obvious part and missed the subtle one. Standard scanners check values; they don't typically audit whether field names are sanitized too.

This connects to a pattern I've been tracking across enterprise AI deployments. As I explored in my analysis of prompt injection risks at enterprise scale, organizations are racing to deploy AI platforms while applying security practices designed for a pre-AI world. The attack surface has changed fundamentally. The defenses haven't caught up.

The AI Agent That Chose Its Own Target

One detail deserves more scrutiny than it's getting. CodeWall's agent didn't just exploit McKinsey. It selected McKinsey as a target autonomously. Price described the process as "fully autonomous from researching the target, analyzing, attacking, and reporting."

CodeWall operated under McKinsey's HackerOne responsible disclosure policy, which gives this specific engagement legitimacy. But the underlying capability raises questions that extend well beyond authorized red-teaming. Kiledjian raised the right one: "An AI system deciding whom to attack raises serious questions about operator control."

This isn't the first time we've seen autonomous AI agents operate with minimal human oversight. CodeWall has already demonstrated this capability against multiple targets, including Jack & Jill, a hiring platform used by Anthropic and Stripe, where the agent achieved full admin access in one hour by chaining four minor bugs.

The gap between "autonomous red-team tool" and "autonomous threat actor" is a policy decision, not a technical one. We've seen the consequences when AI agent governance lags behind capability. Autonomous target selection takes that gap to a different level entirely.

What McKinsey's Forensic Claim Actually Means

McKinsey's spokesperson stated that their investigation, "supported by a leading third-party forensics firm, identified no evidence that client data or client confidential information were accessed by this researcher or any other unauthorized third party."

That statement is carefully worded, and it raises more questions than it answers. Without knowing when the unauthenticated endpoints were first exposed, the forensic claim is inherently limited. The vulnerability could have existed for weeks, months, or since Lilli's launch in 2023. A nine-day window between CodeWall's disclosure on February 28 and public release on March 9 is tight for a comprehensive forensic review of a platform processing 500,000+ prompts per month.

"No evidence of access" is not the same as "no access occurred." Especially for a platform that, by CodeWall's account, stored chat messages in plaintext and maintained writable system prompts with no apparent audit trail.

The RAG Attack Surface Nobody Is Discussing

Buried in the technical details is another finding that deserves its own conversation: 266,000+ OpenAI vector stores and 3.68 million RAG document chunks were accessible through the same vulnerability. This reveals the depth of McKinsey's integration with OpenAI's infrastructure and the volume of proprietary knowledge embedded in retrieval-augmented generation systems.

RAG document stores represent a novel attack surface at the intersection of traditional database security and AI-specific risk. When a SQL injection vulnerability provides access to both the production database and the entire RAG knowledge base, you're not just looking at data exposure. You're looking at the ability to poison the knowledge the AI retrieves and synthesizes. Every agent system built on poisoned retrieval becomes a vector for misinformation at scale.

This is what makes the McKinsey breach more than a conventional application security failure with AI window dressing. The vulnerability was traditional. The consequences are not.

What Every Enterprise Running Internal AI Should Do Now

If your organization operates an internal AI platform, the McKinsey breach is your case study. Here's what to act on.

Treat system prompts as crown jewels. Store them in version-controlled, read-only configurations, not in writable database rows accessible to application queries. If a SQL injection can rewrite your AI's instructions, your entire platform is a supply-chain risk to every decision it informs.

Audit the full attack surface, not just user inputs. Standard security tools missed McKinsey's JSON key injection because they tested values, not field names. Your AI platform has attack surfaces that traditional scanners weren't designed to find, including API authentication gaps, RAG retrieval pipelines, and agent communication channels.

Know your RAG exposure. If your AI platform uses retrieval-augmented generation, understand where those document stores live, who can access them, and what happens if they're tampered with. The pattern of insufficient access controls on internal AI systems is systemic, not unique to McKinsey.

Prepare for autonomous offensive agents. CodeWall's agent is a commercial product. The same capability will be replicated, open-sourced, and weaponized. If your security posture assumes human-speed attacks and human-pattern reconnaissance, you're already behind. I covered this trajectory in detail when ten AI agents destroyed production systems with zero postmortems to show for it.

The Uncomfortable Question

McKinsey will patch these vulnerabilities, hire additional security resources, and publish a responsible AI framework. That's the predictable playbook.

The uncomfortable question is broader: if the world's most prestigious consulting firm, the one advising others on AI adoption and cybersecurity strategy, shipped a production AI platform with SQL injection and writable system prompts, what does the rest of the enterprise landscape look like?

CodeWall's agent found this in two hours. It won't be the last AI agent looking.

They're fixated on the wrong thing.

The real catastrophe isn't what the agent could read. It's what the agent could write.

What Happened in Two Hours

SQL injection. In 2026. On a production AI platform used by 40,000+ consultants processing 500,000+ prompts per month.

Everyone Is Covering the Data. They're Missing the Point.

Yes, the numbers are dramatic. CodeWall's report documented access to:

46.5 million plaintext chat messages covering strategy, M&A, and client engagements
728,000 files including 192,000 PDFs, 93,000 Excel spreadsheets, and 93,000 PowerPoint decks
57,000 user accounts
266,000+ OpenAI vector stores
3.68 million RAG document chunks

But here's what almost no one is discussing: among those findings were 95 system prompts across 12 model types. All of them writable.

95 Writable System Prompts. One HTTP Call.

CodeWall put it plainly: "No deployment needed. No code change. Just a single UPDATE statement wrapped in a single HTTP call."

That single HTTP call could rewrite how Lilli responds to every prompt from every one of McKinsey's 40,000+ consultants. Not after a code review cycle. Not pending a deployment window. Immediately.

This isn't a data breach. It's a supply-chain attack vector for corporate strategy itself.

CodeWall was right about one thing: "AI prompts are the new Crown Jewel assets." McKinsey stored them like they were disposable.

The Firm That Sells AI Strategy Shipped SQL Injection

Their own AI platform shipped with OWASP Top 10 vulnerabilities. As CodeWall put it: "SQL injection is one of the oldest bug classes in the book."

The AI Agent That Chose Its Own Target

What McKinsey's Forensic Claim Actually Means

The RAG Attack Surface Nobody Is Discussing

This is what makes the McKinsey breach more than a conventional application security failure with AI window dressing. The vulnerability was traditional. The consequences are not.

What Every Enterprise Running Internal AI Should Do Now

If your organization operates an internal AI platform, the McKinsey breach is your case study. Here's what to act on.

The Uncomfortable Question

McKinsey will patch these vulnerabilities, hire additional security resources, and publish a responsible AI framework. That's the predictable playbook.

CodeWall's agent found this in two hours. It won't be the last AI agent looking.

An AI Agent Breached McKinsey in Two Hours. The 46 Million Messages Aren't the Scary Part.

What Happened in Two Hours

Everyone Is Covering the Data. They're Missing the Point.

95 Writable System Prompts. One HTTP Call.

The Firm That Sells AI Strategy Shipped SQL Injection

The AI Agent That Chose Its Own Target

What McKinsey's Forensic Claim Actually Means

The RAG Attack Surface Nobody Is Discussing

What Every Enterprise Running Internal AI Should Do Now

The Uncomfortable Question

Keep Reading

Vercel's Breach Ran Through an AI Tool Nobody Scoped

Claude's Extension Ecosystem Has a Confused Deputy Problem

Mistral's Advisory Was Correct. The Questionnaire That Should Have Preceded It Is Missing a Row.

What Happened in Two Hours

Everyone Is Covering the Data. They're Missing the Point.

95 Writable System Prompts. One HTTP Call.

The Firm That Sells AI Strategy Shipped SQL Injection

The AI Agent That Chose Its Own Target

What McKinsey's Forensic Claim Actually Means

The RAG Attack Surface Nobody Is Discussing

What Every Enterprise Running Internal AI Should Do Now

The Uncomfortable Question