In May 2025, Google DeepMind published a research paper titled "Lessons from Defending Gemini Against Indirect Prompt Injections." It's the most transparent assessment of AI security limitations any major AI company has ever released. And the numbers in it should make every enterprise security leader pause before their next Workspace rollout.
Here's what the paper found: an undefended Gemini model succeeded on over 70% of prompt injection test scenarios. After Google applied their best defenses, including adversarial fine-tuning for Gemini 2.5, the most effective attack technique (TAP) still succeeded 53.6% of the time, down from 99.8%. Google framed this as a 47% average reduction in attack success. That's one way to read it. Another way: attacks still work more than half the time against one of the most defended AI models on earth.
In any other security domain, a 53% breach success rate would be a crisis. In AI, it's being positioned as progress.
The Attack Timeline That Should Concern You
The DeepMind paper wasn't published in isolation. It landed in the middle of a cascade of real-world Gemini exploits that demonstrated exactly what the research predicted.
GeminiJack (June-December 2025): Security researchers at Noma Security discovered a zero-click vulnerability in Gemini Enterprise that exploited the RAG pipeline. An attacker could embed malicious instructions in a shared Google Doc, a calendar invitation, or an email. When any employee later performed a routine search in Gemini Enterprise, the AI retrieved the poisoned content and executed the hidden instructions. The result: silent exfiltration of email correspondence, calendar histories, and document repositories through image tags that transmitted data to an attacker's server. No clicks required. No warnings displayed.
Calendar Invite Weaponization (January 2026): Researchers at Miggo demonstrated that calendar event descriptions could carry prompt injection payloads. When a user asked Gemini a simple question like "Am I free on Saturday?", the model loaded all calendar events, including the attacker's payload. The hidden instructions caused Gemini to summarize private meetings and write the details into a new calendar event visible to the attacker. The researchers made a critical observation about why traditional defenses fail here: the malicious instruction isn't an obviously dangerous string. It's a plausible, even helpful-sounding instruction. Pattern matching can't catch what looks legitimate.
Gemini Trifecta (September 2025): Tenable researchers disclosed three simultaneous vulnerabilities: a search-injection flaw in Gemini's Search Personalization Model, an indirect prompt injection in the Gemini Browsing Tool that could exfiltrate user location data, and a prompt injection in Gemini Cloud Assist affecting cloud service management.
Google Translate Bypass (February 2026): Researchers found that Google Translate's Gemini integration could be bypassed with trivially simple prompt injection. Entering a question in a foreign language with English meta-instructions below it caused the service to abandon its core translation function entirely. This wasn't a sophisticated attack; it was the AI equivalent of asking nicely.
Gemini CLI Vulnerabilities (2025): Cyera Research Labs discovered both command injection and prompt injection flaws in Google's Gemini CLI. Unsanitized file paths could execute arbitrary commands, and the shell command validation that blocked $() substitution failed to catch backtick equivalents. Successful exploitation could expose development credentials, source code, and model artifacts.
Each of these vulnerabilities was individually patched. But the pattern matters more than any single fix.
Why RAG Architecture Makes This Unfixable
The GeminiJack attack illustrates a problem that goes beyond any individual vulnerability. It exposes a fundamental architectural limitation of Retrieval-Augmented Generation.
RAG systems work by retrieving external data and feeding it to the model as context alongside the user's query. The model then generates a response based on both. The security problem is structural: the model cannot reliably distinguish between "data I should reference" and "instructions I should follow." Both arrive through the same channel, in the same format, processed by the same architecture.
Google's fix for GeminiJack involved separating Vertex AI Search from Gemini Enterprise and modifying how both interact with retrieval systems. That addressed the specific pipeline. But the underlying confusion between content and instruction isn't something you can patch at the architecture level. Every RAG-based enterprise AI system faces this same constraint.
I explored this dynamic in When Your AI Agent Becomes an Insider Threat, where CyberArk demonstrated how poisoned data in an order database could cause an AI agent to exfiltrate vendor bank details. The GeminiJack attack is the same pattern at Google's scale: the AI doesn't know the difference between data it should learn from and instructions it should execute.
Google's own research confirms this. Their paper tested multiple static defenses, including "spotlighting" (marking data boundaries) and "self-reflection" (having the model check its own behavior). Both showed initial promise against basic attacks but collapsed against adaptive adversaries who learned to bypass them. As the DeepMind team wrote, evaluating only non-adaptive attacks creates a false sense of security.
Google's Five-Layer Defense (And Its Limits)
Google deserves credit for transparency. In June 2025, they published their layered defense strategy for protecting Gemini, a five-layer approach:
- Prompt injection content classifiers: ML models trained on adversarial data to detect malicious instructions in emails, files, and other content.
- Security thought reinforcement: Embedding security instructions around user prompts to remind the model to ignore adversarial content.
- Markdown sanitization and URL redaction: Preventing external image rendering (blocking the exfiltration channel GeminiJack used) and redacting malicious URLs via Google Safe Browsing.
- User confirmation framework: Requiring human approval for risky operations like deleting calendar events.
- End-user security notifications: Alerting users when defenses activate.
These layers work synergistically, and together with adversarial training they made Gemini 2.5 Google's most secure model family. But "most secure" is relative. The paper's own data shows that TAP attacks still breached those defenses 53.6% of the time. And here's the counterintuitive finding that should give every AI company pause: more capable models weren't necessarily more secure. Better instruction-following abilities sometimes made models easier to attack, because a model that's better at following legitimate instructions is also better at following malicious ones.
The adversarial training approach, where Google fine-tuned the model on realistic attack scenarios, achieved meaningful improvement. But the paper acknowledges that attackers adapt too, and the cost asymmetry is brutal: crafting an effective attack trigger costs less than $10, while defending against it requires continuous retraining at significant computational expense.
The Enterprise Gap
Here's where the math gets uncomfortable. According to industry statistics, Gemini has 27 million enterprise users globally, with 8 million paying licenses. Gemini for Google Workspace is used in 73% of enterprise accounts. In the first half of 2025 alone, Google Workspace integrations with Gemini drove over 2.3 billion document interactions. Healthcare and finance, the sectors with the most sensitive data, are the fastest-growing adopters at 3.4x growth.
Every one of those integrations expands the attack surface. When I wrote about Gemini Personal Intelligence in January, I flagged the privacy implications of an AI that can reason across your entire Google data footprint: Gmail, Calendar, Docs, Photos, Search. The prompt injection research validates that concern with hard data. Each integration point is a potential injection vector, and the more data the model can access, the more damage a successful attack can cause.
This is the same deployment-outpacing-security pattern I documented with Clawdbot's YOLO problem: convenience is winning, and the security consequences compound silently. The difference is that Clawdbot exposed developer environments. Gemini Enterprise exposes entire organizations.
The gap between deployment velocity and defense maturity is widening, not closing. IDC's Asia/Pacific research found that enterprises cite prompt injection as the second most concerning AI-driven threat. Yet industry data shows 65.3% of organizations lack dedicated prompt injection defenses. They know the risk exists. They haven't built the controls.
What This Means for Enterprise Security
Google's transparency is commendable, and their five-layer approach represents real defensive progress. But the data tells a story the marketing won't: prompt injection is not a bug to be patched. It's a constraint to be managed.
Here's what enterprises deploying Gemini (or any RAG-based AI system) should be doing:
Assume compromise in your threat model. If attacks succeed over 50% of the time against Google's best defenses, your internal AI deployments are not immune. Treat AI-connected data stores the same way you treat DMZ systems: assume hostile input will reach them.
Audit your RAG attack surface. Map every data source your AI system can access. Every shared drive, every email inbox, every calendar, every document repository is a potential injection point. The blast radius of a GeminiJack-style attack is defined by what the AI can reach.
Implement data-level controls, not just model-level. Google's defenses focus heavily on the model layer: classifiers, prompt hardening, adversarial training. But the shadow AI data exfiltration problem I've written about before applies here too. If sensitive data flows through AI pipelines without field-level protection, no amount of model hardening fixes the exposure. Tokenization and data masking at the source remain the most reliable controls.
Don't confuse layered defense with solved problem. Five layers that each reduce attack success are genuinely valuable. But multiplicative reduction from a 70%+ baseline doesn't reach zero. It reaches "better." Plan your incident response around the residual risk, not the marketing claim.
Monitor for the attacks you can't prevent. Google added end-user notifications as their fifth layer for a reason: some attacks will get through. Build detection capabilities that can identify when an AI agent's behavior deviates from expected patterns, when data is flowing to unexpected destinations, when calendar events or documents contain content that resembles instruction sets.
The Honest Assessment
Google's paper is remarkable not because it describes novel attacks, but because it quantifies what the security community has suspected: prompt injection is the SQL injection of the AI era, except we don't have prepared statements yet. The industry is converging on a truth that most vendors won't say publicly: there is no complete solution. There is only risk management.
The question isn't whether your enterprise AI can be compromised through prompt injection. Google's own numbers say it can. The question is whether you've built the controls to detect it, contain the blast radius, and respond before the damage compounds.
For 27 million enterprise users, the answer to that question matters more than any press release about layered defenses.