There's a comfortable assumption spreading through enterprise security teams: if you deploy Azure OpenAI through a private endpoint, or access Vertex AI through VPC Service Controls, or run Bedrock via AWS PrivateLink, your data stays private. The model is "inside" your infrastructure. Your prompts never touch the public internet. Problem solved.
This assumption is wrong, and the consequences of misunderstanding it are significant.
According to McKinsey's 2025 survey, 78% of organizations now use AI in at least one business function, with 71% specifically adopting generative AI. The enterprise LLM market reached $6.7 billion in 2024 and is projected to hit $71.1 billion by 2034. Yet 44% of enterprises cite data privacy and security as their top barrier to LLM adoption. Many believe "private" VPC deployments solve this concern.
They don't. VPC provides network isolation, not data isolation. And the distinction matters enormously.
Two Kinds of "Private" LLM
When enterprises say "private LLM," they're often conflating two very different architectures.
The first is a truly self-hosted model: you download open-source weights like Llama or Mistral, deploy them on your own infrastructure (cloud or on-premises), and operate them entirely within your control. Your data never leaves your environment because the model physically runs on machines you own or lease.
The second is a VPC-connected provider model: you access Azure OpenAI, Google Vertex AI, or AWS Bedrock through a private endpoint. Traffic doesn't traverse the public internet. You configure VPC Service Controls or PrivateLink. The network path is "private."
Most enterprises are deploying the second architecture while expecting the privacy guarantees of the first. The marketing language encourages this confusion. Terms like "private endpoint," "isolated environment," and "data residency" suggest the provider can't see your data. The reality is more nuanced.
The Trust Boundary Problem
In VPC deployments of provider-hosted LLMs, the model weights remain provider-owned. Your plaintext prompts and completions cross from your infrastructure into provider-controlled infrastructure. The privacy assurances you receive are contractual, not technical.
This is the same dynamic I explored in The Invisible Attack Surface: Why Third-Party Data Sharing Is 2026's Biggest Security Risk. When you share data with any third party, that data exists in a security context you no longer control. VPC doesn't change this fundamental reality; it just makes the network path more direct.
Consider what "private endpoint" actually means. Azure OpenAI's documentation explains that a private endpoint creates "a security boundary" where traffic flows through your VPC rather than the public internet. But the data is still processed on Microsoft's infrastructure. You've privatized the pipe, not the destination.
AWS Bedrock's documentation makes a similar point: PrivateLink ensures "traffic between your VPC and Amazon Bedrock will not leave the Amazon network." Your data stays on Amazon's network. That's different from staying on your network.
The distinction matters because a compromise of the provider's operational infrastructure, not just model weights, could expose VPC customer data. Your security posture now includes the security posture of your LLM provider.
What Actually Exists on Provider Infrastructure
Even in VPC deployments, significant data exists outside your direct control. Understanding this operational data is essential for accurate risk assessment.
Logging and Monitoring
Azure OpenAI's default behavior retains prompts and generated content for up to 30 days "to detect and mitigate abuse." This data is stored in Microsoft's environment and accessible to authorized Microsoft personnel for security monitoring. Zero Data Retention (ZDR) is available, but only for customers with enterprise agreements, and it requires filing a support ticket to enable. You must verify it's active by checking your resource's JSON configuration for a contentLogging field.
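Verifying that ZDR is actually active means inspecting your resource's exported JSON rather than trusting that the support ticket was processed. The sketch below shows one way to check; the exact location and shape of the contentLogging field in the export are assumptions based on the description above, so confirm against your own resource's output (e.g. from `az cognitiveservices account show`).

```python
import json

# Hypothetical resource export; the real JSON shape may differ by API version.
resource_json = """
{
  "name": "my-openai-resource",
  "properties": {
    "capabilities": [
      {"name": "ContentLogging", "value": "false"}
    ]
  }
}
"""

def content_logging_disabled(raw: str) -> bool:
    """Return True only if a ContentLogging capability is present and 'false'."""
    resource = json.loads(raw)
    for cap in resource.get("properties", {}).get("capabilities", []):
        if cap.get("name", "").lower() == "contentlogging":
            return cap.get("value", "").lower() == "false"
    # Field absent: assume the default 30-day retention is still active.
    return False

print(content_logging_disabled(resource_json))  # prints True for this sample
```

Note the conservative default: if the field is missing, the function reports logging as enabled, which is the safer assumption for a compliance check.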
Caching for Performance
Google Vertex AI caches customer data in-memory to reduce latency and accelerate responses. By default, this cache has a 24-hour TTL. The data is isolated at the project level and stored only in-memory, not at-rest, but it exists on Google's infrastructure during that window. You can disable caching via API, but the default is on.
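Disabling that cache is a project-level configuration change. The sketch below constructs the request without sending it; the endpoint path and field names reflect Vertex AI's cacheConfig API as I understand it, so treat them as assumptions and verify against current Google Cloud documentation before relying on them.

```python
import json

PROJECT_ID = "my-project"  # placeholder project ID

# Project-level cache configuration endpoint (assumed path; verify in docs).
ENDPOINT = (
    f"https://us-central1-aiplatform.googleapis.com/v1/"
    f"projects/{PROJECT_ID}/cacheConfig"
)

body = {
    "name": f"projects/{PROJECT_ID}/cacheConfig",
    "disableCache": True,  # default is False: caching on, 24-hour TTL
}

# Send as an authenticated PATCH, e.g. with google-auth plus requests:
#   requests.patch(ENDPOINT, headers=auth_headers, data=json.dumps(body))
print(ENDPOINT)
print(json.dumps(body))
```

The point of scripting this rather than clicking through a console is auditability: you can assert in CI that every project feeding sensitive prompts to Vertex AI has the cache disabled.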
Fine-Tuning Artifacts
This is where the exposure becomes more significant. When you fine-tune a model, your training data doesn't just influence the weights; it generates artifacts that persist. AWS Bedrock's documentation acknowledges that fine-tuning "generates model artifacts which are stored in the model provider AWS account." These artifacts are encrypted with your KMS key, but they physically reside in the provider's account.
Microsoft's March 2025 Azure OpenAI terms update disclosed that fine-tuning operations might involve "temporary data relocation" outside your selected geography. If you're fine-tuning in US West, your training data might be processed in a centralized location before the final model is returned to your region.
Memory and Stateful Features
Newer features compound the exposure. Azure OpenAI's Assistants API, Stored Completions, and vector stores all persist data within your resource. Files uploaded for fine-tuning or retrieval stay in the provider's infrastructure. These aren't transient caches; they're persistent stores that become part of your extended attack surface.
Inference vs. Fine-Tuning: Different Risk Profiles
A common reassurance is that inference data isn't incorporated into model weights. This is technically accurate but misses the broader picture.
Standard API calls to base models are stateless in the sense that your prompts don't retrain the model. Other users won't receive outputs influenced by your data. But statelessness doesn't mean your data is invisible. It passes through the provider's systems, gets logged (by default), and exists in operational infrastructure that's subject to the provider's security controls.
Fine-tuning is a different matter entirely. When you fine-tune, you're deliberately incorporating your data into a model. That model's weights now contain statistical representations of your training data. Research published in 2025 found that training data extraction attacks can be more effective than previously understood, with extraction rates underestimated by as much as 2.14X in prior work. Uniquely formatted data like email addresses and phone numbers are particularly vulnerable to extraction.
The operational data vectors (logs, caches, and artifacts) exist regardless of whether you're doing inference or fine-tuning. The question is whether you're adding permanent data incorporation on top.
The Compliance Implications
The regulatory picture is increasingly unfavorable for the "VPC equals privacy" assumption.
GDPR Right to Erasure
A June 2025 research paper from the University of Tübingen established that large language models may qualify as personal data under EU privacy regulations. The reasoning: if a model can output information about a specific individual, the model itself contains personal data. This creates obligations throughout the development lifecycle.
The GDPR's right to erasure presents a fundamental technical challenge. As GDPR compliance analysis notes, "Unlike traditional databases, LLMs encode information within their model parameters, making it difficult to isolate and delete knowledge related to an individual." Machine unlearning techniques exist but face what researchers call "severe challenges" at scale. France's CNIL has provided guidance allowing alternatives like output filtering, but the underlying problem remains unsolved.
This complicates the governance challenges I discussed in AI Governance in Enterprise Data Management. When data enters an LLM through fine-tuning, fulfilling deletion requests may require model retraining. When data exists in provider logs for 30 days, your data subject access request response now depends on the provider's cooperation.
Cross-Border Data Transfer
VPC deployments don't necessarily keep data in your jurisdiction. Google Vertex AI's "data residency" controls cover data at rest in your specified region, but processing may occur elsewhere. The U.S. DOJ Data Security Rule, effective April 2025, creates new restrictions on data transfers to "countries of concern." Organizations now face a complex matrix of requirements that VPC alone doesn't address.
Gartner predicts that by 2027, more than 40% of AI-related data breaches will stem from improper use of generative AI across borders. The cross-border issue isn't just regulatory; it's operational. Every API call to a VPC-connected LLM can become a cross-border data transfer event depending on where the provider processes your request.
The Breach Statistics Are Telling
IBM's 2025 Cost of a Data Breach Report found that 13% of organizations reported breaches involving AI models or applications. Of those breached, 97% lacked AI access controls. The absence of governance creates exposure that technical controls alone can't address.
The shadow AI statistics are even more concerning. As I covered in Shadow AI and the Data Exfiltration Risk Enterprises Can't See, unauthorized AI usage adds an average of $670,000 to breach costs. But even "authorized" VPC deployments create similar risk profiles if enterprises misunderstand what privacy they're actually getting.
Tokenization: Solving the Trust Boundary
If VPC doesn't provide data isolation, what does? The answer is ensuring sensitive data never reaches the provider in the first place.
This is where tokenization becomes relevant. NIST's Privacy Engineering Program endorses tokenization as a privacy-by-design control. The OWASP LLM Top 10 identifies prompt injection and data exfiltration as the top two risks. Tokenization directly addresses both.
The approach is straightforward: before any data leaves your environment for an LLM, sensitive values are replaced with tokens. The prompt "Summarize account activity for John Smith, SSN 123-45-6789" becomes "Summarize account activity for [TOKEN_001], SSN [TOKEN_002]." The LLM processes the tokenized prompt, returns a tokenized response, and your system de-tokenizes only for authorized users.
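That round trip can be sketched in a few lines. This is a deliberately minimal, vault-based illustration of the flow described above, not a production design: a real deployment would detect sensitive values automatically rather than take an explicit list, and would persist the mapping securely.

```python
import itertools

class PromptTokenizer:
    """Illustrative tokenizer: swap sensitive values for opaque tokens before
    a prompt leaves your environment, then reverse the mapping on the response
    for authorized users only."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._vault = {}  # token -> original value (in-environment only)

    def tokenize(self, prompt: str, sensitive_values: list[str]) -> str:
        for value in sensitive_values:
            token = f"[TOKEN_{next(self._counter):03d}]"
            self._vault[token] = value
            prompt = prompt.replace(value, token)
        return prompt

    def detokenize(self, text: str) -> str:
        for token, value in self._vault.items():
            text = text.replace(token, value)
        return text

t = PromptTokenizer()
safe = t.tokenize(
    "Summarize account activity for John Smith, SSN 123-45-6789",
    ["John Smith", "123-45-6789"],
)
print(safe)  # Summarize account activity for [TOKEN_001], SSN [TOKEN_002]
# ... `safe` is what actually crosses the trust boundary to the LLM ...
print(t.detokenize(safe))  # original values restored inside your environment
```

The essential property is that only `safe` ever crosses the trust boundary; the vault holding real values never leaves your infrastructure.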
What makes this effective is that even if the provider's logs are compromised, even if the fine-tuned model is extracted, even if operational infrastructure is breached, the sensitive values aren't there to steal. The provider receives and processes meaningless tokens.
Vaultless Tokenization
Traditional tokenization stores token-to-value mappings in a central vault. This creates its own single point of failure. If the vault is compromised, all tokens can be reversed.
Vaultless tokenization takes a different approach. Capital One's Databolt, launched in April 2025, uses patented cryptographic methods to generate tokens without maintaining a central mapping database. Tokens are created deterministically using secure algorithms and encryption keys, meaning the same input produces the same token consistently. This preserves analytical utility: you can aggregate, join, and analyze tokenized data because the relationships are maintained.
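A one-way HMAC construction is enough to illustrate the deterministic property, though it is emphatically not Databolt's patented method (which is not public) and, unlike a reversible format-preserving scheme, cannot be de-tokenized. The key point it demonstrates: the same input always yields the same token, with no mapping table to store or steal.

```python
import hmac
import hashlib
import base64

SECRET_KEY = b"rotate-me-in-a-real-kms"  # placeholder; use a managed key service

def vaultless_token(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministic, keyed token: same input + same key -> same token.
    One-way HMAC sketch only; reversible vaultless schemes use
    format-preserving encryption with the key instead."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).digest()
    return "TOK_" + base64.urlsafe_b64encode(digest[:12]).decode()

a = vaultless_token("john.smith@example.com")
b = vaultless_token("john.smith@example.com")
assert a == b  # determinism preserves joins and aggregation across datasets
print(a)
```

Because tokens are stable across datasets, you can still join a tokenized customer table against tokenized transaction logs, which is what keeps analytics workable.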
The performance matters for LLM workflows. Databolt processes up to 4 million tokens per second with low latency. For real-time inference scenarios, tokenization can't be a bottleneck. The architecture is also cloud-native, with integrations for Databricks and Snowflake that embed tokenization directly into data pipelines feeding AI systems.
At Capital One, we run over 100 billion tokenization operations monthly across hundreds of applications. Early Warning Services, the financial services technology consortium, has deployed Databolt specifically because, as their VP of Cloud Security noted, tokenization is "a critical next step in further enhancing our defenses" given the growing complexity of data protection.
This builds on the principles I discussed in Building AI Systems That Enterprises Can Trust: protection that travels with the data, not protection that stops at a perimeter.
Honest Caveats
Tokenization is powerful, but it's not magic. Three limitations deserve acknowledgment.
Context Leakage
Surrounding text can reveal information even when the sensitive value is tokenized. "Patient [TOKEN] was diagnosed with early-stage pancreatic cancer during their visit to Memorial Oncology Center" contains meaningful data regardless of whether the name is tokenized. Tokenization reduces risk; it doesn't eliminate it. Robust solutions pair tokenization with broader data classification and redaction strategies.
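One way to pair the two is category-level redaction applied after tokenization. The sketch below is illustrative only: the phrase patterns are hardcoded assumptions, where a real system would use a trained classifier or a maintained sensitive-term dictionary.

```python
import re

# Illustrative redaction rules applied on top of tokenization.
SENSITIVE_PHRASES = [
    (re.compile(r"pancreatic cancer", re.I), "[DIAGNOSIS]"),
    (re.compile(r"Memorial Oncology Center", re.I), "[FACILITY]"),
]

def redact_context(text: str) -> str:
    """Replace sensitive contextual phrases that tokenization alone misses."""
    for pattern, placeholder in SENSITIVE_PHRASES:
        text = pattern.sub(placeholder, text)
    return text

note = ("Patient [TOKEN_001] was diagnosed with early-stage pancreatic cancer "
        "during their visit to Memorial Oncology Center")
print(redact_context(note))
# Patient [TOKEN_001] was diagnosed with early-stage [DIAGNOSIS] during their
# visit to [FACILITY]
```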
Coverage Challenges
Tokenization only protects data that's detected and tokenized. If your PII detection misses a field, that field goes to the LLM in plaintext. Detection accuracy matters. This requires ongoing tuning as data formats evolve and new sensitive data types emerge.
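The coverage gap is easy to demonstrate with a toy detector. This sketch knows only two PII patterns; anything outside them, like the phone number below, would go to the LLM in plaintext, which is exactly the failure mode at issue.

```python
import re

# Deliberately incomplete detector: only SSNs and email addresses.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def detect_pii(text: str) -> list[tuple[str, str]]:
    """Return (label, match) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits += [(label, m) for m in pattern.findall(text)]
    return hits

prompt = ("Customer 123-45-6789 called from +1 (415) 555-0134 "
          "about j.doe@example.com")
print(detect_pii(prompt))
# The SSN and email are caught; the phone number slips through untokenized.
```

Closing that gap means treating the detection ruleset as a living artifact: measured for recall, tuned as data formats change, and reviewed whenever a new data source starts feeding prompts.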
Utility Trade-offs
Heavily tokenized inputs may affect response quality. An LLM asked to "Summarize [TOKEN_001]'s account history" can't use contextual information about the person that might improve the summary. For some use cases, this trade-off is acceptable. For others, you need to carefully balance privacy against utility. The goal is tokenizing what must be protected while leaving enough context for the model to be useful.
The Real Question
VPC deployments of provider-hosted LLMs satisfy procurement checklists. They demonstrate that you've configured private endpoints, enabled VPC Service Controls, and followed the vendor's "best practices" documentation. Audit teams can check boxes.
But procurement compliance isn't the same as data privacy. The question enterprises should be asking isn't "Where does the model run?" but "What data reaches the model?"
For inference workloads where prompts contain sensitive data, VPC alone doesn't prevent that data from existing in provider logs, caches, and operational systems. For fine-tuning workloads, VPC doesn't prevent training data from being incorporated into model weights that live on provider infrastructure.
The organizations taking AI data privacy seriously are moving beyond network-level controls to data-level controls. They're tokenizing sensitive values before any LLM interaction (sanctioned or shadow), ensuring that even when data crosses trust boundaries, the crown jewels stay home.
That's the difference between procurement privacy and operational privacy. The former checks boxes. The latter protects data.