Here's a number that should keep security leaders up at night: according to recent research from Kiteworks, 93% of employees admit to inputting information into AI tools without company approval. That's not 93% who have merely used AI; it's 93% who have used AI with data they shouldn't be sharing.
The traditional security perimeter was built to stop hackers. But the most significant data exfiltration risk in 2025 isn't coming from outside the organization. It's walking through the front door every morning, opening ChatGPT, and pasting in customer records, financial data, and proprietary code.
The Scale of Invisible Data Movement
Shadow AI has become the leading channel for unauthorized data movement. Research from Reco's 2025 State of Shadow AI Report found that generative AI tools are now responsible for 32% of all corporate-to-personal data exfiltration—more than any other channel. Nearly 40% of files uploaded to these tools contain personally identifiable information or payment card data.
The numbers get worse the closer you look:
- 86% of organizations are blind to AI data flows, and the average enterprise unknowingly hosts 1,200 unofficial applications creating potential attack surfaces
- 71.6% of generative AI access happens via non-corporate accounts, meaning data flows through personal accounts that the organization can't monitor or control
- Between March 2023 and March 2024, the share of data employees put into AI tools that qualifies as sensitive rose from 10.7% to 27%
This isn't a theoretical risk. IBM's 2025 Cost of a Data Breach Report found that shadow AI breaches cost an average of $670,000 more than traditional incidents and affect roughly one in five organizations.
Why Employees Take the Risk
Understanding shadow AI requires understanding why employees use unauthorized tools despite knowing the risks. A 2024 survey found that 45% of workers have used AI tools their employers specifically banned, and 40% said they would violate an anti-AI policy if it meant completing a task faster.
The motivation isn't malicious; it's practical. AI tools make employees more productive. When the official, IT-approved path is slow or nonexistent, people find workarounds. They paste customer data into ChatGPT's free tier (where 54% of sensitive prompts end up) because they need to draft a response now, not after a three-week procurement process.
What makes this particularly dangerous is the nature of the data being shared:
- 46% of leaked data is customer information, including billing and authentication data
- 27% is employee PII and payroll data
- 15% is legal and financial information
- 37% of employees say they have shared private internal company data through unauthorized tools
The employees sharing this data often don't realize the implications. When you paste text into a free AI tool, that data may be used to train the model. Your customer's private information becomes part of a system that will generate outputs for other users—including competitors.
The Governance Gap
The disconnect between what organizations think they control and what they actually control is staggering. While 33% of executives claim comprehensive AI usage tracking, independent research from Deloitte shows only 9% have working governance systems. Gartner's analysis puts the number of organizations with dedicated AI governance structures at just 12%.
This is the governance gap I explored in AI Governance in Enterprise Data Management, but shadow AI makes the problem exponentially harder. You can't govern what you can't see, and most organizations can't see where their data is going.
The healthcare sector illustrates this perfectly. Healthcare leads all industries in breach costs at $7.42 million per incident, taking 279 days to resolve. Yet only 35% of healthcare organizations can track their AI usage. When I think about the compliance requirements we address under HIPAA's new security mandates, shadow AI represents a gap that no amount of encryption or access controls can close if data is leaving through unsanctioned channels.
The Attacker's Advantage
Shadow AI doesn't just leak data passively; it creates new attack vectors. Microsoft's 2025 Digital Defense Report found that AI-driven phishing is now three times more effective than traditional campaigns. Attackers are using AI to scale social engineering, creating personalized lures based on company information scraped from the web.
But here's what concerns me more: the same AI tools employees use to be more productive are training on the data those employees provide. Every sensitive document pasted into an AI tool becomes potential training data for attacks against your organization, your competitors, customers, and partners.
The compounding effect is real:
- Attackers compromise AI tools through prompt injection and supply chain exploits
- Employees feed sensitive data into those compromised tools
- AI systems generate more convincing phishing and fraud attempts using that data
- The scale is already visible: Microsoft tracked $4 billion in fraud attempts against its customers in just 12 months
What Data Platforms Must Do
From a product management perspective, the shadow AI problem demands a fundamental shift in how we think about data governance. At Databolt, we've been focused on protecting data within sanctioned systems, but the greater challenge is data that leaves those systems entirely through AI channels.
Data governance platforms need to evolve in several directions:
Visibility Into AI Data Flows
You can't govern what you can't see. Platforms need capabilities to:
- Detect when data is being copied or exported to AI tools
- Identify patterns that suggest AI-based exfiltration (large text pastes, API-like query patterns), as sketched in the example after this list
- Track which data classifications are most at risk of AI exposure
- Provide dashboards that show AI data flow trends over time
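As a rough illustration of the first two capabilities, here is a minimal detection sketch in Python. It assumes a hypothetical outbound-event feed (for example, from a web proxy or DLP agent); the domain list, paste-size threshold, and regex patterns are placeholders for whatever endpoint catalog and classifier your environment actually uses.

```python
import re
from dataclasses import dataclass

# Illustrative only: a real catalog of AI endpoints would be maintained centrally.
AI_TOOL_DOMAINS = {"chat.openai.com", "chatgpt.com", "claude.ai", "gemini.google.com"}
LARGE_PASTE_THRESHOLD = 2_000  # characters; tune to your environment

# Simple PII patterns (email, SSN-like, card-like). A production system would use
# a proper classifier; regexes are enough to show the shape of the check.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),   # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # US SSN format
    re.compile(r"\b(?:\d[ -]?){15,16}\b"),    # payment-card-like number
]

@dataclass
class OutboundEvent:
    user: str
    destination: str
    payload: str

def flag_ai_exfiltration(event: OutboundEvent) -> dict | None:
    """Return a finding if the event looks like sensitive data leaving via an AI tool."""
    if event.destination not in AI_TOOL_DOMAINS:
        return None
    reasons = []
    if len(event.payload) >= LARGE_PASTE_THRESHOLD:
        reasons.append("large_text_paste")
    if any(p.search(event.payload) for p in PII_PATTERNS):
        reasons.append("pii_detected")
    if not reasons:
        return None
    return {"user": event.user, "destination": event.destination, "reasons": reasons}
```

The interesting engineering lives in the event source and the classifier; the point of the sketch is simply that AI-bound traffic can be flagged with ordinary signals (destination, payload size, content patterns), and those same findings are what the trend dashboards aggregate over time.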
This builds on the core principle of building AI systems enterprises can trust. Trust requires transparency, and transparency requires visibility.
Policy Controls for AI Access
Rather than blocking AI entirely (which employees will circumvent), platforms should enable granular controls:
- Allow AI usage for non-sensitive data while blocking it for regulated information (see the policy sketch after this list)
- Implement real-time classification that tags data before it can be exported
- Create audit trails that capture what data was exposed to which AI tools
- Enable differential policies by role, department, or data sensitivity level
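Here is a minimal sketch of what such a policy check might look like, assuming a hypothetical classification scheme and per-department overrides. A real platform would back this with a persistent audit store and a genuine data classifier rather than in-memory structures.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class Classification(Enum):
    PUBLIC = 1
    INTERNAL = 2
    REGULATED = 3   # PII, PCI, PHI, and similar

class Decision(Enum):
    ALLOW = "allow"
    REDACT = "redact"   # strip or tokenize sensitive fields, then allow
    BLOCK = "block"

@dataclass
class PolicyEngine:
    # Hypothetical per-department ceilings; the default is deliberately conservative.
    department_max: dict[str, Classification] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def evaluate(self, user: str, department: str,
                 classification: Classification, ai_tool: str) -> Decision:
        # Default ceiling: INTERNAL data may flow to sanctioned AI, REGULATED may not.
        ceiling = self.department_max.get(department, Classification.INTERNAL)
        if classification.value <= ceiling.value:
            decision = Decision.ALLOW
        elif classification is Classification.REGULATED:
            decision = Decision.REDACT  # prefer redaction over a hard block
        else:
            decision = Decision.BLOCK
        # Every evaluation produces an audit record: who, what class, which tool, outcome.
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user, "department": department,
            "classification": classification.name,
            "ai_tool": ai_tool, "decision": decision.value,
        })
        return decision
```

Under these defaults, a regulated record headed for ChatGPT comes back as REDACT rather than a hard block, and every evaluation, whatever the outcome, leaves an audit entry behind; that pairing of shaped access and traceability is exactly what the list above describes.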
Sanctioned AI Alternatives
The reason employees use shadow AI is that it solves real problems faster than official channels. Data platforms should integrate AI capabilities that:
- Provide similar productivity benefits within controlled environments
- Keep data within enterprise boundaries
- Maintain audit logs and governance controls
- Support compliance requirements (SOC 2, HIPAA, GDPR)
If the secure path is also the easy path, employees won't need to find workarounds.
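To make that concrete, here is a rough sketch of the gateway layer a sanctioned alternative implies. The model client and the classifier are passed in as callables because they depend entirely on your deployment (that is an assumption, not a prescribed API). What matters is the shape: regulated prompts are refused, and every call leaves an audit record that stores a hash rather than a second copy of potentially sensitive text.

```python
import hashlib
import json
import logging
from typing import Callable

logging.basicConfig(level=logging.INFO)
audit_logger = logging.getLogger("ai_gateway.audit")

def sanctioned_ai_call(user: str, prompt: str,
                       call_model: Callable[[str], str],
                       is_regulated: Callable[[str], bool]) -> str:
    """Route a prompt to an enterprise-hosted model while keeping an audit trail.

    `call_model` is whatever client your internal deployment exposes (hypothetical);
    `is_regulated` is your classifier for data that must never leave the boundary.
    """
    if is_regulated(prompt):
        audit_logger.warning(json.dumps({"user": user, "event": "blocked_regulated_prompt"}))
        raise PermissionError("Prompt contains regulated data; use the tokenized workflow.")

    # Log a digest of the prompt, not the prompt itself.
    prompt_digest = hashlib.sha256(prompt.encode()).hexdigest()
    audit_logger.info(json.dumps({"user": user, "event": "prompt_sent",
                                  "prompt_sha256": prompt_digest}))
    response = call_model(prompt)
    audit_logger.info(json.dumps({"user": user, "event": "response_received",
                                  "response_chars": len(response)}))
    return response
```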
Data Tokenization for AI Workflows
This is where platforms like Databolt can provide unique value. By tokenizing sensitive data before it enters any AI workflow, sanctioned or otherwise, you create a protection layer that travels with the data:
- Tokenized data can be used in AI prompts without exposing actual PII (see the sketch after this list)
- Even if data reaches unauthorized tools, the sensitive information remains protected
- De-tokenization requires explicit authorization, creating natural audit points
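A simplified sketch of the pattern (not Databolt's actual API): detected PII is swapped for opaque tokens before the prompt leaves the enterprise boundary, and reversing the substitution requires an explicit authorization check. Only email detection is shown for brevity; a real deployment would use a full classifier and a hardened, persistent vault.

```python
import re
import secrets

class TokenVault:
    """Minimal in-memory vault; real deployments would use an encrypted, persistent store."""
    def __init__(self):
        self._vault: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        token = f"<tok_{secrets.token_hex(8)}>"
        self._vault[token] = value
        return token

    def detokenize(self, token: str, authorized: bool) -> str:
        # De-tokenization requires explicit authorization, creating a natural audit point.
        if not authorized:
            raise PermissionError("De-tokenization not authorized for this caller.")
        return self._vault[token]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenize_prompt(prompt: str, vault: TokenVault) -> str:
    """Replace detected PII (emails here, for brevity) with tokens before any AI call."""
    return EMAIL_RE.sub(lambda m: vault.tokenize(m.group(0)), prompt)

# Example: the AI tool only ever sees the token, never the underlying address.
vault = TokenVault()
safe = tokenize_prompt("Draft a reply to jane.doe@example.com about her invoice.", vault)
# safe == "Draft a reply to <tok_...> about her invoice."
```

The AI tool, sanctioned or not, only ever sees the placeholder; turning a token back into the underlying value is a separate, authorized, and auditable operation.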
The Regulatory Pressure Is Coming
The governance gap won't be optional for long. U.S. agencies issued 59 AI regulations in 2024, more than double the previous year. IBM found that 32% of breached organizations paid regulatory fines, with 48% of those fines exceeding $100,000.
Organizations that don't address shadow AI risk are accumulating compliance debt that will eventually come due. The question isn't whether regulators will require AI data governance; it's whether your organization will be ready when they do.
The 93% of employees using unauthorized AI tools aren't trying to harm their organizations. They're trying to do their jobs. The responsibility falls on data platforms and governance systems to make secure AI usage as easy as opening ChatGPT and pasting in a prompt. Until then, enterprise data will continue flowing out through the channel organizations can't see.