When Your AI Agent Becomes an Insider Threat

A financial services company deploys an AI agent to help vendors list their recent orders. Straightforward enough. The agent has access to the order database and an invoicing tool for related workflows. Standard enterprise integration.

Then an attacker places a small order with a carefully crafted shipping address. Hidden in that address field is a malicious prompt. When a legitimate vendor later asks the agent to list orders, the system ingests the hidden instruction. Instead of simply listing orders as designed, the compromised agent uses its invoicing tool to fetch sensitive vendor data, including bank account details, packages that information into an invoice, and sends it directly to the attacker.

This isn't a theoretical scenario. CyberArk Labs demonstrated this exact attack to illustrate a fundamental shift in enterprise security: AI agents with access to your systems aren't just tools. They're identities. And like any identity with privileged access, they can be compromised and turned against you from the inside.

The attack succeeded because of two failures the CyberArk team identified: lack of input filtering and excessive permissions. As they put it, "an AI agent's entitlements define the potential blast radius of an attack."

The Shift from Assistant to Actor

The AI systems most enterprises deployed in 2024 and 2025 were, fundamentally, sophisticated question-answering machines. You asked ChatGPT a question; it gave you text back. The security risks were real but containable: prompt injection could leak context, sensitive data might be inadvertently shared, outputs could be manipulated. I explored these risks in depth in my post on shadow AI and data exfiltration.

Agentic AI is different. These systems don't just answer questions; they take actions. They browse the web. They execute code. They make API calls. They interact with databases, file systems, and external services on your behalf. The security model has to change accordingly.

When an attacker successfully uses prompt injection against an agentic system, they aren't just extracting information. They're potentially gaining control of an entity that has legitimate access to your internal systems. As Menlo Security's analysis describes it: if an agent has access to your OneDrive, Google Drive, or Salesforce, a successful prompt injection effectively turns your trusted assistant into an insider threat working against you.

This is the logical extension of what I described in The AI Safety Gap No One Is Talking About. That post highlighted the "Agentic Safety Deficit": high-performing enterprises are 4.5x more likely to invest in agentic AI architectures, yet only 37% have robust AI security processes in place. The gap between adoption and protection is widening precisely as the stakes increase.

The OWASP Framework: A Map of the Attack Surface

In December 2025, the OWASP GenAI Security Project released the Top 10 for Agentic Applications 2026, developed through collaboration with more than 100 industry experts. It's the first comprehensive framework for understanding the security risks specific to autonomous AI systems.

The full list reveals how fundamentally different agentic security is from traditional application security:

ASI01: Agent Goal Hijack attacks redirect agent objectives by manipulating instructions, tool outputs, or external content. This is what happened in the CyberArk demonstration: the agent's goal shifted from "list orders" to "exfiltrate bank data" through a single poisoned input.

ASI02: Tool Misuse & Exploitation covers agents misusing legitimate tools due to prompt injection, misalignment, or unsafe delegation. The invoicing tool in the CyberArk attack wasn't compromised; it was functioning exactly as designed. The agent was simply instructed to use it maliciously.

ASI03: Identity & Privilege Abuse addresses attackers exploiting inherited credentials, delegated permissions, or agent-to-agent trust. Every AI agent is an identity. It needs credentials to access databases, cloud services, and code repositories. Those credentials become the attack surface.

ASI04: Agentic Supply Chain Vulnerabilities covers malicious or tampered tools, model descriptors, or agent personas. This connects directly to the third-party data sharing risks I've written about: when your agent integrates with external services, you're trusting their security posture as well as your own.

ASI05: Unexpected Code Execution occurs when agents generate or execute attacker-controlled code. If your agent can write and run code, an attacker who controls its instructions can write and run code on your infrastructure.

ASI06: Memory & Context Poisoning involves persistent corruption of agent memory, RAG stores, or contextual knowledge. Unlike traditional prompt injection, which affects a single session, memory poisoning can reshape agent behavior long after the initial attack.

ASI07: Insecure Inter-Agent Communication allows spoofed messages to misdirect entire agent clusters. As organizations deploy multiple agents that coordinate with each other, the trust relationships between agents become attack vectors.

ASI08: Cascading Failures describes how agentic systems chain decisions across multiple steps. Small inaccuracies compound and propagate, potentially triggering system-wide outages or operational loops. Forrester analyst Paddy Harrington warned: "When you tie multiple agents together and you allow them to take action based on each other, at some point, one fault somewhere is going to cascade and expose systems."

ASI09: Human Oversight Failures reflects a psychological vulnerability: humans tend to over-trust agentic systems. When an AI agent has been helpful and accurate for months, the impulse to verify its actions diminishes.

ASI10: Rogue Agents covers the ultimate failure mode: compromised or misaligned agents that diverge entirely from intended behavior.

The Identity Problem

Every discussion of agentic AI security eventually comes back to identity. An AI agent needs credentials. It needs permissions. It needs the ability to authenticate to services and take actions on behalf of users or the organization.

This means every AI agent is, from a security perspective, a non-human identity that requires the same governance as any other privileged account. CyberArk's research emphasizes that organizations must start treating agents as identities with their own entitlements, access controls, and monitoring requirements.

The challenge is that AI agents often inherit permissions from the users who deploy them or the systems they integrate with. A developer's AI coding assistant might have access to production repositories. A sales team's CRM agent might have access to customer financial data. A customer service agent might have access to account modification capabilities.

In the CyberArk demonstration, the agent had access to both order data and invoicing tools. That combination made sense for its intended purpose. But it also defined the blast radius: an attacker who compromised the agent could access any data and take any action the agent was permitted to take.

This is why the OWASP framework introduces the principle of "Least-Agency" as an extension of least privilege: agents should only be granted the minimum level of autonomy required to complete their defined task. Not minimum permissions; minimum autonomy. The distinction matters because agentic systems can chain actions in ways that static permission models don't anticipate.

What Enterprises Must Do Differently

Forrester predicts that agentic AI will cause a public breach leading to employee dismissals in 2026. The question isn't whether agentic systems will be compromised; it's whether organizations will have implemented adequate controls before they are.

Based on the OWASP framework and the research I've reviewed, here's what needs to change:

Treat agents as identities, not tools. Your identity governance program should include non-human identities. AI agents need their own accounts, their own access reviews, their own monitoring. The agent deployed by your sales team shouldn't inherit the sales VP's permissions; it should have its own, scoped to exactly what it needs.

Implement input sanitization at every ingestion point. The CyberArk attack succeeded because a shipping address field wasn't validated for malicious content. Every piece of data that an agent might process, from form fields to email content to database records, is a potential injection vector. Treat all external input as untrusted, even when it comes from your own systems.

Enforce strong observability. The OWASP framework emphasizes detailed logging of goal state, tool-use patterns, and decision pathways. You need to be able to answer questions like: What actions did this agent take? What tools did it invoke? Why did it make those decisions? Without this visibility, you can't detect compromise until the damage is done.

Sandbox code execution. If your agents can execute code, that execution needs to happen in isolated environments with strict resource limits. Security researchers recommend gVisor-style sandboxing for any agent capable of running code. Assume that attacker-controlled code will eventually reach your execution environment and design accordingly.

Treat tool definitions as untrusted input. The tools an agent can access define what an attacker can do when they compromise that agent. Tool definitions, including MCP server configurations and function schemas, should be treated as high-risk components requiring security review and version control.

Design for cascading failure. Multi-agent systems amplify the impact of any single compromise. If one agent is compromised, what systems can it reach? What other agents does it coordinate with? Build circuit breakers and approval gates that prevent a single failure from propagating across your infrastructure.

The Uncomfortable Truth

The security industry has spent decades learning to protect systems from external attackers. We've built firewalls, intrusion detection, access controls, and monitoring. We've trained employees to recognize phishing and social engineering.

Agentic AI introduces a new category of threat: trusted entities that can be turned against you without anyone noticing. The agent that your team deployed last quarter, the one that's been helpful and reliable, could become an insider threat the moment it processes a maliciously crafted input.

The 82% of executives who say secure AI is essential aren't wrong. But the 76% who aren't securing their AI projects are building time bombs. The implementation gap I've written about before is about to be stress-tested in ways that will end careers and damage organizations.

The organizations that will navigate this transition successfully are the ones treating AI agents as what they actually are: privileged identities that require governance, monitoring, and security controls commensurate with their access. The ones that treat agents as just another productivity tool will learn the lesson the hard way.

The blast radius of an AI agent attack is defined by its entitlements. Make sure you know what those entitlements are before an attacker demonstrates them for you.

The Shift from Assistant to Actor

The OWASP Framework: A Map of the Attack Surface

The full list reveals how fundamentally different agentic security is from traditional application security:

ASI10: Rogue Agents covers the ultimate failure mode: compromised or misaligned agents that diverge entirely from intended behavior.

The Identity Problem

What Enterprises Must Do Differently

Based on the OWASP framework and the research I've reviewed, here's what needs to change:

The Uncomfortable Truth

The blast radius of an AI agent attack is defined by its entitlements. Make sure you know what those entitlements are before an attacker demonstrates them for you.

When Your AI Agent Becomes an Insider Threat

The Shift from Assistant to Actor

The OWASP Framework: A Map of the Attack Surface

The Identity Problem

What Enterprises Must Do Differently

The Uncomfortable Truth

Related Posts

Who Validates the Validator? The Recursive Trust Problem in Agentic Security

The Pentagon Cut 99% of Its Targeting Analysts. Then a School Full of Children Was Hit.

The FBI Is Investigating a Steam Hacker. They Should Be Investigating the Platform.

The Shift from Assistant to Actor

The OWASP Framework: A Map of the Attack Surface

The Identity Problem

What Enterprises Must Do Differently

The Uncomfortable Truth