The Posture Is Deteriorating, Not Improving
In September 2025, Cisco Talos scanned Shodan for exposed Ollama servers and found 1,139 instances, of which 214 (18.8%) responded to an unauthenticated request with a live model attached. Eight months later, Intruder's Benjamin Marr published a population-level scan of roughly two million hosts pulled from certificate transparency logs and identified about one million exposed AI services. The Ollama subset grew to more than 5,200 instances, with 1,652 (31%) responding to unauthenticated requests with a model attached. The exposure rate did not hold steady while organizations matured; it climbed by twelve percentage points.
Marr stated the conclusion bluntly: "The AI infrastructure we researched is more vulnerable, exposed, open, and misconfigured on average than any other software we've ever investigated." That is a strong claim, and the rest of his data carries it. The signal was already visible at smaller scale in January 2026, when SlowMist's research surfaced more than a thousand exposed Clawdbot servers leaking conversation history and API keys over Shodan; the Intruder findings are the population-level confirmation of a trend the smaller scans forecast.
This trend invalidates the most comfortable assumption in 2026 enterprise AI risk discussions: that the shadow-AI problem has narrowed to employees pasting confidential data into ChatGPT. I covered that vector in an earlier post on shadow AI data exfiltration; the employee-paste pattern is still real. The Intruder findings expose a second vector: the organization itself running self-hosted LLM runtimes, agent builders, and workflow engines on the open internet. The procurement and contract-language consequence of that second vector is the subject of this post.
The Three Exposure Patterns That Matter for Procurement
Intruder catalogued exposures across many AI projects; three of them carry the procurement thesis directly.
Ollama. Marr identified more than 5,200 exposed Ollama instances, and 1,652 of them (31%) responded to an unauthenticated request with a live model attached. The runtime binds to 0.0.0.0:11434 by default with no authentication, and the operators running these instances did not change that default. Beyond the population-scan finding, SecurityWeek reported on CVE-2026-7482 ("Bleeding Llama"), a flaw in which three API calls in sequence extract prompts, conversation messages, and embedded API keys from any unauthenticated Ollama deployment. Estimates for that CVE ran to roughly 300,000 affected Ollama deployments globally.
Open WebUI. Intruder found more than 12,000 exposed Open WebUI instances on the internet, 24 of which had authentication disabled entirely. The project's own hardening documentation opens by stating that Open WebUI is "built for private, trusted networks" and that public exposure is not the intended deployment model. The product is therefore not failing its design; its operators are deploying it outside the perimeter the design assumes.
Flowise. Intruder identified more than 2,650 exposed Flowise instances, and 92 of them leaked full agent workflow definitions, system prompts, integration configurations, and credential references. In one sampled project, more than 90% of the exposed instances carried serious known CVEs. Flowise is an agent builder; its exposed instances reveal not just runtime endpoints but the entire agent design, including which tools the agent can call and which credentials it uses to call them.
These three projects share a structural property: each is the kind of self-hosted infrastructure a vendor would adopt to deliver an "agentic" or "private-LLM" feature to enterprise customers, and each defaults to permissive exposure unless an operator does the hardening work the docs describe.
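The scan criterion behind these numbers is worth making concrete. Below is a minimal sketch, assuming the check reduces to an unauthenticated GET against Ollama's documented /api/tags endpoint, which lists loaded models; the target IP is a documentation address, and any real use belongs only on hosts you own or are authorized to test.

```python
import json
import urllib.request

OLLAMA_PORT = 11434  # Ollama's default API port


def exposed_models(host: str, timeout: float = 5.0) -> list[str]:
    """Names of the models an endpoint serves to an unauthenticated caller.

    An empty list means the host did not answer the unauthenticated
    /api/tags call with a model list -- the condition the population
    scans count as "exposed with a live model attached".
    """
    url = f"http://{host}:{OLLAMA_PORT}/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        return []  # closed port, auth in front, or not an Ollama API
    return [m.get("name", "?") for m in body.get("models", [])]


if __name__ == "__main__":
    # 203.0.113.10 is a documentation address; substitute your own host.
    models = exposed_models("203.0.113.10")
    if models:
        print(f"EXPOSED: unauthenticated model list returned: {models}")
```

If that function returns a non-empty list for a host in your certificate-transparency footprint, you are in Marr's 31%.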
The Frontier-Model-Wrapper Finding
The most financially direct finding in the Intruder dataset is not about prompt theft. Of the 1,652 Ollama servers responding with a live model, Marr identified 518 wrapping paid frontier-model APIs from Anthropic, OpenAI, Google, DeepSeek, and Moonshot. Whoever operates those servers paid for the API keys; whoever discovers the open endpoint can run inference against those keys for free. The endpoint is, in effect, an unmetered relay between strangers and a paid frontier-model account.
This shifts the failure mode from data exposure to direct financial loss. A discovered Ollama wrapper does not require an attacker to chain a vulnerability or run an inference attack; it requires a curl request. Vendors who deliver private-LLM features by wrapping a frontier-model account, and who self-host the wrapper, carry an obligation that no current questionnaire surfaces: rotate and rate-limit the wrapped credentials, and bound the financial blast radius if the wrapper is exposed.
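To make the "curl request" point concrete, here is the Python equivalent of that single call, using Ollama's documented /api/generate endpoint with streaming disabled. This is a sketch for verifying your own wrapper endpoints, not a pentest tool; the model name is whatever an unauthenticated model-list probe returned.

```python
import json
import urllib.request


def unauthenticated_generate(host: str, model: str, prompt: str) -> str:
    """The single request the wrapper finding describes, in Python.

    A non-empty return means the endpoint relays inference to anyone;
    if the model wraps a paid frontier API, each call bills the
    operator's account. Point this only at endpoints you operate.
    """
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        f"http://{host}:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("response", "")
```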
The Counterargument: This Is a Vulnerability-Management Problem
A reasonable security leader will read the Intruder findings and conclude that this is a vulnerability-management story. The exposed instances should be discovered by EASM tooling, the operators should be notified, and the configuration should be hardened. Procurement is not the right lever; operations is.
That counterargument has weight, and Intruder's findings do support a vulnerability-management response. It is also incomplete. SOC 2 attestations cover a vendor's managed infrastructure: the production VPC, the customer-facing API tier, and the database tier within the report's audited scope. When a vendor sells an "agentic" feature built on Flowise, Langflow, or n8n, self-hosted inside their VPC for delivery to enterprise customers, the question is whether those control planes sit inside or outside the SOC 2 audit boundary. The questionnaires currently in circulation do not ask, so "we have SOC 2 Type II" does not answer the question. This is the same procurement-language gap I covered in the recent post on shared-kernel SaaS risk: an attestation that covers a defined scope cannot speak to infrastructure outside that scope.
The vulnerability-management response handles instances the operator knows about. The procurement response handles instances the operator's customers do not know exist.
Conversation Logs Are a New Sensitive Data Class
Every enterprise data classification policy I have read was written before chat became the primary work surface. Public, Internal, Confidential, and Restricted: those four levels assume documents, datasets, and structured records. The Intruder findings make clear that prompts and responses now routinely contain trade-secret-tier content: infrastructure topology pasted into an agent for debugging, source code pasted in for review, medical workflow definitions inside Flowise agent specs, credential references inside system prompts. The intelligence value of a conversation log is not theoretical, either; the ChatGPT-as-spy-diary case I analyzed earlier this year showed prompts and responses rich enough to reconstruct an entire covert operation from the log alone.
Yet conversation logs sit in default storage on whichever LLM runtime is generating them, almost never encrypted at rest with customer-controlled keys, almost never subject to a retention schedule, and almost never indexed for DLP. IBM's 2025 Cost of a Data Breach Report found that organizations with high levels of shadow AI saw an average of $670,000 in additional breach costs, and only 37% had policies to manage AI or detect shadow AI. That number is consistent with a data class that nobody has classified yet. I made a related argument in the post on the VPC privacy illusion: "private" deployment does not by itself produce privacy when the conversation log is the leaked artifact.
The first move for any AI or security leader is to write conversation logs into the classification policy as a tier, with explicit encryption, access-control, retention, and deletion requirements. The second move is to require vendors to attest to those same requirements.
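One way to make that tier concrete is to write it as policy-as-code, so it can be diffed, reviewed, and handed to vendors as an attestation template. The field names and values below are illustrative assumptions, not drawn from any standard; they mirror the four requirements named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataClassTier:
    """One row of a data classification policy, expressed as code.

    Field names are illustrative, not drawn from any standard.
    """
    name: str
    encryption_at_rest: str       # includes the key custody model
    access_control: str
    retention_days: int
    deletion_verification: str
    accountable_role: str


# A hypothetical entry for the tier most policies are missing:
CONVERSATION_LOGS = DataClassTier(
    name="Conversation Logs",  # prompts, responses, system prompts,
                               # intermediate agent state
    encryption_at_rest="customer-managed keys (CMK)",
    access_control="role-scoped, no standing admin read",
    retention_days=30,
    deletion_verification="cryptographic erasure, quarterly sampled",
    accountable_role="Head of Data Governance",
)
```

Handing a vendor this structure empty and asking them to fill it in is a faster tell than any free-text questionnaire answer.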
The Lifecycle Frame, Not the OSS Frame
It would be easy to read the Intruder findings as an indictment of open-source AI tooling, and that reading would be wrong. Every successful infrastructure category passes through a default-permissive phase before its security-by-default moment arrives; I traced the Docker 2016 and MongoDB 2017 versions of that story in the post on the MCP shared-responsibility model, where the same lifecycle is now playing out across 200,000 exposed Model Context Protocol servers. The Intruder scan is the parallel signal one layer up the stack. Ollama, Open WebUI, and Flowise have not yet had their security-by-default forcing event; the population data Marr surfaced is the early read on what will produce it.
The procurement consequence is independent of how that forcing event resolves. Vendors who adopt these tools to deliver enterprise features will deploy whatever defaults exist at adoption time, and the customer must price that into the contract. The same researcher's earlier work on agentic identity bypass, which I covered in the post on OpenClaw, EDR, and DLP, made the broader point: the AI agent's effective authority inside enterprise systems is the input that procurement language must control.
Two Clauses for the 2026 AI Vendor MSA
The Intruder research produces a specific procurement output, not a specific patch. The two questions below are the operational lever; they are missing from the questionnaires I have reviewed in the past year, and they should live in the master service agreement, not the questionnaire, so that the answer is contractually binding. They extend rather than replace the five risk-class questions I built off the Five Eyes guidance in the post on agentic AI procurement.
- Self-hosted LLM and agent-builder disclosure. "Do you self-host any LLM runtime, agent-builder, or workflow engine (including but not limited to Ollama, Open WebUI, Flowise, Langflow, and n8n) in support of the contracted service, and is each such control plane within the perimeter of your most recent SOC 2 Type II audit?" The answer is binary, the answer is verifiable, and the answer informs whether the vendor's attestation actually covers the surface that processes your data.
- Conversation log handling. "How are conversation logs (prompts, responses, system prompts, and intermediate agent state) encrypted at rest, access-controlled, retained, and deleted? Provide the encryption key custody model, the retention period, the deletion verification method, and the named role responsible for each." The answer surfaces whether the vendor has classified conversation logs at all, which is itself the signal.
A third clause is worth adding for any vendor whose product wraps a paid frontier-model API: a key-rotation cadence and a financial blast-radius cap, with credential rotation triggered on any detection of unauthenticated egress.
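A sketch of what that third clause looks like as a running control, under the assumption that the wrapper is an Ollama-style endpoint on its default port; rotate_frontier_key is a placeholder for whatever rotation call the vendor's secrets manager actually exposes, not a real library function.

```python
import urllib.request


def wrapper_is_exposed(host: str, port: int = 11434) -> bool:
    """True if our own wrapper answers an unauthenticated model-list probe."""
    try:
        url = f"http://{host}:{port}/api/tags"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False


def rotate_frontier_key() -> None:
    """Placeholder: wire this to your secrets manager's rotation API."""
    print("rotating wrapped frontier-model credential")


def enforce_blast_radius(host: str) -> None:
    """The third clause as a control: rotate on any unauthenticated egress."""
    if wrapper_is_exposed(host):
        rotate_frontier_key()
        print(f"ALERT {host}: unauthenticated egress detected; key rotated")


if __name__ == "__main__":
    enforce_blast_radius("203.0.113.10")  # documentation address; use yours
```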
The 2026 shadow IT story is not the employee pasting into ChatGPT; it is the organization running Ollama on a public certificate-transparency-visible host and not knowing it. The Intruder scan is the data; the two clauses above are what the data demands of the next contract you sign.