OpenAI just launched Codex Security, an AI security agent that scanned 1.2 million commits during its beta period and surfaced 792 critical and 10,561 high-severity vulnerabilities. The tool, formerly codenamed "Aardvark," spent a year in private beta before graduating to research preview for ChatGPT Pro, Enterprise, Business, and Edu customers. It found 14 vulnerabilities severe enough for CVE inclusion across OpenSSH, GnuTLS, Chromium, and other major open-source projects.
Two weeks earlier, Anthropic launched Claude Code Security with similar capabilities and similar headline numbers: 500+ zero-day vulnerabilities found in production codebases.
The coverage has focused on the impressive detection metrics and what this means for traditional security vendors. But there's a more fundamental question that nobody seems to be asking: if AI is now sophisticated enough to find these vulnerabilities, why can't it just write secure code in the first place?
The Question That Sounds Obvious but Isn't
"Why doesn't AI just write secure code?" is the question every non-security person asks when they see tools like Codex Security. It sounds reasonable. If the model is smart enough to spot the bug after the fact, surely it can avoid introducing it during generation.
The answer reveals something important about the nature of security itself.
Security is not a property of individual lines of code. It's a property of systems. A function that sanitizes user input is perfectly secure in a web application where the input comes from an HTML form. The same function is a critical vulnerability in an API endpoint where the input comes from a trusted internal service that has already sanitized the data, because now you've created a double-encoding bug that bypasses downstream validation.
The Register documented exactly this pattern in a real-world case: an AI coding tool generated a honeypot that pulled client-supplied IP headers and treated them as visitor IPs. In isolation, the code looked fine. In context, it was a trust boundary failure that enabled attackers to inject payloads through the IP header, opening the door to local file disclosure and server-side request forgery.
Static analysis tools, including Semgrep and Gosec, failed to flag it. The vulnerability wasn't syntactic. It was contextual.
Security Is Systemic, Not Syntactic
This is the core insight that explains why the "just write secure code" approach doesn't scale, and why security-focused AI tools are emerging as a distinct category rather than a feature of coding assistants.
When an AI coding tool generates a function, it operates within a narrow context window: the current file, maybe some imports, a prompt. But security vulnerabilities live at the intersections. They exist in the trust boundaries between services, the assumptions one repository makes about another, the runtime configurations that determine whether a permissive IAM role is a convenience or a backdoor.
Consider the scope of what Codex Security actually does. According to SiliconANGLE's coverage, the tool creates a temporary copy of your entire repository in an isolated container, studies code files over several days to produce a threat model, then tests discovered flaws in a sandbox to determine exploitability. OpenAI describes it as functioning "more like a security researcher who studies a codebase, maps potential attack paths, and proposes fixes, rather than a static scanner."
That's fundamentally different from what happens when a coding assistant generates code inline. Generation is local. Security is global. You can't evaluate whether a piece of code is secure without understanding every other piece of code it interacts with, every trust boundary it crosses, every assumption it inherits from the system architecture.
This is why The Register found that AI reasoning models "repeatedly produce AWS IAM roles vulnerable to privilege escalation even when prompted for secure configurations." The model can generate a technically correct IAM policy. It can't evaluate whether that policy is safe in the context of your specific AWS environment, your other roles, your cross-account access patterns, and your organizational security posture. That evaluation requires the kind of multi-day, full-codebase analysis that Codex Security performs, not the split-second generation that coding assistants do.
The Numbers Are Getting Worse, Not Better
The urgency behind these tools is real. I wrote about the security debt crisis from AI-generated code over a year ago when Veracode's research showed 45% of AI-generated code failing security tests. The numbers have deteriorated since then.
CrowdStrike's research now puts AI-generated code at 2.74x more likely to contain vulnerabilities than human-written code. Aikido Security's 2026 report attributes one in five breaches to AI-generated code. And the incidents aren't theoretical: Wiz security researchers found that PromptBase, a site built entirely through vibe coding ("didn't write one line of code," its founder said), had a misconfigured database exposing 1.5 million authentication tokens, 35,000 email addresses, and private messages.
The most concerning pattern isn't the bugs themselves. It's the failure mode. AI agents tasked with fixing runtime errors have been documented removing validation checks, relaxing database policies, and disabling authentication flows to make error messages go away. As one analysis noted, "the simplest way to get a user to accept a code block is often to make the error message go away. Unfortunately, the constraint causing the error is sometimes a safety guard."
This is what I was getting at in AI Didn't Eliminate the Bottleneck. It Moved It.: the companies dying aren't the ones that can't build fast enough. They're the ones that can't evaluate what they've built. And when the AI itself is removing safety guardrails to satisfy the developer's immediate need, the evaluation gap widens.
The Fox Guarding the Henhouse
Now here's the part that should make every security leader uncomfortable.
OpenAI is simultaneously the largest producer of AI-generated code (via ChatGPT, Codex CLI, and IDE integrations) and now the vendor selling the security tool to find bugs in that code. Anthropic occupies a nearly identical position with Claude Code and Claude Code Security.
This is a new form of recursive technical debt: AI creates the vulnerabilities, then AI charges you to find them. The parallel to a pharmaceutical company that causes an illness and then sells the treatment is uncomfortable but apt.
Consider the incentive structure. If Codex Security's analysis revealed that AI-generated code is systematically and irreparably insecure, would OpenAI have an incentive to publicize that finding? The same company that profits from developers using its coding tools now profits from those developers needing security tools. The worse AI-generated code is from a security perspective, the more valuable Codex Security becomes.
I'm not suggesting that Codex Security isn't a genuinely useful tool. The 14 CVEs it found across major open-source projects are real vulnerabilities that real attackers could exploit. The 84% reduction in redundant alerts and 50%+ decrease in false positives represent meaningful improvements over traditional static analysis. And the "Codex for OSS" program offering free scanning to open-source maintainers addresses a real need.
But the "Codex for OSS" program also means OpenAI gains read access to millions of open-source repositories. Whether that data feeds back into model training is a question nobody's asking publicly.
What This Actually Means for Security Teams
Here's the practical reality for anyone running an enterprise security program:
Security-focused AI tools are a real category now, and it's going to grow. Both leading AI companies have launched dedicated security products within weeks of each other. This isn't a feature; it's a market. Expect Google, Microsoft, and every major security vendor to follow. The underlying logic is sound: AI can hold more context than any human reviewer and can analyze codebases at a scale that traditional static analysis can't match.
These tools supplement; they do not replace. Codex Security's three-step process (threat modeling, validation, fix proposals) still produces outputs that humans need to review and merge. In understaffed teams using vibe coding precisely because they lack engineering resources, the same judgment gap that created the vulnerabilities will govern whether fixes are correctly applied. I explored this judgment bottleneck in depth, and tools like Codex Security don't resolve it.
Demand independent validation. The 50% false positive reduction claim is self-reported. Traditional AppSec tools like Snyk, Checkmarx, and SonarQube submit to independent benchmarks like the OWASP Benchmark. OpenAI has published no such independent validation for Codex Security. Until they do, treat the metrics as marketing.
Evaluate the data access tradeoffs. Codex Security runs in OpenAI's cloud, meaning your code leaves your environment. Claude Code Security runs locally by default. For regulated industries, healthcare, financial services, government, this architectural difference matters enormously. Know where your code goes and what the vendor retains.
The Bigger Picture
We're watching something interesting happen in real time. The AI industry has created a problem (insecure AI-generated code at scale), recognized the problem, and is now building a parallel product line to address it. This is a pattern I've tracked across multiple posts, from the original security debt crisis to Claude Code Security's market impact to the VoidLink malware framework that demonstrated what happens when AI coding capability meets zero security oversight.
The question isn't whether AI security tools will be useful. They will be. The question is whether the net effect on software security is positive: does AI find more bugs than it creates? Nobody has published the data to answer that question. And the companies best positioned to run that analysis are the ones with the least incentive to share the results.
Until someone proves otherwise, the safest assumption is that AI coding tools and AI security tools are both necessary, that neither eliminates the need for human security expertise, and that the vendor selling you both sides of this equation deserves the same scrutiny you'd give any supplier with a structural conflict of interest.
Security was always systemic. AI didn't change that. It just made the system bigger.