The Numbers That Don't Add Up
Anthropic's annualized revenue hit $30 billion in April 2026, more than tripling from $9 billion at the end of 2025. Over 1,000 businesses now spend more than $1 million annually on Claude. OpenAI's token usage surged from 6 billion to 15 billion tokens per minute in five months.
At the same time, Claude's API uptime over the past 90 days sits at 98.95%, well below the 99.99% standard that established cloud providers maintain. Developers on Reddit report exhausting $100/month Claude Code Max plans in a single hour during peak times. An independent analysis of 7,000 session files and 230,000 tool calls found measurable drops in reasoning depth since February.
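The gap between those two uptime figures is easy to underestimate when expressed as percentages. A back-of-the-envelope calculation makes it concrete:

```python
def downtime_minutes(uptime_pct: float, days: int = 90) -> float:
    """Downtime implied by an uptime percentage over a trailing window."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)

# 98.95% over 90 days allows roughly 22.7 hours of downtime
print(round(downtime_minutes(98.95) / 60, 1))
# 99.99% over the same window allows roughly 13 minutes
print(round(downtime_minutes(99.99), 1))
```

That 1.04-point difference in the headline number works out to roughly a hundredfold difference in tolerated outage time.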
Demand grew faster than anyone built capacity for. And the response to that gap tells you everything about who these products are actually for.
Rationing Is the Strategy
When resources are scarce, how companies allocate them reveals their priorities. Here is what the allocation looks like across the AI industry in April 2026:
Anthropic adjusted session limits during peak hours (weekdays, 5 AM to 11 AM Pacific), causing users to burn through their 5-hour windows significantly faster. The company says it affects approximately 7% of users. Anthropic denies any degradation to the underlying model.
GitHub paused Copilot Pro free trials and tightened usage limits, transitioning fully to consumptive "Premium Requests" billing.
Windsurf replaced its credit system with daily and weekly quotas on March 19, locking out heavy users mid-day.
OpenAI shifted Codex from flat-fee to token-based metering and launched a $100/month Pro tier specifically for compute-heavy coding sessions, while adjusting the cheaper Plus plan for shorter sessions.
The pattern is consistent: individual developers experience tighter limits, reduced quotas, and consumption-based pricing. Enterprise customers with annual contracts and dedicated capacity agreements continue to operate with fewer disruptions. Retool founder David Hsu switched providers entirely despite preferring Claude's output quality, citing reliability concerns that enterprise-tier access would have mitigated.
This is not a conspiracy. It is economics.
The Enterprise Bet That Landed
The compute crunch is revealing a strategic divergence that has been building for two years.
Anthropic built for enterprise from the start: API-first, developer-focused tooling, direct sales motions targeting Fortune 500 companies. The result is $30 billion ARR with a customer base concentrated in high-value enterprise accounts that pay for dedicated capacity and premium SLAs. When I wrote about Anthropic's vendor concentration trajectory in February, the revenue was $14 billion. It more than doubled in eight weeks.
OpenAI pursued the consumer market first: ChatGPT, consumer subscriptions, broad availability. That strategy drove adoption at scale, making ChatGPT a household name, but consumer revenue is inherently fragmented. Millions of users paying $20/month generate volume but not the concentrated, predictable revenue that lets you forecast infrastructure investments years ahead.
The numbers now reflect this divergence. Anthropic passed OpenAI in annualized revenue while spending roughly a quarter as much to train its models. Yesterday's analysis on this blog examined why capital efficiency matters in the AI era. Today's question is what happens to the users who are not in that enterprise bucket.
Enterprise customers are easier to serve at scale not because their needs are simple, but because those needs are predictable. They want coding assistance, call center automation, document processing, cost efficiencies tied to measurable KPIs. Consumer users are heterogeneous: different workflows, different expectations, different tolerance for degradation. When you have to choose where limited GPUs go, the enterprise contract worth $5 million a year wins over a thousand $20/month subscriptions every time.
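The revenue math behind that choice is stark. Using the figures from the text:

```python
# Annual revenue behind the allocation choice (figures from the text).
enterprise_contract = 5_000_000      # one enterprise deal, dollars per year
consumer_subs = 1_000 * 20 * 12      # a thousand $20/month subscriptions

print(consumer_subs)                          # 240000
print(enterprise_contract / consumer_subs)    # ~20.8x
```

One enterprise contract is worth roughly twenty times the annual revenue of those thousand subscriptions, and it comes with predictable, forecastable demand besides.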
OpenAI recognized the gap. Codex, which grew to 2 million weekly active users by March, is now positioned as an enterprise agent platform with dedicated security tools, plugin systems, and corporate billing. Companies including Cisco, Nvidia, and Ramp have rolled it out across their developer teams. The consumer-first company is pivoting hard toward enterprise, because enterprise is where the compute allocation can be justified by revenue.
The Supply Side Will Not Save You Soon
The infrastructure needed to close this gap does not exist yet, and it will not for years.
Nvidia Blackwell GPU spot pricing hit $4.08 per hour, a 48% increase from $2.75 just two months earlier. CoreWeave raised prices over 20% and extended minimum contracts from one year to three. Bank of America forecasts that GPU demand will outstrip supply through 2029 and beyond.
AWS customers attempted to buy out Amazon's entire 2026 Graviton capacity; two separate large customers each tried to purchase all available instances. Deloitte estimates that inference workloads will account for two-thirds of all AI compute in 2026, up from a third in 2023. Data centers require years to build, and power grid capacity in many regions is already fully allocated. The Texas grid situation illustrates the scale of the mismatch: 230 GW of data center power requests against just 7.5 GW approved, a 30:1 ratio between what the AI industry wants and what the physical world can deliver.
I wrote in January that the AI infrastructure panic is largely self-inflicted: 75% of enterprises remain stuck in pilot mode, straining infrastructure with experiments that never reach production. That analysis still holds for the enterprise demand side. But the compute scarcity at the model provider level is a different phenomenon. This is not organizational chaos; this is real production workloads exceeding what anyone built capacity for, precisely because the technology started delivering measurable value.
OpenAI's CFO Sarah Friar put it directly: they are making tough trades on what not to pursue. OpenAI shut down Sora, its video generation product, entirely: web and app access ends April 26, with the API following in September. When a company kills a product to reallocate GPUs, the shortage is not theoretical.
The Counterargument: This Is Growing Pains
The charitable read is straightforward: demand spiked, supply is catching up, and this is a temporary adjustment period. The market is building capacity at unprecedented speed. New data centers, custom silicon (Google TPU, Amazon Trainium), and competition from AMD and startups like Groq will eventually close the gap.
There is real evidence for this. Anthropic, Google, and Amazon are all investing tens of billions in infrastructure. Nvidia's next-generation chips promise significant efficiency gains. The 90-day trailing uptime of 98.95% is a snapshot, not a permanent state. Energy infrastructure, while constrained today, is expanding.
This argument is probably right on the supply side. The compute gap will narrow.
But the prioritization framework that emerged during the shortage will not reverse.
What the Rationing Actually Reveals
When investment banks allocate IPO shares, oversubscribed deals do not go to retail investors. They go to institutional clients who provide ongoing revenue and long-term relationships. When airlines overbook flights, the passengers who get bumped are not flying business class. Scarcity does not create hierarchy; it exposes the hierarchy that was already there.
AI companies needed individual developers and enthusiasts to drive adoption, build the ecosystem, create the word-of-mouth that made these products ubiquitous. That strategy worked spectacularly. But the revenue model was always enterprise. The 95% of AI pilots that never reach production are not the revenue base; the 5% that do, run by companies spending seven figures annually, are.
For developers, the practical implication is clear: designing around a single provider's consumer tier is now a known risk. I have written about the vendor monoculture risk that forms when organizations collapse their AI dependencies into a single relationship. Multi-provider architectures, local model capabilities for latency-sensitive workloads, and graceful degradation across providers are not over-engineering. They are the rational response to a market that just told you where you fall in the priority stack.
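The graceful-degradation pattern can be sketched in a few lines. This is a minimal illustration, not any provider's real SDK: the provider callables and names below are hypothetical stand-ins for whatever clients you actually wrap.

```python
from typing import Callable

# A provider is anything that takes a prompt and returns a completion.
# Real implementations would wrap each vendor's SDK (illustrative only).
Provider = Callable[[str], str]

def with_fallback(prompt: str, providers: list[tuple[str, Provider]]) -> str:
    """Try each provider in priority order, degrading to the next on failure."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, outage, exhausted quota
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))

# Stub providers standing in for a hosted API and a local model.
def primary(prompt: str) -> str:
    raise RuntimeError("429: consumer-tier quota exhausted")

def local_model(prompt: str) -> str:
    return f"[local] {prompt}"

print(with_fallback("summarize this doc",
                    [("primary", primary), ("local", local_model)]))
# prints "[local] summarize this doc"
```

In production the list would be ordered by quality and cost, with the local model last as the floor you never fall below rather than a peer of the hosted tiers.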
The compute shortage is real. The rationing is rational. And the hierarchy it revealed was always the business model. The only new information is that now everyone can see it.