Anthropic unbundled Claude, Claude Code, and Cowork from enterprise seat fees, moving to per-token billing. Revenue tripled in four months. But Retool's founder already switched to OpenAI. The uptime number says why.
Anthropic's enterprise billing restructure is framed as a documentation update. The direction is unmistakable.
"The seat fee only covers access to the platform and doesn't include any usage. All usage across Claude, Claude Code, and Cowork is billed separately at standard API rates, based on what your team actually consumes."
— Anthropic Enterprise Help Center (updated language, April 2026)

Before the official billing change, the real signal came from David Hsu — the founder who preferred Claude but chose OpenAI anyway.
Retool's founder publicly stated he preferred Anthropic's Claude Opus 4.6 model for quality. The model was the better choice on every benchmark that mattered to Retool's engineering teams.
98.95% uptime sounds close to 99.99%. In practice it's ~92 hours of downtime per year vs ~53 minutes. When Anthropic's API went down, Retool's customers couldn't ship. That's a customer problem, not a benchmark problem.
Hsu moved Retool off Claude to OpenAI — picking the inferior model because the inferior model stayed up. He told the Wall Street Journal this publicly.
This is not a capabilities story. It's a reliability story. The verdict that should scare Anthropic: an enterprise buyer chose the worse model because it was more dependable. You cannot patch trust with a pricing page.
Just over a percentage point of uptime does not sound significant. Translated into downtime hours per year, the difference is disqualifying for enterprise buyers.
Enterprise buyers trained on 20 years of cloud discipline treat the gap between 98.95% and 99.99% as contractually disqualifying. At 98.95%, Anthropic offers roughly 92 hours of downtime per year. At the standard, 53 minutes. For customers shipping code to their own customers via Claude, those 92 hours are revenue events — not outage reports.
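The downtime figures above fall out of simple arithmetic — worth keeping on hand when reading any SLA:

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766, averaging leap years

def downtime_hours(uptime_pct: float) -> float:
    """Annual downtime implied by an uptime percentage."""
    return (1 - uptime_pct / 100) * HOURS_PER_YEAR

anthropic = downtime_hours(98.95)  # ~92 hours per year
standard = downtime_hours(99.99)   # ~0.88 hours, ~53 minutes per year

print(f"98.95% uptime -> {anthropic:.0f} h/yr of downtime")
print(f"99.99% uptime -> {standard * 60:.0f} min/yr of downtime")
```

The same two-line function works for any SLA figure a vendor quotes you.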
Run rate tripled in four months. The response was a billing restructure — not a victory lap. That tells the story.
Anthropic's annual revenue run rate grew from $9B (end 2025) to $30B (Q1 2026) — tripling in roughly four months. More than 1,000 customers now pay over $1M/year. No comparable historical growth benchmark exists.
Snowflake took a decade to reach $1B run rate. Google's search-advertising ramp was the previous record. Anthropic reportedly covered nearly 4× that pace in a single quarter. Axios came back empty hunting for a historical comparison.
One engineer running Claude Code overnight can consume the token budget of 200 casual chat users. OpenAI saw token usage jump from 6B/minute (October 2025) to 15B/minute (late March 2026). The flat-fee subsidy became untenable.
Blackwell GPU rental prices climbed 48% in two months. CoreWeave raised prices 20%+. Bank of America expects demand to outstrip supply through 2029. PJM needs 15 additional GW of AI power by early 2027.
For ~18 months, AI companies ran a subsidised model. The autonomous agent was the thirsty customer who ended the party.
"$20/month Claude Pro subscription: access to one of the best coding models in the world, running on compute that cost Anthropic significantly more than you paid. Power users always understood this. The arithmetic was never a secret."
"Agentic workflows do not sip. They chain tools across steps and run loops without asking. They spawn subagents carrying their own contexts. Cached tokens burn by the hundred thousand. One engineer running Claude Code overnight can consume the token budget of 200 casual chat users."
"Open bars work until somebody shows up thirsty. In this case, the thirsty somebody is the agent."
— Marcus Schuler, Implicator.ai

Each change was defended individually. Together, the shape is obvious: features lifted from the subscription, meters welded on, customers sent the difference.
| Change | Date | Impact | Framing |
|---|---|---|---|
| Cache TTL cut (60 min → 5 min) | Early March 2026 | Higher quota burn | Optimisation |
| Thinking effort defaults lowered | March 3, 2026 | Reduced reasoning depth | Balance |
| Session caps tightened (peak hours) | Late March 2026 | 7% users affected | Product change |
| OpenClaw moved to usage metering | April 4, 2026 | Up to 50× bill increase | Pricing change |
| Enterprise help center restructure | April 2026 | All usage now metered | Formal restructure |
This is not an Anthropic story. The flat-fee era for agentic AI workloads is over across the industry — the memo is circulating.
Enterprise seat fee now covers access only. All usage billed per-token at API rates. OpenClaw moved to metered billing April 4. Session caps tightened. Cache TTL cut from 60 min to 5 min.
OpenAI shifted Codex from flat-message pricing to token metering in early April 2026. Token usage grew from 6B/minute (October) to 15B/minute (late March) — a break in the load curve requiring the same math.
GitHub tightened Copilot usage limits on April 10, 2026. Enterprise AI coding assistance is moving from generous flat access toward consumption-based caps aligned to actual compute cost.
Windsurf replaced its credit system with daily and weekly quotas in March 2026. Another signal that the AI coding tool market is standardising around consumption metering over flat access.
"The flat-fee era is over. The memo is circulating. Expect every major AI provider running agentic workloads to move to usage-based enterprise billing within six months."
— Marcus Schuler, Implicator.ai

What enterprise AI buyers should do before their next contract renewal with Anthropic or any AI provider moving to usage-based billing.
Before renewal, measure actual usage across Claude, Claude Code, and Cowork. Know your baseline: tokens per user per day, which workflows are heaviest, which teams drive the most agentic load.
Price your baseline consumption at standard API rates. Compare to your current seat fee. For heavy agentic users, the difference can be 10–50×. Know your number before Anthropic tells you at renewal.
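The comparison above is a two-number exercise. Here is a minimal sketch — the seat fee and per-token rates are placeholder values, not Anthropic's actual prices; substitute your contract's numbers:

```python
SEAT_FEE_PER_USER = 60.0          # $/user/month -- placeholder, not a real quote
INPUT_RATE = 3.0 / 1_000_000      # $/input token -- placeholder API rate
OUTPUT_RATE = 15.0 / 1_000_000    # $/output token -- placeholder API rate

def monthly_usage_cost(in_tokens: int, out_tokens: int) -> float:
    """Per-token cost of one user's monthly consumption."""
    return in_tokens * INPUT_RATE + out_tokens * OUTPUT_RATE

# Two illustrative profiles from your measured baseline:
casual = monthly_usage_cost(2_000_000, 500_000)        # light chat user
agentic = monthly_usage_cost(400_000_000, 40_000_000)  # heavy Claude Code user

print(f"casual user:  ${casual:,.2f}/mo vs ${SEAT_FEE_PER_USER:.0f} seat fee")
print(f"agentic user: ${agentic:,.2f}/mo "
      f"({agentic / SEAT_FEE_PER_USER:.0f}x the seat fee)")
```

With these placeholder numbers the agentic profile lands around 30× the seat fee — squarely inside the 10–50× range cited above. The point is not the exact figure but that the gap is driven almost entirely by your heaviest users.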
Push for committed spend tiers with per-token ceilings. Avoid open-ended billing for agentic workloads — a single overnight Claude Code session can consume hundreds of dollars without a budget ceiling.
At 98.95%, Anthropic falls short of the 99.99% cloud standard. Require contractual uptime commitments with financial penalty clauses. If the service goes down during customer-facing hours, you need recourse beyond a status page.
Single-provider dependency creates both reliability risk (when Anthropic is down, all Claude workloads fail) and pricing leverage risk (no alternative = no negotiating power). Evaluate OpenAI, Google, and open-source models for substitutable workloads.
Under per-token billing, efficiency is a cost lever. Review Claude Code session architecture, cache strategies, and agent loop designs. The 5-minute cache TTL (down from 60 minutes) requires session design changes to avoid redundant token burn on long coding sessions.
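Pending contractual ceilings, a hard per-session token budget is something teams can enforce in their own agent loops today. A minimal illustrative pattern — this is not an Anthropic SDK feature, just a wrapper you would maintain yourself:

```python
class TokenBudgetExceeded(RuntimeError):
    pass

class BudgetedSession:
    """Hard per-session token ceiling for an agent loop, so an
    overnight run fails fast instead of burning open-ended spend.
    (Illustrative pattern only -- not part of any vendor SDK.)"""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def charge(self, in_tokens: int, out_tokens: int) -> None:
        """Record one model call's consumption; abort past the ceiling."""
        self.used += in_tokens + out_tokens
        if self.used > self.max_tokens:
            raise TokenBudgetExceeded(
                f"session used {self.used:,} of {self.max_tokens:,} tokens")

session = BudgetedSession(max_tokens=5_000_000)
session.charge(1_200_000, 300_000)      # fine: 1.5M of 5M used
try:
    session.charge(4_000_000, 500_000)  # would push past the ceiling
except TokenBudgetExceeded as exc:
    print("aborting agent loop:", exc)
```

Calling `charge` after every model response turns an unbounded overnight loop into one that stops at a number finance has pre-approved.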
This is an industry-wide migration, not a one-time event. OpenAI, GitHub Copilot, and Windsurf all moved in parallel within weeks of Anthropic. Establish a quarterly review of AI provider pricing, usage patterns, and alternative options — the market is evolving faster than annual vendor reviews can track.
7% of users hit new session caps in late March. That's not random — it's the heavy users who drive disproportionate token consumption. They are the signal for your organisation's cost exposure.
The cut from 60 to 5 minutes means sessions longer than 5 minutes re-process cached context. If Claude Code sessions commonly run 30+ minutes, your quota burn is now 6–12× what it was before March.
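The multiplier can be estimated with a crude model: assume the full context is re-processed once per expired TTL window. Real behaviour depends on request cadence and cache-write timing, so treat this as a pessimistic back-of-envelope, not billing math:

```python
import math

def context_reprocess_count(session_min: int, ttl_min: int) -> int:
    """Rough count of full-context re-processes in one session,
    assuming the cache expires once per TTL window (a pessimistic
    simplification of real request cadence)."""
    return max(1, math.ceil(session_min / ttl_min))

session = 30  # a typical long Claude Code session, in minutes
old = context_reprocess_count(session, 60)  # 60-min TTL: 1 re-process
new = context_reprocess_count(session, 5)   # 5-min TTL: 6 re-processes

print(f"burn multiplier for a {session}-min session: {new / old:.0f}x")
```

Under this model a 30-minute session lands at the 6× end of the 6–12× range; sessions with idle gaps between requests drift toward the high end.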
OpenClaw moved from bundle to meter on April 4. If your engineering teams use OpenClaw agent frameworks, that line item needs its own budget allocation — potentially 50× higher than the implicit previous cost.
98.95% = 92 hours/year. That number needs to appear in your AI vendor evaluation rubric alongside model quality, pricing, and security. Make it a contractual requirement, not a post-incident negotiation.
When AI costs triple on renewal, finance and procurement will ask questions. Prepare the narrative: this is industry-wide, it reflects actual compute consumed, and the alternative is self-hosting or switching providers.
Claude's growth engine was individual developers pulling it into enterprise contracts. Session caps and surprise quota math damage that pipeline. Watch whether your developer community is staying enthusiastic or quietly switching.
Key questions about Anthropic's per-token billing shift and its industry-wide implications.
Key terms from Anthropic's per-token pricing shift and the end of the flat-fee era.
Pricing model charging per AI token consumed (input + output) at a fixed rate per million tokens. Replaces flat-fee subscriptions; costs scale directly with usage, especially problematic for agentic workflows.
Fixed monthly/annual fee providing generous or unlimited AI access. For AI labs, this model subsidised heavy users with revenue from light users — unsustainable as agentic workloads drove usage 10–50× beyond casual chat.
AI usage where autonomous agents chain tools, spawn sub-agents, and run extended loops. Consumes far more tokens per session — one engineer's overnight Claude Code session can match 200 casual users' daily usage.
Popular AI agent framework that moved from Anthropic's flat-fee bundle to usage-metered billing on April 4, 2026. Heavy users reported bills up to 50× higher after the unbundling.
Time-to-live for cached prompt tokens. Anthropic cut this from 1 hour to 5 minutes in early March 2026, causing long coding sessions to re-process cached context and burn additional quota.
Service Level Agreement for availability. Anthropic's 98.95% equals ~92 hours downtime/year. Cloud-standard 99.99% equals ~53 minutes/year. The gap caused Retool's David Hsu to switch to OpenAI despite preferring Claude's model quality.
Annualized revenue projection from current performance. Anthropic's run rate grew from $9B (end 2025) to $30B (Q1 2026) — tripling in approximately four months, a pace exceeding all historical software growth benchmarks.
Model-side parameter controlling reasoning depth per request. Anthropic lowered Claude Code's defaults March 3 2026 to reduce compute cost — confirmed after an AMD senior director published a 6,852-session analysis on GitHub.
GPU supply-demand imbalance of 2025–2026. Blackwell rental prices +48% in two months. CoreWeave +20%+. BofA: demand exceeds supply through 2029. PJM: 15GW additional AI power needed by early 2027.
Quietly reducing compute, reasoning depth, or features of a fixed-price plan rather than raising list price. Session caps, cache TTL cuts, lower thinking-effort defaults — all reduce effective value per dollar without a nominal price increase.