Analysis · Implicator.ai · April 15, 2026

Anthropic Shifts to Per-Token Enterprise Pricing. The Flat-Fee Era Is Over.

Anthropic unbundled Claude, Claude Code, and Cowork from enterprise seat fees, moving to per-token billing. Revenue tripled in four months. But Retool's founder already switched to OpenAI. The uptime number says why.

By Marcus Schuler · Implicator.ai · 13 min read
98.95% — Anthropic Uptime
99.99% — Cloud Standard
$9B→$30B — Run Rate (4 months)
1,000+ — $1M+ Accounts
50× — OpenClaw Bill Spike
7% — Users Hitting New Caps

What Actually Changed

Anthropic's enterprise billing restructure is framed as a documentation update. The direction is unmistakable.

Before: Flat-Fee Plan

  • Seat fee included usage allowance
  • Claude, Claude Code, Cowork bundled
  • Fixed monthly cost regardless of token volume
  • Grandfathered terms for legacy customers
  • OpenClaw included in bundle
  • 1-hour prompt cache TTL

After: Per-Token Billing

  • Seat fee covers platform access only
  • All usage billed separately at API rates
  • Variable cost based on actual token consumption
  • Legacy customers must migrate at next renewal
  • OpenClaw metered separately (up to 50× higher)
  • Prompt cache TTL cut to 5 minutes

The Framing

  • Enterprise help center updated quietly
  • "The seat fee only covers access to the platform"
  • No list-price increase announced
  • The Information first reported the shift
  • Framed as tied to compute crunch
  • Each change defended as "optimisation"

"The seat fee only covers access to the platform and doesn't include any usage. All usage across Claude, Claude Code, and Cowork is billed separately at standard API rates, based on what your team actually consumes."

— Anthropic Enterprise Help Center (updated language, April 2026)

The Retool Verdict

Before the official billing change, the real signal came from David Hsu — the founder who preferred Claude but chose OpenAI anyway.

🏆
Model Quality

David Hsu's Preference

Retool's founder publicly stated he preferred Anthropic's Claude Opus 4.6 model for quality. The model was the better choice on every benchmark that mattered to Retool's engineering teams.

💀
The Problem

The Service Kept Dying

98.95% uptime sounds close to 99.99%. In practice it's ~92 hours of downtime per year vs ~53 minutes. When Anthropic's API went down, Retool's customers couldn't ship. That's a customer problem, not a benchmark problem.

🔄
The Decision

Switched to OpenAI

Hsu moved Retool off Claude to OpenAI — picking the inferior model because the inferior model stayed up. He told the Wall Street Journal this publicly.

The Signal

A Trust Problem, Not Performance

This is not a capabilities story. It's a reliability story. The verdict that should scare Anthropic: an enterprise buyer chose the worse model because it was more dependable. You cannot patch trust with a pricing page.

The Uptime Gap

The gap between 98.95% and 99.99% does not sound significant. Translated to downtime hours per year, the difference is disqualifying for enterprise buyers.

API Uptime Comparison (90 days ending April 8, 2026)

Anthropic Claude API: 98.95% — ~92 hrs downtime/year
Industry Standard (AWS/Azure/GCP): 99.99% — ~53 min downtime/year
⚠ Enterprise Disqualification Threshold

Enterprise buyers trained on 20 years of cloud discipline treat the gap between 98.95% and 99.99% as contractually disqualifying. At 98.95%, Anthropic offers roughly 92 hours of downtime per year. At the standard, 53 minutes. For customers shipping code to their own customers via Claude, those 92 hours are revenue events — not outage reports.
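The arithmetic behind those downtime figures is easy to check. A minimal sketch, assuming a 365-day year with no leap adjustment:

```python
# Convert an uptime percentage into expected downtime per year.
HOURS_PER_YEAR = 365 * 24  # 8,760 hours

def downtime_hours_per_year(uptime_pct: float) -> float:
    """Hours of downtime implied by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)

print(f"98.95% -> {downtime_hours_per_year(98.95):.0f} hours/year")
print(f"99.99% -> {downtime_hours_per_year(99.99) * 60:.0f} minutes/year")
# -> 92 hours/year vs 53 minutes/year
```

The same two inputs reproduce both headline numbers, which is the point: the gap is a property of the percentages themselves, not of any particular outage.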

The Revenue Explosion — and the Subsidy Problem

Run rate tripled in four months. The response was a billing restructure — not a victory lap. That tells the story.

📈
Growth

$9B → $30B Run Rate

Anthropic's annual revenue run rate grew from $9B (end 2025) to $30B (Q1 2026) — tripling in roughly four months. More than 1,000 customers now pay over $1M/year. No comparable historical growth benchmark exists.

Context

Fastest Growth Ever Recorded

Snowflake took a decade to reach $1B run rate. Google's search-advertising ramp was the previous record. Anthropic reportedly covered nearly 4× that pace in a single quarter. Axios came back empty hunting for a historical comparison.

💻
The Problem

Agentic Workloads Broke the Math

One engineer running Claude Code overnight can consume the token budget of 200 casual chat users. OpenAI saw token usage jump from 6B/minute (October 2025) to 15B/minute (late March 2026). The flat-fee subsidy became untenable.

🏗
Compute Crunch

Supply Can't Keep Up

Blackwell GPU rental prices climbed 48% in two months. CoreWeave raised prices 20%+. Bank of America expects demand to outstrip supply through 2029. PJM needs 15 additional GW of AI power by early 2027.

The Open Bar Analogy

For ~18 months, AI companies ran a subsidised model. The agentic workload was the thirsty customer who ended the party.

The Old Story

"$20/month Claude Pro subscription: access to one of the best coding models in the world, running on compute that cost Anthropic significantly more than you paid. Power users always understood this. The arithmetic was never a secret."

The New Reality

"Agentic workflows do not sip. They chain tools across steps and run loops without asking. They spawn subagents carrying their own contexts. Cached tokens burn by the hundred thousand. One engineer running Claude Code overnight can consume the token budget of 200 casual chat users."

"Open bars work until somebody shows up thirsty. In this case, the thirsty somebody is the agent."

Marcus Schuler, Implicator.ai

The Policy Changes: A Timeline

Each change defended individually. Together, the shape is obvious: features lifted from the subscription, meters welded on, customers sent the difference.

Early March 2026
Prompt Cache TTL Cut: 1 hour → 5 minutes
Claude Code's prompt-cache time-to-live slashed from 60 minutes to 5 minutes. Long coding sessions now re-process previously cached context, burning additional quota for identical work.
March 3, 2026
Thinking Effort Defaults Lowered
Anthropic's Claude Code lead confirmed thinking-effort defaults were lowered "to the best balance across intelligence, latency and cost." An AMD senior director's GitHub analysis of 6,852 Claude Code session files had identified the change publicly before the acknowledgement.
Late March 2026
Session Caps Tightened (Peak Hours)
Five-hour session limits tightened for Pro and Max users during weekday peak hours (5am–11am PT). Approximately 7% of users started hitting caps they had not previously encountered.
April 4, 2026
OpenClaw Moved to Metered Billing
Popular AI agent framework pulled from the flat-fee bundle and moved to usage-metered billing. Heavy users reported bills potentially 50× higher than their previous subscription cost.
April 2026
Enterprise Help Center Updated
Anthropic formally documented the new billing structure: seat fee covers access only; Claude, Claude Code, and Cowork usage billed separately at standard API rates. Organizations on legacy plans must migrate by next renewal.
Change | Date | Impact | Framing
Cache TTL cut (60 min → 5 min) | Early March 2026 | Higher quota burn | Optimisation
Thinking effort defaults lowered | March 3, 2026 | Reduced reasoning depth | Balance
Session caps tightened (peak hours) | Late March 2026 | 7% of users affected | Product change
OpenClaw moved to usage metering | April 4, 2026 | Up to 50× bill increase | Pricing change
Enterprise help center restructure | April 2026 | All usage now metered | Formal restructure

Industry Migration: Every Provider Is Moving

This is not an Anthropic story. The flat-fee era for agentic AI workloads is over across the industry — the memo is circulating.

🤖
Anthropic

Claude / Claude Code / Cowork

Enterprise seat fee now covers access only. All usage billed per-token at API rates. OpenClaw moved to metered billing April 4. Session caps tightened. Cache TTL cut from 60 min to 5 min.

🌐
OpenAI

Codex Token Metering

OpenAI shifted Codex from flat-message pricing to token metering in early April 2026. Token usage grew from 6B/minute (October) to 15B/minute (late March) — a break in the load curve requiring the same math.

🐙
GitHub

Copilot Limits Tightened

GitHub tightened Copilot usage limits on April 10, 2026, moving enterprise AI coding assistance from generous flat access toward consumption-based caps aligned to actual compute cost.

🌊
Windsurf

Credits → Daily Quotas

Windsurf replaced its credit system with daily and weekly quotas in March 2026. Another signal that the AI coding tool market is standardising around consumption metering over flat access.

"The flat-fee era is over. The memo is circulating. Expect every major AI provider running agentic workloads to move to usage-based enterprise billing within six months."

Marcus Schuler, Implicator.ai

Enterprise Response: 7 Steps

What enterprise AI buyers should do before their next contract renewal with Anthropic or any AI provider moving to usage-based billing.

  1. Audit your actual AI token consumption

    Before renewal, measure actual usage across Claude, Claude Code, and Cowork. Know your baseline: tokens per user per day, which workflows are heaviest, which teams drive the most agentic load.

  2. Model the flat-fee to per-token cost delta

    Price your baseline consumption at standard API rates. Compare to your current seat fee. For heavy agentic users, the difference can be 10–50×. Know your number before Anthropic tells you at renewal.

  3. Negotiate usage caps and price ceilings in new contracts

    Push for committed spend tiers with per-token ceilings. Avoid open-ended billing for agentic workloads — a single overnight Claude Code session can consume hundreds of dollars without a budget ceiling.

  4. Add contractual uptime SLA requirements

    At 98.95%, Anthropic falls short of the 99.99% cloud standard. Require contractual uptime commitments with financial penalty clauses. If the service goes down during customer-facing hours, you need recourse beyond a status page.

  5. Diversify your AI provider portfolio

    Single-provider dependency creates both reliability risk (when Anthropic is down, all Claude workloads fail) and pricing leverage risk (no alternative = no negotiating power). Evaluate OpenAI, Google, and open-source models for substitutable workloads.

  6. Optimise agentic workflows for token efficiency

    Under per-token billing, efficiency is a cost lever. Review Claude Code session architecture, cache strategies, and agent loop designs. The 5-minute cache TTL (down from 60 minutes) requires session design changes to avoid redundant token burn on long coding sessions.

  7. Build a quarterly AI pricing review cadence

    This is an industry-wide migration, not a one-time event. OpenAI, GitHub Copilot, and Windsurf all moved in parallel within weeks of Anthropic. Establish a quarterly review of AI provider pricing, usage patterns, and alternative options — the market is evolving faster than annual vendor reviews can track.
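The consumption audit and cost-delta model in steps 1 and 2 can be sketched in a few lines. The rates, seat fee, and token volumes below are illustrative placeholders, not Anthropic's published prices; substitute your contract's actual rates:

```python
# Sketch of the flat-fee vs per-token cost delta (steps 1-2 above).
# All dollar figures and token volumes are assumed for illustration.

RATE_PER_MTOK_INPUT = 3.00    # $ per million input tokens (assumed)
RATE_PER_MTOK_OUTPUT = 15.00  # $ per million output tokens (assumed)
SEAT_FEE_PER_MONTH = 60.00    # $ per seat per month (assumed)

def monthly_usage_cost(input_mtok: float, output_mtok: float) -> float:
    """Price one month of measured consumption at per-token rates."""
    return input_mtok * RATE_PER_MTOK_INPUT + output_mtok * RATE_PER_MTOK_OUTPUT

# A casual chat user vs a heavy agentic user (illustrative volumes).
casual = monthly_usage_cost(input_mtok=2, output_mtok=0.5)
heavy = monthly_usage_cost(input_mtok=400, output_mtok=80)

print(f"casual user: ${casual:.2f}/month vs ${SEAT_FEE_PER_MONTH:.2f} seat fee")
print(f"heavy agentic user: ${heavy:.2f}/month "
      f"({heavy / SEAT_FEE_PER_MONTH:.0f}x the seat fee)")
```

Even with placeholder rates, the shape of the result holds: light users land below the old seat fee, while heavy agentic users land at a multiple of it, which is exactly the 10–50× exposure step 2 asks you to quantify before renewal.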

Track the 7% figure

7% of users hit new session caps in late March. That's not random — it's the heavy users who drive disproportionate token consumption. They are the signal for your organisation's cost exposure.

Watch cache TTL in your session design

The cut from 60 to 5 minutes means any pause longer than 5 minutes between requests forces previously cached context to be re-processed. If Claude Code sessions commonly run 30+ minutes with idle gaps, your quota burn is now 6–12× what it was before March.
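Under the simplifying assumption that any gap between requests longer than the TTL forces the full session context to be re-processed uncached (a sketch, not Anthropic's actual billing mechanics), the exposure can be estimated from your session logs:

```python
# Estimate context re-processing caused by a shorter cache TTL.
# Assumption: a gap between requests longer than the TTL expires the
# cache, so the full accumulated context is billed uncached again.

def uncached_context_tokens(gaps_min: list[float],
                            context_tokens: int,
                            ttl_min: float) -> int:
    """Tokens re-processed at full price across a session's request gaps."""
    return sum(context_tokens for gap in gaps_min if gap > ttl_min)

# A ~40-minute coding session with idle gaps while the engineer
# reads output, reviews diffs, or runs tests (illustrative values).
gaps = [2, 8, 3, 12, 6, 9]   # minutes between successive requests
context = 150_000            # tokens of accumulated session context

old = uncached_context_tokens(gaps, context, ttl_min=60)
new = uncached_context_tokens(gaps, context, ttl_min=5)
print(f"60-min TTL: {old:,} uncached tokens; 5-min TTL: {new:,}")
```

With a 60-minute TTL, none of these gaps expire the cache; with a 5-minute TTL, four of the six do, each re-billing the full context. Feeding real gap distributions from your own sessions into this sketch is the fastest way to size the change for your workloads.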

Price OpenClaw separately in your budget

OpenClaw moved from bundle to meter on April 4. If your engineering teams use OpenClaw agent frameworks, that line item needs its own budget allocation — potentially 50× higher than the implicit previous cost.

Uptime is now a procurement criterion

98.95% = 92 hours/year. That number needs to appear in your AI vendor evaluation rubric alongside model quality, pricing, and security. Make it a contractual requirement, not a post-incident negotiation.

Document the growth story for internal stakeholders

When AI costs triple on renewal, finance and procurement will ask questions. Prepare the narrative: this is industry-wide, it reflects actual compute consumed, and the alternative is self-hosting or switching providers.

Monitor the developer adoption pipeline

Claude's growth engine was individual developers pulling it into enterprise contracts. Session caps and surprise quota math damage that pipeline. Watch whether your developer community is staying enthusiastic or quietly switching.

Frequently Asked Questions

Key questions about Anthropic's per-token billing shift and its industry-wide implications.

What exactly changed in Anthropic's enterprise pricing?
The enterprise plan's seat fee now covers only platform access. All usage across Claude, Claude Code, and Cowork is billed separately at standard API rates based on actual consumption. Organizations on older seat-based plans with fixed usage allowances must migrate at next contract renewal or lose grandfathered terms.
Why did Retool's founder switch from Claude to OpenAI?
David Hsu told the Wall Street Journal he preferred Anthropic's Claude Opus 4.6 for quality, but Anthropic's service kept going down. At 98.95% uptime, the API logged roughly 92 hours of downtime annually. When customers can't ship because the AI platform is unavailable, model quality becomes irrelevant. Hsu chose the inferior model because it was reliable.
What is the compute crunch driving the pricing shift?
Blackwell GPU rental prices climbed 48% in two months. CoreWeave raised prices 20%+ in late 2025 and now requires three-year contracts from smaller customers. Bank of America expects compute demand to outstrip supply through 2029. PJM, the eastern US grid operator, needs 15 additional gigawatts for AI by early 2027. Anthropic's run rate grew $21B in four months — flat-rate subsidies are unsustainable at that scale.
Why was the flat-fee model called an 'open bar'?
For ~18 months, AI companies subsidised heavy users with revenue from lighter users — compute costs exceeded subscription revenue for power users. Agentic workflows ended this: one engineer running Claude Code overnight can consume the token budget of 200 casual chat users. OpenAI's token usage grew from 6B to 15B/minute in 5 months. The open bar had to close.
What specific policy changes did Anthropic make in Q1 2026?
Four documented changes: (1) Prompt cache TTL cut from 1 hour to 5 minutes in early March, increasing quota burn for long sessions. (2) Thinking-effort defaults lowered March 3 "to balance intelligence, latency and cost." (3) Five-hour session limits tightened for peak hours (5am–11am PT, weekdays), affecting ~7% of users. (4) OpenClaw moved from flat-fee bundle to usage metering April 4, with heavy users facing bills up to 50× higher.
Are other AI providers making the same move?
Yes. OpenAI shifted Codex from flat-message to token metering in early April 2026. GitHub tightened Copilot limits April 10. Windsurf replaced its credit system with daily/weekly quotas in March. The pattern is industry-wide. Every major AI provider running agentic workloads is moving to usage-based billing. This is not an Anthropic decision — it's a compute economics decision made simultaneously across the industry.
What is OpenClaw and why does the 50× figure matter?
OpenClaw is a popular AI agent framework previously included in Anthropic's flat-fee bundle. Moved to usage metering April 4 2026. Heavy users reported bills potentially 50× higher. The 50× figure illustrates the hidden subsidy: when a feature used heavily by a small fraction of users is extracted from a flat fee and metered, the cost delta can be enormous for those users.
How fast has Anthropic's revenue grown?
Anthropic's annual run rate grew from $9B (end 2025) to $30B (Q1 2026) — tripling in roughly four months. 1,000+ customers pay over $1M/year. Snowflake took a decade to reach $1B run rate. Axios found no historical growth comparison: Anthropic reportedly covered nearly 4× the ground of Google's search-advertising ramp in a single quarter. The billing restructure is the response to this growth, not a celebration of it.
What is AI 'shrinkflation'?
Shrinkflation in AI: quietly reducing the compute, reasoning depth, or features provided by a fixed-price plan rather than raising the list price. In Anthropic's case — session limits tightened, cache TTL slashed, thinking effort defaults lowered — each change reduces effective value per dollar without a nominal price increase. An AMD senior director published a GitHub analysis of 6,852 Claude Code sessions identifying the pattern before Anthropic acknowledged it.
What is the trust problem facing Anthropic?
Claude's growth engine was individual developers adopting Claude Code on cheap Pro plans and then pulling it into their employers' enterprise contracts. Tightening session caps, cutting cache TTL, metering OpenClaw, and raising costs all damage that pipeline. Business Insider spoke with affected users: one restructured his entire workday around limit resets; another breaks projects into four micro-chats to conserve tokens. The affection for Claude is intact. The relationship with the pricing model is not. Trust cannot be patched with a pricing page.
What is Anthropic's API uptime problem?
Anthropic's API achieved 98.95% uptime over the 90 days ending April 8 2026. Established cloud providers (AWS, Azure, GCP) maintain 99.99% commitments. The gap: 92 hours of downtime per year vs. 53 minutes. For enterprise buyers with customers depending on Claude-powered features, 92 hours of downtime per year is contractually disqualifying — David Hsu's Retool switch was the most prominent public consequence.
What should enterprises do in response to AI usage-based billing?
Audit actual token consumption before renewal; model the flat-fee to per-token cost delta for heavy users; negotiate usage caps and price ceilings in new contracts; require contractual uptime SLAs with financial penalties; diversify across multiple AI providers; optimise agentic session architecture for token efficiency (especially given the 5-minute cache TTL); build a quarterly AI pricing review cadence — the market is evolving faster than annual vendor reviews can track.

Glossary

Key terms from Anthropic's per-token pricing shift and the end of the flat-fee era.

Per-Token Billing

Pricing model charging per AI token consumed (input + output) at a fixed rate per million tokens. Replaces flat-fee subscriptions; costs scale directly with usage, especially problematic for agentic workflows.

Flat-Fee Subscription

Fixed monthly/annual fee providing generous or unlimited AI access. For AI labs, this model subsidised heavy users with revenue from light users — unsustainable as agentic workloads drove usage 10-50× beyond casual chat.

Agentic Workload

AI usage where autonomous agents chain tools, spawn sub-agents, and run extended loops. Consumes far more tokens per session — one engineer's overnight Claude Code session can match 200 casual users' daily usage.

OpenClaw

Popular AI agent framework moved from Anthropic's flat-fee bundle to usage-metered billing on April 4 2026. Heavy users reported bills up to 50× higher after the unbundling.

Prompt Cache TTL

Time-to-live for cached prompt tokens. Anthropic cut this from 1 hour to 5 minutes in early March 2026, causing long coding sessions to re-process cached context and burn additional quota.

Uptime SLA

Service Level Agreement for availability. Anthropic's 98.95% equals ~92 hours downtime/year. Cloud-standard 99.99% equals ~53 minutes/year. The gap caused Retool's David Hsu to switch to OpenAI despite preferring Claude's model quality.

Run Rate

Annualized revenue projection from current performance. Anthropic's run rate grew from $9B (end 2025) to $30B (Q1 2026) — tripling in approximately four months, a pace exceeding all historical software growth benchmarks.

Thinking Effort

Model-side parameter controlling reasoning depth per request. Anthropic lowered Claude Code's defaults March 3 2026 to reduce compute cost — confirmed after an AMD senior director published a 6,852-session analysis on GitHub.

Compute Crunch

GPU supply-demand imbalance of 2025–2026. Blackwell rental prices +48% in two months. CoreWeave +20%+. BofA: demand exceeds supply through 2029. PJM: 15GW additional AI power needed by early 2027.

AI Shrinkflation

Quietly reducing compute, reasoning depth, or features of a fixed-price plan rather than raising list price. Session caps, cache TTL cuts, lower thinking-effort defaults — all reduce effective value per dollar without a nominal price increase.