
The Anthropic Leak and the Daemon Tax: What KAIROS Means for Inference Economics

On March 31, Anthropic accidentally shipped 512,000 lines of Claude Code's unobfuscated source to npm. Inside: a daemon called KAIROS that converts every seat into persistent inference. The cost model that underpins the GPU infrastructure buildout just changed.

[Figure: KAIROS daemon process, a wake / evaluate / sleep cycle with 15-second tick intervals. Callouts: 512K lines leaked, 1.3-20x cost range, 90.6% CPU bottleneck, $19B ARR.]

TLDR

KAIROS is a daemon that wakes every 15 seconds and consolidates memory overnight. Every Claude Code seat becomes a persistent compute process, not a session. The infrastructure stack was priced for bursty, statistically multiplexable inference. Daemons are neither.
Cost range per seat: 1.3x at the floor (daemon mostly idle), 20x at the ceiling (high-agency mode). Agentic workloads are CPU-bottlenecked, not GPU-bottlenecked. Neoclouds built GPU farms. The agent runtime layer doesn't exist yet. The request-response inference model is ending.

AI Briefly Pro | Deep Dive | ~5,000 words | April 2, 2026

  1. What Actually Leaked
  2. KAIROS
  3. Anthropic's Chip Strategy: Co-Design Everything, Own Nothing
  4. How Much Does the Daemon Actually Cost
  5. Where the Tokens Go (CPU Bottleneck)
  6. The Subsidy Problem
  7. How Do You Price an Always-On Agent
  8. Everyone Wants Always-On AI. Nobody Has One.
  9. The Neocloud Problem
  10. What Breaks This
  11. How to Position

Disclosure: I previously worked at CoreWeave and Google. AI Briefly contributors may hold positions in companies mentioned. Not investment advice.

The Short Version

  • KAIROS is a daemon that wakes every 15 seconds and consolidates memory overnight. Every Claude Code seat becomes a persistent compute process, not a session. Rollout is gated for May 2026, employees first.
  • The infrastructure stack was priced for bursty, statistically multiplexable inference. Daemons are neither. Even a mostly-sleeping daemon forces capacity planning to shift from overcommit to reserved guarantees - a different business with different margins.
  • Cost range per seat: 1.3x at the floor (daemon mostly idle), 20x at the ceiling (high-agency mode running hot during work hours). The base-case scenarios between those two bounds are what nobody has modeled yet.
  • Agentic workloads are CPU-bottlenecked. Georgia Tech and Intel measured it - 90.6% of agent latency sits on CPU-side tool processing (arXiv:2511.00739). Nvidia shipped a dedicated agentic CPU at GTC. Neoclouds built GPU farms.
  • Anthropic isn't building a chip. It's co-designing Trainium with Annapurna, bought 400K TPUv7s outright from Broadcom, and keeps Nvidia as the third leg. Better play than a custom ASIC at $19B ARR.
  • CoreWeave has $14B in debt tied to the wrong hardware mix.
  • The agent runtime layer - the operating system for persistent AI - doesn't exist yet. Hyperscalers will get there. The window for someone else to own it is 12-24 months.
  • The request-response inference model that the entire infrastructure stack was sized and priced against is ending.
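
A back-of-envelope sketch of the numbers in the bullets above, under stated assumptions. The 15-second tick, the 1.3x and 20x bounds, and the 90.6% CPU share come from the text. Everything else is hypothetical: the function names, the `hot_fraction` parameter (share of time a seat spends in high-agency mode), and the choice to interpolate linearly between floor and ceiling. The source does not say how cost scales between the bounds, so treat the blend as an illustration, not a model of KAIROS. The last function applies Amdahl's law and assumes that everything outside CPU-side tool processing could be sped up without limit.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ticks_per_day(tick_seconds: float = 15.0) -> int:
    """Number of daemon wake-ups per seat per day at a fixed tick interval."""
    return int(SECONDS_PER_DAY // tick_seconds)


def blended_cost_multiplier(
    hot_fraction: float, floor: float = 1.3, ceiling: float = 20.0
) -> float:
    """Per-seat cost multiplier for a seat that runs hot part of the time.

    ASSUMPTION: cost scales linearly with the share of time spent in
    high-agency mode. The article gives only the two endpoints.
    """
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return floor + hot_fraction * (ceiling - floor)


def max_speedup_from_faster_gpus(cpu_share: float = 0.906) -> float:
    """Amdahl's law upper bound on end-to-end speedup.

    ASSUMPTION: only the non-CPU share of latency benefits from faster
    accelerators, and that share can shrink all the way to zero.
    """
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be in (0, 1]")
    return 1.0 / cpu_share


if __name__ == "__main__":
    print("wake-ups per seat per day:", ticks_per_day())
    for f in (0.0, 0.1, 0.25, 0.5, 1.0):
        print(f"hot {f:>4.0%} of the time -> {blended_cost_multiplier(f):.2f}x")
    print(f"best-case speedup from faster GPUs: {max_speedup_from_faster_gpus():.3f}x")
```

Under these assumptions a seat wakes 5,760 times a day, and a seat that runs hot a quarter of the time already lands near 6x. The Amdahl bound is the sharper point: if roughly 90% of agent latency sits on the CPU side, even infinitely fast accelerators improve end-to-end latency by only about 10%.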

How to Position

Long: Anthropic (pre-IPO, $19B ARR, daemon-first architecture). Nvidia Vera CPU and AMD EPYC (CPU bottleneck reprices hardware mix). Nebius (closest to owning the agent runtime layer).
Short / caution: GPU-only rental neoclouds without software layers. CoreWeave at current multiples if BMaaS margins stay at 14-16%. Any company whose unit economics are built on current subsidized API rates.
