MCP cost: what a tool call actually costs

A single user prompt to an agentic IDE like Cursor or Windsurf can trigger 30-80 MCP tool calls in the background. Each one pulls tokens — and tokens are billed. This guide breaks down where the money goes, based on telemetry from real MCPSpend users.

The cost equation

Every MCP tool call has two cost dimensions: the input tokens the model sends to the server (the tool arguments + context) and the output tokens the server returns. Both are billed by the LLM provider. The formula:

cost_usd = (input_tokens / 1_000_000) * input_rate
         + (output_tokens / 1_000_000) * output_rate

With Claude Sonnet 4.6 — the most common model in agentic workflows — rates are $3 / M input and $15 / M output. A single Playwright browser_snapshot call that returns a 15K-token DOM payload therefore costs about 0.225 USD just for the output, and a couple more cents for the round-trip context.

Per-server averages (real telemetry)

These averages come from anonymized aggregated data across MCPSpend users. Heavy users see more, lighter users less — but the relative ranking is stable across every account we've looked at.

ServerAvg input tokens / callAvg output tokens / callTypical % of monthly spend
playwright (browser tools)2,00012,00040-60%
fetch / read-url3008,00015-25%
filesystem (read_file)3004,00010-15%
github (diff/issues)5006,0005-10%
brave-search / web-search3003,0005-10%
postgres / database4002,5003-8%
slack / linear / notion5001,5002-5%

Why browser tools dominate

Playwright and Windsurf's cascade-browser return full DOM snapshots of every page the agent visits. Modern web pages are dense — a single React-rendered SaaS dashboard can push 20-30K tokens into the context window per snapshot.

Multiply by the 5-10 page navigations a typical research task involves, and one task can run $0.50-$2.00 just on browser tools. Most users have no idea this is happening because provider invoices show only the aggregate.

What to optimize, in order

  1. Cap Playwright DOM payloads. Most Playwright MCP servers accept a --max-bytes flag (or equivalent). 8KB is usually enough for navigation decisions; the agent rarely needs the whole DOM.
  2. Disable browser MCPs for non-browsing tasks. If the agent isn't actively scraping or interacting with a webpage, the browser server shouldn't be loaded. Toggle it on per-task.
  3. Cache fetch responses. Reading the same URL twice in a conversation is wasteful. A simple in-memory LRU layer on top of fetch cuts repeat costs by 30-40%.
  4. Switch to Haiku for routine work. Claude Haiku 4.5 is $0.80 / $4 per M — that's ~4× cheaper than Sonnet. For file-reading, search, and routing tasks, Haiku is more than enough.
  5. Set per-agent budgets. Background agents (CI jobs, scheduled scrapers) can run away — a budget alert at 80% saves you from a surprise bill.

How MCPSpend measures this

We're a transparent stdio proxy. Your MCP client (Claude Desktop, Cursor, Windsurf, VS Code, Claude Code) launches us as a subprocess; we launch your real MCP server as our subprocess, and pipe stdin/stdout between them unchanged. Every JSON-RPC frame is observed as it flows. Each tool-call request is timestamped; the matching response gives us the latency and the byte counts. Cost is computed using public provider rates.

We never read tool arguments or tool responses — only the envelope (server name, tool name, timing, byte counts). Source is MIT-licensed at github.com/andreisirbu91-lab/MCPSpend and you can audit the ingest payload yourself.

Estimate your bill in 30 seconds

The free calculator takes your active agent hours and the MCP servers you use, then projects monthly spend using these real averages. No signup, no email.

Or install the proxy for exact numbers from your actual traffic:

npx --yes @mcpspend/proxy@latest init --key mcps_live_xxx

25,000 tool calls per month on the free tier. No card. Generate your key →

Track your own MCP spend — free

One command wraps every MCP client on your machine. 25,000 tool calls/month on the free tier. No card.

Related guides