The cost equation
Every MCP tool call has two cost dimensions: the input tokens the model sends to the server (the tool arguments + context) and the output tokens the server returns. Both are billed by the LLM provider. The formula:
cost_usd = (input_tokens / 1_000_000) * input_rate
+ (output_tokens / 1_000_000) * output_rateWith Claude Sonnet 4.6 — the most common model in agentic workflows — rates are $3 / M input and $15 / M output. A single Playwright browser_snapshot call that returns a 15K-token DOM payload therefore costs about 0.225 USD just for the output, and a couple more cents for the round-trip context.
Per-server averages (real telemetry)
These averages come from anonymized aggregated data across MCPSpend users. Heavy users see more, lighter users less — but the relative ranking is stable across every account we've looked at.
| Server | Avg input tokens / call | Avg output tokens / call | Typical % of monthly spend |
|---|---|---|---|
| playwright (browser tools) | 2,000 | 12,000 | 40-60% |
| fetch / read-url | 300 | 8,000 | 15-25% |
| filesystem (read_file) | 300 | 4,000 | 10-15% |
| github (diff/issues) | 500 | 6,000 | 5-10% |
| brave-search / web-search | 300 | 3,000 | 5-10% |
| postgres / database | 400 | 2,500 | 3-8% |
| slack / linear / notion | 500 | 1,500 | 2-5% |
Why browser tools dominate
Playwright and Windsurf's cascade-browser return full DOM snapshots of every page the agent visits. Modern web pages are dense — a single React-rendered SaaS dashboard can push 20-30K tokens into the context window per snapshot.
Multiply by the 5-10 page navigations a typical research task involves, and one task can run $0.50-$2.00 just on browser tools. Most users have no idea this is happening because provider invoices show only the aggregate.
What to optimize, in order
- Cap Playwright DOM payloads. Most Playwright MCP servers accept a
--max-bytesflag (or equivalent). 8KB is usually enough for navigation decisions; the agent rarely needs the whole DOM. - Disable browser MCPs for non-browsing tasks. If the agent isn't actively scraping or interacting with a webpage, the browser server shouldn't be loaded. Toggle it on per-task.
- Cache
fetchresponses. Reading the same URL twice in a conversation is wasteful. A simple in-memory LRU layer on top of fetch cuts repeat costs by 30-40%. - Switch to Haiku for routine work. Claude Haiku 4.5 is
$0.80 / $4 per M— that's ~4× cheaper than Sonnet. For file-reading, search, and routing tasks, Haiku is more than enough. - Set per-agent budgets. Background agents (CI jobs, scheduled scrapers) can run away — a budget alert at 80% saves you from a surprise bill.
How MCPSpend measures this
We're a transparent stdio proxy. Your MCP client (Claude Desktop, Cursor, Windsurf, VS Code, Claude Code) launches us as a subprocess; we launch your real MCP server as our subprocess, and pipe stdin/stdout between them unchanged. Every JSON-RPC frame is observed as it flows. Each tool-call request is timestamped; the matching response gives us the latency and the byte counts. Cost is computed using public provider rates.
We never read tool arguments or tool responses — only the envelope (server name, tool name, timing, byte counts). Source is MIT-licensed at github.com/andreisirbu91-lab/MCPSpend and you can audit the ingest payload yourself.
Estimate your bill in 30 seconds
The free calculator takes your active agent hours and the MCP servers you use, then projects monthly spend using these real averages. No signup, no email.
Or install the proxy for exact numbers from your actual traffic:
npx --yes @mcpspend/proxy@latest init --key mcps_live_xxx
25,000 tool calls per month on the free tier. No card. Generate your key →