Tool Calling Costs: Tokens Per Function Call
Tool Definition Token Overhead
Every tool definition included in an API request consumes input tokens. A typical tool definition with a name, description, and parameter schema occupies 100 to 300 tokens depending on its complexity. A system with 20 tools adds 2,000 to 6,000 tokens to every request, and these tokens are charged at input token rates on every API call whether or not the model invokes any tools.
This overhead is often underestimated because developers focus on the content tokens (user messages and model responses) and overlook the tool definition tokens that are silently added to every request. For a system that makes 10,000 API calls per day with 20 tools averaging 200 tokens each, the tool definition overhead alone is 40 million input tokens per day. At typical input token pricing, this can represent a significant portion of the total API cost.
The cost scales linearly with both the number of tools and the number of API calls. Adding a new tool increases the cost of every request, not just requests that use that tool. This economic pressure favors lean tool sets with well-curated, broadly useful tools rather than large collections of narrowly specialized tools. Every tool in the definition set should earn its token cost by being used frequently enough to justify its presence.
Conversation Turn Costs
Each tool calling round trip adds tokens to the conversation in three ways. The model generates tokens for the tool call itself (function name and arguments), which are charged at output token rates. The tool result is added to the conversation as input tokens in the next request. And the growing conversation history means that all previous tool calls and results are re-sent as input tokens in subsequent requests.
The conversation history growth is the most expensive component for multi-turn tool calling tasks. A task that involves 10 tool calls generates a conversation history that includes all 10 tool call messages and all 10 tool result messages. By the final request, the model is processing the full history of every previous interaction. This cumulative cost means that longer tool calling sessions are disproportionately more expensive than shorter ones.
Tool result size directly impacts conversation costs. A tool that returns 2,000 tokens of data adds those tokens to every subsequent request for the rest of the conversation. If the same tool is called five times, each returning 2,000 tokens, the conversation grows by 10,000 tokens of result data that is included in every subsequent request. Keeping tool results concise and relevant is one of the most effective cost optimization strategies.
External API and Service Costs
Many tools call paid external services, adding per-call charges on top of LLM token costs. Web search APIs charge per query. Data enrichment services charge per lookup. Cloud service APIs charge per operation. These costs are often small per call but accumulate quickly when agents make many tool calls per task.
External API costs are harder to predict than token costs because they depend on agent behavior rather than fixed configurations. An agent that makes 3 search queries per task costs three times as much in search API fees as an agent that makes 1 query. Agents that enter retry loops can make many more API calls than expected, multiplying external costs. Monitoring external API spending per agent, per task type, and per time period is essential for detecting cost anomalies early.
Cost Optimization Strategies
Dynamic tool selection is the highest-impact optimization for systems with large tool sets. Instead of including all tools in every request, the system analyzes the user message and includes only the tools likely to be relevant. A routing layer that reduces a 50-tool set to 8 relevant tools per request eliminates over 80% of the tool definition token overhead. The routing layer itself adds minimal cost compared to the savings it produces.
Prompt caching, available from Anthropic, OpenAI, and other providers, caches the tool definition tokens across requests within a session. When tool definitions are identical between requests (which they usually are within a conversation), cached definitions are charged at a fraction of the normal input token rate, typically 10% to 25% of the standard price. Prompt caching can reduce tool definition costs by 75% to 90% for multi-turn conversations.
Result summarization reduces the token cost of tool results in the conversation history. Instead of returning raw JSON with every field, tools can return summarized results that include only the information relevant to the current task. A database query that returns 20 fields per record can be summarized to the 4 or 5 fields the model actually needs, reducing result tokens by 75% or more.
Conversation history management controls the growth of conversation context over long tool calling sessions. Strategies include summarizing older tool results (replacing the full result with a brief summary), dropping results that are no longer relevant (tool results from early in the conversation that have been fully processed), and implementing sliding window approaches that keep only the most recent N messages.
Model selection optimization uses smaller, cheaper models for simple tool calling tasks and reserves larger, more expensive models for complex reasoning tasks. A simple data retrieval task that requires one or two tool calls can often be handled by a smaller model at a fraction of the cost. Complex multi-step reasoning tasks that require sophisticated tool coordination may need a larger model to achieve acceptable accuracy.
Tool calling costs come from three sources: definition tokens (charged on every request), conversation history growth (cumulative across turns), and external API charges (per tool execution). Dynamic tool selection, prompt caching, result summarization, and conversation management are the primary levers for controlling these costs in production systems.