Cost Per Task: What Individual Agent Actions Cost
Simple Tasks: Under $0.01
Simple tasks involve a single model call with a short prompt and a brief response. These include text classification, sentiment analysis, entity extraction, intent detection, and simple question answering. The token footprint is small, typically 200 to 500 input tokens and 50 to 200 output tokens.
Text classification, where the agent assigns a message to one of several predefined categories, costs approximately $0.0001 to $0.001 per classification depending on the model. The message itself accounts for only part of that: on Gemini Flash-Lite at $0.10 per million input tokens, the input cost of a 300-token message is $0.00003. On Claude Haiku at $1 per million input tokens, the same input costs $0.0003. Even on Claude Sonnet at $3 per million input tokens, it costs $0.0009, still under $0.001.
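The per-model figures above come from multiplying the token count by the per-token price. Here is a minimal sketch of that arithmetic, counting input tokens only and using the prices quoted in this section. The dictionary keys and the function name are illustrative labels, not real API model identifiers.

```python
# Input prices in USD per million tokens, as quoted in the text above.
# The keys are illustrative labels, not official model IDs.
INPUT_PRICE_PER_MTOK = {
    "gemini-flash-lite": 0.10,
    "claude-haiku": 1.00,
    "claude-sonnet": 3.00,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-token cost in USD for one call (output tokens ignored)."""
    return input_tokens * INPUT_PRICE_PER_MTOK[model] / 1_000_000


# Cost of a 300-token message on each model.
for model in INPUT_PRICE_PER_MTOK:
    print(f"{model}: ${input_cost(model, 300):.5f}")
```

Output tokens and any system prompt add to these figures, so treat them as a lower bound for a real classification call.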
Sentiment analysis follows similar economics. Analyzing the emotional tone of a customer message, social media post, or product review requires minimal tokens and a short model response. At scale, processing 100,000 customer reviews through a Haiku-based sentiment analysis agent costs approximately $30 to $50 in total API fees.
Entity extraction, pulling structured data like names, dates, amounts, and locations from unstructured text, costs $0.0003 to $0.002 per extraction depending on the complexity of the extraction schema and the model used. Agents that process invoices, receipts, or forms typically perform multiple extractions per document, bringing the per-document cost to $0.001 to $0.01.
Intent routing, where an agent determines what the user wants and directs them to the appropriate handler, costs $0.0002 to $0.001 per routing decision. This is one of the highest-value applications of budget models because the routing decision itself requires minimal reasoning, but it determines which downstream model handles the actual task.
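The routing pattern can be sketched as a small lookup from detected intent to model tier. Everything named here is hypothetical: the intent labels, the tier names, and in particular `classify_intent`, which stands in for the budget-model call with a keyword check so the example runs without an API.

```python
# Hypothetical mapping from detected intent to the model tier that handles it.
HANDLER_FOR_INTENT = {
    "order_status": "budget",
    "refund_request": "mid",
    "technical_issue": "frontier",
}
DEFAULT_HANDLER = "mid"  # fallback when the intent is not recognized


def classify_intent(message: str) -> str:
    """Stand-in for the budget-model call; a keyword check keeps the sketch runnable."""
    text = message.lower()
    if "refund" in text:
        return "refund_request"
    if "tracking" in text or "where is my order" in text:
        return "order_status"
    if "error" in text or "crash" in text:
        return "technical_issue"
    return "unknown"


def route(message: str) -> str:
    """Return the model tier that should handle the message."""
    return HANDLER_FOR_INTENT.get(classify_intent(message), DEFAULT_HANDLER)


print(route("The app shows an error when I log in"))
```

Because only the cheap classification step runs on every message, the expensive tiers are paid for only when the intent calls for them.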
Moderate Tasks: $0.01 to $0.50
Moderate tasks involve either a single model call with substantial context or two to three sequential calls that build on each other. These include drafting emails, answering multi-part questions, summarizing documents, generating short content pieces, and performing basic analysis.
Email drafting with context awareness, where the agent reads a thread of 5 to 10 previous messages and generates a contextually appropriate response, typically costs $0.01 to $0.05 per email. The input includes the conversation thread (1,000 to 3,000 tokens), system instructions (500 to 1,000 tokens), and tone guidelines (200 to 500 tokens). The output averages 200 to 500 tokens. On Claude Sonnet, this works out to approximately $0.02 per email.
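To see how the token ranges above turn into a per-email figure, the sketch below prices the low and high ends of those ranges. The $3 input price for Claude Sonnet comes from this section. The $15 output price is an assumption that the text does not state, so substitute your provider's current rate.

```python
INPUT_PRICE = 3.00    # USD per million input tokens (Claude Sonnet, from the text)
OUTPUT_PRICE = 15.00  # USD per million output tokens (ASSUMPTION, not in the text)


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one model call."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000


# Input = conversation thread + system instructions + tone guidelines.
low = call_cost(1_000 + 500 + 200, 200)
high = call_cost(3_000 + 1_000 + 500, 500)
print(f"${low:.4f} to ${high:.4f} per email")
```

Under these assumptions the high end lands near the $0.02 figure quoted above. Longer threads or a revision pass would push the cost further into the $0.01 to $0.05 range.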
Document summarization costs scale with the length of the source document. Summarizing a 5-page document (approximately 3,000 tokens) costs $0.01 to $0.03 on mid-tier models. Summarizing a 50-page report (approximately 30,000 tokens) costs $0.10 to $0.30. Summarizing an entire book (300,000 tokens or more) can cost $1 to $5. Models with large context windows, such as Gemini 2.5 Pro, can handle a book in a single API call, while models with smaller context windows require chunked summarization across multiple calls.
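The scaling described above can be sketched with two small functions: one for the input cost of reading the source once, and one for the number of calls a chunked approach needs. Three things are assumptions here: the $3 price (Sonnet's rate from earlier in this section, used as a stand-in for "mid-tier"), the 100,000-token window, and a chunking scheme with a single merge call.

```python
import math

INPUT_PRICE = 3.00  # USD per million input tokens (ASSUMED mid-tier rate)


def summarization_calls(doc_tokens: int, window_tokens: int) -> int:
    """One call if the document fits the window, else one per chunk plus a final merge."""
    if doc_tokens <= window_tokens:
        return 1
    return math.ceil(doc_tokens / window_tokens) + 1


def source_input_cost(doc_tokens: int) -> float:
    """Input cost in USD of reading the source once; output and merge tokens are extra."""
    return doc_tokens * INPUT_PRICE / 1_000_000


# The three document sizes from the text, against a hypothetical 100,000-token window.
for label, tokens in [("5 pages", 3_000), ("50 pages", 30_000), ("book", 300_000)]:
    calls = summarization_calls(tokens, window_tokens=100_000)
    print(f"{label}: ${source_input_cost(tokens):.3f} input, {calls} call(s)")
```

In practice each chunk also needs room for instructions and the generated summary, so the usable chunk size is smaller than the nominal window.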
Customer support ticket resolution, where an agent reads a customer inquiry, searches a knowledge base, and generates a personalized response, typically involves two to three model calls and costs $0.02 to $0.10 per resolved ticket. The first call classifies the issue and retrieves relevant knowledge, the second generates the response, and an optional third call reviews the response for accuracy before sending.
Code review for a single function or file, where the agent analyzes code for bugs, style issues, and optimization opportunities, costs $0.05 to $0.20 depending on the code length and the depth of analysis. The input includes the code (500 to 5,000 tokens), review criteria (500 tokens), and context about the codebase (500 to 2,000 tokens). The output typically runs 300 to 1,000 tokens of detailed feedback.
Complex Tasks: $0.50 to $5.00
Complex tasks require multiple sequential model calls and large context windows, and they often involve frontier models for the reasoning-intensive steps. These include multi-file code generation, comprehensive research reports, complex data analysis, and multi-step workflow automation.
Multi-file code generation, where an agent creates a complete feature across multiple files with tests, documentation, and integration code, typically costs $1 to $5 per feature. This involves 5 to 15 sequential model calls: understanding the requirements, designing the architecture, generating each file, writing tests, and reviewing the output for consistency. Each call consumes 2,000 to 10,000 input tokens (including growing context from previous steps) and generates 500 to 2,000 output tokens.
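The effect of "growing context from previous steps" can be made concrete with a short sketch. It assumes the simplest possible accumulation model: every call re-reads a fixed base prompt plus all output from earlier calls. Real agents may truncate or summarize earlier steps, so this is an upper-bound illustration, and the function name and parameters are hypothetical.

```python
def pipeline_tokens(steps: int, base_input: int, output_per_step: int) -> tuple[int, int]:
    """Return total (input, output) tokens when each call re-reads all earlier outputs."""
    total_input = 0
    for step in range(steps):
        # Call number `step` sees the base prompt plus `step` earlier outputs.
        total_input += base_input + step * output_per_step
    return total_input, steps * output_per_step


# Ten calls, a 2,000-token base prompt, 1,000 output tokens per call.
total_in, total_out = pipeline_tokens(steps=10, base_input=2_000, output_per_step=1_000)
print(total_in, total_out)
```

Output tokens grow linearly with the number of steps, but input tokens grow faster because each output is billed again as input on every later call.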
Comprehensive research reports, where an agent searches multiple sources, synthesizes findings, and produces a structured document with citations, cost $2 to $10 per report, so the largest reports exceed this tier's usual $5 ceiling. The expense comes from multiple web search and content retrieval calls, each feeding into subsequent analysis and synthesis steps. A typical research workflow involves 10 to 20 model calls, with later calls consuming larger contexts as accumulated findings are included.
Complex data analysis, where an agent processes a dataset, identifies patterns, generates visualizations, and produces interpretive commentary, costs $0.50 to $3.00 per analysis depending on the dataset size and analytical depth. The token-intensive parts are the data ingestion (which can consume 50,000 to 100,000 tokens for large datasets) and the iterative analysis where the agent refines its approach based on initial findings.
Multi-step workflow automation, where an agent orchestrates a sequence of actions across multiple systems, costs $0.10 to $2.00 per workflow execution depending on the number of steps and the complexity of decisions at each step. A five-step workflow with simple decisions at each stage costs $0.10 to $0.30. A fifteen-step workflow with complex reasoning, error recovery, and conditional branching costs $1.00 to $2.00.
Factors That Multiply Task Costs
Several factors can dramatically increase the cost of any individual task beyond the baseline estimates above. Understanding these multipliers helps teams identify optimization opportunities and set appropriate budget expectations.
Context window size is the most significant cost multiplier. An agent that includes 50,000 tokens of context with each call pays ten times as much in input token fees as one that includes 5,000 tokens for the same task. Techniques like RAG (retrieval-augmented generation), where only the most relevant context is retrieved and included, can reduce context-driven costs by 70 to 90 percent compared to including everything.
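Both numbers in the paragraph above follow from the fact that input cost is proportional to input tokens. Here is a minimal sketch of that arithmetic; the function names are illustrative, and the 15,000-token case is a hypothetical example chosen to show the low end of the savings range.

```python
def input_cost_ratio(tokens_a: int, tokens_b: int) -> float:
    """How many times as much input cost tokens_a incurs compared with tokens_b."""
    return tokens_a / tokens_b


def context_savings(full_tokens: int, retrieved_tokens: int) -> float:
    """Fraction of input-token cost saved by sending only the retrieved context."""
    return 1 - retrieved_tokens / full_tokens


print(input_cost_ratio(50_000, 5_000))            # the tenfold gap from the text
print(f"{context_savings(50_000, 5_000):.0%}")    # retrieving 5,000 of 50,000 tokens
print(f"{context_savings(50_000, 15_000):.0%}")   # retrieving 15,000 of 50,000 tokens
```

The savings apply to the context portion of the input only. The retrieval step itself (embedding and search) has its own cost, which this sketch leaves out.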
Model tier selection can create a 50x cost difference for the same task. Classifying a 300-token message on Gemini Flash-Lite costs $0.00003 in input tokens, while the same classification on Claude Opus costs $0.0015. For tasks where cheaper models perform adequately, using a frontier model is pure waste. The key is measuring quality at each tier and selecting the cheapest model that meets your quality threshold.
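The selection rule in the last sentence above can be sketched in a few lines. The prices are the ones quoted in this section, with the Opus price inferred from $0.0015 for 300 tokens. The accuracy values are placeholders invented for illustration, not measurements, and should be replaced with results from your own evaluation set.

```python
from typing import Optional

# (label, USD per million input tokens, accuracy on your own evaluation set)
# Prices are from the text; the Opus price is inferred from $0.0015 per 300 tokens.
# ACCURACY VALUES ARE PLACEHOLDERS, not measurements.
CANDIDATES = [
    ("gemini-flash-lite", 0.10, 0.91),
    ("claude-haiku", 1.00, 0.94),
    ("claude-sonnet", 3.00, 0.96),
    ("claude-opus", 5.00, 0.97),
]


def cheapest_adequate(candidates, min_accuracy: float) -> Optional[str]:
    """Return the label of the cheapest candidate meeting min_accuracy, or None."""
    adequate = [c for c in candidates if c[2] >= min_accuracy]
    if not adequate:
        return None
    return min(adequate, key=lambda c: c[1])[0]


print(cheapest_adequate(CANDIDATES, min_accuracy=0.93))
```

Returning `None` when no tier meets the threshold makes the failure explicit, so the task is not silently sent to a model that is not good enough.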
Multi-agent architectures, in which several agents collaborate on a single task, multiply costs roughly linearly with the number of participating agents. A three-agent pipeline in which a planner, an executor, and a reviewer each make a separate model call costs about three times as much as a single-agent approach that handles the same task in one call. The quality improvement from multi-agent collaboration must justify the cost multiplier.
Retry rates amplify costs unpredictably. An agent with a 10 percent retry rate costs roughly 10 percent more per task on average, but individual tasks that trigger multiple retries can cost three to five times the baseline. Structured output modes, better prompt engineering, and graceful degradation strategies reduce retry rates and their associated costs.
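The average overhead from retries can be worked out with a short calculation. The sketch assumes each attempt fails independently with the same probability and that the agent gives up after a fixed number of attempts. Both are simplifying assumptions, and the function name is illustrative.

```python
def expected_attempts(failure_rate: float, max_attempts: int) -> float:
    """Expected number of calls per task under independent failures and a retry cap.

    Attempt k+1 happens only if the first k attempts all failed,
    which has probability failure_rate ** k.
    """
    return sum(failure_rate ** k for k in range(max_attempts))


# A 10 percent failure rate with at most one retry, then with up to four retries.
print(f"{expected_attempts(0.10, max_attempts=2):.4f}")
print(f"{expected_attempts(0.10, max_attempts=5):.4f}")
```

With a single retry the average overhead is exactly 10 percent. Allowing more retries raises it to about 11 percent. The tail is what hurts: a task that uses all five attempts costs five times the baseline.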
The cost per task varies by a factor of 50,000, from $0.0001 for simple classification to $5 or more for complex research. The biggest cost levers are model selection and context size, not task complexity alone. Match the model to the task, minimize context, and your per-task costs will stay predictable.