Function Calling: The Foundation of Tool Use

Updated May 2026
Function calling is the API-level mechanism that allows language models to generate structured invocations of predefined functions rather than plain text. It is the technical foundation on which tool use, agent behavior, and external system integration are built. Every major model provider, including OpenAI, Anthropic, and Google, implements function calling as a core API feature, each with slightly different syntax but the same underlying mechanics.

The Anatomy of a Function Call

A function call in the context of AI tool use has three components: the function definition provided by the developer, the function invocation generated by the model, and the function result returned by the application. These three components form a contract that governs how the model, the application, and the external system interact.

The function definition is a JSON object that describes a function the model can call. It includes the function name (a string identifier), a description (natural language explanation of what the function does and when to use it), and a parameters schema (a JSON Schema object defining the expected input). The definition is provided by the developer as part of the API request, alongside the conversation messages.
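As a rough illustration of those three parts, the sketch below writes a definition as a Python dictionary. The tool name "search_products" and its fields are hypothetical, and the top-level layout is provider-neutral rather than any one provider's exact request format.

```python
import json

# Hypothetical tool: the name, description, and fields are illustrative only.
search_products = {
    "name": "search_products",
    "description": (
        "Searches the product catalog for items matching the query string. "
        "Returns up to `limit` products with name, price, and availability."
    ),
    # The parameters schema is an ordinary JSON Schema object.
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Free-text search terms.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results (1-20).",
                "minimum": 1,
                "maximum": 20,
            },
        },
        # "limit" is optional because it is not listed here.
        "required": ["query"],
    },
}

# The definition travels in the request body, so it must serialize to JSON.
print(json.dumps(search_products, indent=2))
```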

The function invocation is generated by the model when it determines that calling the function would help answer the user's request. The invocation includes the function name and an arguments object containing the parameter values the model has chosen. These arguments are expected to conform to the JSON Schema in the function definition, but the application should still validate them before use. The model does not execute the function; it generates a structured request that represents what it wants to execute.

The function result is the output of the actual function execution, formatted as a message and returned to the model. The application executes the function with the model-provided arguments, captures the result, and sends it back in the conversation. The model then uses this real-world data to formulate its response to the user.
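The round trip described above can be sketched locally without calling any model. Everything here is an assumption made for illustration: the "get_weather" function returns canned data, the invocation dictionary is hand-written rather than model output, and the message shapes are provider-neutral (some providers deliver arguments as a JSON string, as assumed here, while others deliver a parsed object).

```python
import json

def get_weather(city: str) -> dict:
    """Stand-in for a real external call; returns canned data."""
    return {"city": city, "temperature_c": 18, "conditions": "cloudy"}

# Registry mapping function names to local callables.
REGISTRY = {"get_weather": get_weather}

# An invocation as a model might emit it (hypothetical, hand-written here).
invocation = {
    "id": "call_1",
    "name": "get_weather",
    "arguments": '{"city": "Lisbon"}',
}

def execute(invocation: dict) -> dict:
    """Run the requested function and wrap its output as a result message."""
    func = REGISTRY[invocation["name"]]
    args = json.loads(invocation["arguments"])
    output = func(**args)
    # The call ID ties the result back to the invocation that requested it.
    return {"call_id": invocation["id"], "content": json.dumps(output)}

result_message = execute(invocation)
print(result_message)
```

In a real application, the result message would be appended to the conversation and sent back to the model in the next request.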

Function Definitions Across Providers

Each provider uses a slightly different format for function definitions, though the core elements are the same. OpenAI's Chat Completions API uses a "tools" array where each tool has a "type" field set to "function" and a "function" object containing the name, description, and parameters. Anthropic uses a similar "tools" array with "name", "description", and "input_schema" fields. Google Gemini uses "function_declarations" within a "tools" array.
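One way to manage these differences is to keep a single neutral definition and wrap it per provider. The sketch below follows the field names described above; the helper names are hypothetical, and the exact shapes should be checked against each provider's current documentation before use.

```python
# Neutral definition; "get_weather" is a hypothetical tool.
definition = {
    "name": "get_weather",
    "description": "Returns the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name."}},
        "required": ["city"],
    },
}

def to_openai(defn: dict) -> dict:
    """Wrap as an entry of an OpenAI-style "tools" array."""
    return {
        "type": "function",
        "function": {
            "name": defn["name"],
            "description": defn["description"],
            "parameters": defn["parameters"],
        },
    }

def to_anthropic(defn: dict) -> dict:
    """Wrap as an entry of an Anthropic-style "tools" array."""
    return {
        "name": defn["name"],
        "description": defn["description"],
        "input_schema": defn["parameters"],
    }

def to_gemini(defns: list) -> list:
    """Wrap all definitions in a Gemini-style "tools" array."""
    return [{"function_declarations": [dict(d) for d in defns]}]

openai_tools = [to_openai(definition)]
anthropic_tools = [to_anthropic(definition)]
gemini_tools = to_gemini([definition])
```

The JSON Schema object is passed through unchanged in all three cases; only the wrapper around it differs.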

The parameter schema format is largely consistent across providers because they all base it on JSON Schema, although some support only a subset of its keywords. This means a function that accepts a string "query" parameter and an optional integer "limit" parameter is described in essentially the same way regardless of provider: using "type", "properties", "required", and "description" fields within the schema object. This consistency makes it relatively straightforward to write function definitions that work across multiple providers with minimal adaptation.

Enum types are particularly useful in function definitions because they constrain the model to a fixed set of valid values. A "sort_order" parameter with enum values ["ascending", "descending"] steers the model away from invalid values like "asc" or "up", and any value that still falls outside the set can be rejected with a simple membership check. Enums reduce error rates and largely remove the need for fuzzy matching on the application side. Any parameter with a fixed set of valid values should use an enum.
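A minimal sketch of the "sort_order" example, assuming a hand-rolled membership check rather than a full JSON Schema validator. The helper name "enum_violations" is hypothetical.

```python
SORT_SCHEMA = {
    "type": "object",
    "properties": {
        "sort_order": {
            "type": "string",
            "enum": ["ascending", "descending"],
            "description": "Direction in which results are sorted.",
        },
    },
    "required": ["sort_order"],
}

def enum_violations(schema: dict, args: dict) -> list:
    """Return one message per argument whose value is outside its enum."""
    problems = []
    for name, spec in schema["properties"].items():
        if name in args and "enum" in spec and args[name] not in spec["enum"]:
            problems.append(
                f"{name}: {args[name]!r} is not one of {spec['enum']}"
            )
    return problems

print(enum_violations(SORT_SCHEMA, {"sort_order": "descending"}))
print(enum_violations(SORT_SCHEMA, {"sort_order": "asc"}))
```

The first call reports no problems; the second reports that "asc" is not an allowed value, which the application can return to the model as an error.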

The Role of Descriptions

Function and parameter descriptions are not just documentation for human developers. They are instructions that directly influence model behavior. The model reads these descriptions to decide when to call a function, which function to call when multiple options are available, and what values to use for each parameter. A poorly described function leads to poor calling decisions. A well-described function leads to accurate, appropriate invocations.

Effective function descriptions answer three questions: what does this function do, when should the model use it, and what does it return. A description like "search" is almost useless. A description like "searches the product catalog for items matching the query string, supports filtering by category and price range, returns up to 20 matching products with name, price, and availability status" gives the model precise guidance about what the function does, what inputs it expects, and what output it produces.

Parameter descriptions should include format requirements, valid ranges, and the effect of different values. A "date" parameter described as "string" gives the model no format guidance. The same parameter described as "date in ISO 8601 format (YYYY-MM-DD), must be within the last 365 days" tells the model exactly what format to use and what range is valid. This specificity dramatically reduces argument errors.

Function Calling Modes

Providers offer controls that influence how the model uses available functions. These controls range from "auto" (the model decides whether to call functions) to "required" (the model must call at least one function) to "none" (function calling is disabled even though definitions are present). The exact names vary by provider; Anthropic, for example, calls the required mode "any". Some providers also support forcing a specific function, where the model must call a particular named function.

Auto mode is appropriate for most conversational use cases where the model should use its judgment about when tools are helpful. Required mode is useful for structured extraction tasks where you always want the model to produce function call output rather than text. None mode lets you temporarily disable function calling without removing the definitions from the request, which is useful for follow-up turns where you want the model to summarize results rather than making additional calls.

Forcing a specific function is useful for guided workflows where the current step requires a particular action. In a multi-step form-filling process, you might force the model to call "get_next_question" at each step to ensure the workflow progresses predictably. This removes the model's autonomy over which function to call while still leveraging its ability to generate appropriate arguments.
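The sketch below maps these four modes onto the "tool_choice" request parameter. The shapes reflect the OpenAI Chat Completions and Anthropic Messages APIs as understood at the time of writing and should be verified against current documentation; the helper itself and its "force" mode name are hypothetical.

```python
from typing import Optional, Union

def tool_choice(provider: str, mode: str,
                name: Optional[str] = None) -> Union[str, dict]:
    """Build a tool_choice value for the given provider and calling mode.

    mode is one of "auto", "required", "none", or "force" (force needs name).
    """
    if mode == "force" and not name:
        raise ValueError("mode 'force' requires a function name")

    if provider == "openai":
        if mode == "force":
            return {"type": "function", "function": {"name": name}}
        if mode in ("auto", "required", "none"):
            return mode  # plain strings in this API
    elif provider == "anthropic":
        if mode == "force":
            return {"type": "tool", "name": name}
        # Anthropic names the "must call some tool" mode "any".
        mapping = {"auto": "auto", "required": "any", "none": "none"}
        if mode in mapping:
            return {"type": mapping[mode]}

    raise ValueError(f"unsupported provider/mode: {provider}/{mode}")

print(tool_choice("openai", "required"))
print(tool_choice("anthropic", "force", "get_next_question"))
```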

Handling Multiple Function Calls

Modern models can generate multiple function calls in a single response turn. When the model identifies that multiple independent pieces of information are needed, it can emit all the necessary function calls at once rather than making them sequentially. The calling application can then execute the calls concurrently and return all of the results, each matched to its call ID, in the next request.

Parallel function calling significantly reduces latency for tasks that require data from multiple sources. A travel planning request might trigger simultaneous calls to flight search, hotel search, and weather forecast APIs. If the model requests these one at a time, the exchange takes four model round trips (one per tool call, plus one to generate the final response), and the three tool executions run back to back. With parallel calling it takes only two round trips (one turn that emits all three calls, one turn for the final response), and the tool executions can overlap.

The application must handle the logistics of parallel execution: tracking which result belongs to which call (using call IDs), managing concurrent API requests, handling partial failures (when some calls succeed and others fail), and formatting the results the way the provider expects. Some provider SDKs and agent frameworks include helpers for managing parallel tool call execution.
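These logistics can be sketched with Python's asyncio. The two tool functions are hypothetical stand-ins (one of them always fails, to show a partial failure), and the call and result dictionaries are provider-neutral assumptions rather than any SDK's actual types.

```python
import asyncio
import json

async def search_flights(destination: str) -> dict:
    await asyncio.sleep(0.01)  # simulate network latency
    return {"destination": destination, "flights_found": 3}

async def search_hotels(destination: str) -> dict:
    await asyncio.sleep(0.01)
    raise RuntimeError("hotel service unavailable")  # simulated outage

TOOLS = {"search_flights": search_flights, "search_hotels": search_hotels}

async def run_call(call: dict) -> dict:
    """Execute one call; failures become error results, not exceptions."""
    try:
        output = await TOOLS[call["name"]](**call["arguments"])
        return {"call_id": call["id"], "is_error": False,
                "content": json.dumps(output)}
    except Exception as exc:
        return {"call_id": call["id"], "is_error": True,
                "content": f"Error: {exc}"}

async def run_all(calls: list) -> list:
    # gather runs the coroutines concurrently and returns results
    # in the same order as the input calls.
    return await asyncio.gather(*(run_call(c) for c in calls))

calls = [
    {"id": "call_a", "name": "search_flights",
     "arguments": {"destination": "Lisbon"}},
    {"id": "call_b", "name": "search_hotels",
     "arguments": {"destination": "Lisbon"}},
]
results = asyncio.run(run_all(calls))
for r in results:
    print(r)
```

Because each call catches its own exception, one failing tool does not prevent the other results from being returned to the model.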

Error Handling in Function Calls

When a function call fails, whether due to invalid arguments, execution errors, or external service failures, the application should return a clear error message as the tool result rather than throwing an exception or silently failing. The model can process error messages and adjust its approach: correcting invalid arguments, trying alternative functions, or informing the user about the limitation.
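A minimal sketch of this pattern wraps execution so that every failure becomes a tool result. The "divide" function, the error categories, and the result shape are all assumptions made for illustration.

```python
import json

def divide(numerator: float, denominator: float) -> float:
    """Hypothetical tool used to trigger different failure types."""
    return numerator / denominator

def error_result(call_id: str, kind: str, detail: str) -> dict:
    return {"call_id": call_id, "is_error": True,
            "content": json.dumps({"error": kind, "detail": detail})}

def safe_execute(call_id: str, func, arguments_json: str) -> dict:
    """Run func with model-provided arguments; never raise to the caller."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return error_result(call_id, "invalid_json", str(exc))
    try:
        output = func(**args)
    except TypeError as exc:
        # Typically raised for missing or unexpected keyword arguments.
        return error_result(call_id, "invalid_arguments", str(exc))
    except Exception as exc:
        return error_result(call_id, "execution_error", str(exc))
    return {"call_id": call_id, "is_error": False,
            "content": json.dumps(output)}

print(safe_execute("c1", divide, '{"numerator": 6, "denominator": 3}'))
print(safe_execute("c2", divide, '{"numerator": 1}'))
print(safe_execute("c3", divide, '{"numerator": 1, "denominator": 0}'))
```

The error category gives the model a hint about whether to correct its arguments or to tell the user that the operation is unavailable.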

Common error patterns include the model generating arguments that are syntactically valid JSON but semantically incorrect (like a date in the future when only past dates are valid), calling functions with missing required parameters, or providing values outside the expected range. Schema validation catches most structural errors, but semantic validation requires application-level checks.
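The date case can be sketched as an application-level check that runs after schema validation. The function name and the 365-day window are assumptions; "today" is passed in explicitly so the check is easy to test.

```python
from datetime import date, timedelta
from typing import Optional

def check_past_date(value: str, today: date,
                    max_age_days: int = 365) -> Optional[str]:
    """Return an error message for the model, or None if the date is valid."""
    try:
        parsed = date.fromisoformat(value)
    except ValueError:
        return f"Invalid date {value!r}: expected ISO 8601 (YYYY-MM-DD)."
    if parsed > today:
        return f"Date {value} is in the future; only past dates are valid."
    if parsed < today - timedelta(days=max_age_days):
        return f"Date {value} is more than {max_age_days} days in the past."
    return None

today = date(2025, 6, 1)  # fixed date so the example is reproducible
for candidate in ["2025-05-01", "2025-07-01", "2023-01-01", "05/01/2025"]:
    print(candidate, "->", check_past_date(candidate, today))
```

A string such as "2025-07-01" passes a schema that only requires type "string", so the future-date rule has to be enforced here.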

Applications should implement retry logic for transient failures (network timeouts, rate limits) and clear error reporting for permanent failures (invalid credentials, resource not found). The error message format should be consistent and informative, giving the model enough context to understand what went wrong and what corrective action, if any, is possible.
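One possible shape for this logic is sketched below, assuming the application can classify failures into two hypothetical exception types. The backoff values and message wording are illustrative, not recommendations.

```python
import time

class TransientError(Exception):
    """Failure that may succeed on retry (timeout, rate limit)."""

class PermanentError(Exception):
    """Failure that will not succeed on retry (bad credentials, not found)."""

def call_with_retry(func, max_attempts: int = 3, base_delay: float = 0.5,
                    sleep=time.sleep) -> dict:
    """Call func, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return {"is_error": False, "content": func()}
        except PermanentError as exc:
            return {"is_error": True,
                    "content": f"Permanent failure: {exc}. "
                               "Retrying with the same arguments will not help."}
        except TransientError as exc:
            if attempt == max_attempts:
                return {"is_error": True,
                        "content": f"Transient failure after {attempt} "
                                   f"attempts: {exc}. Try again later."}
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...

# Hypothetical tool that fails twice, then succeeds.
attempts = {"count": 0}

def flaky_lookup() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("request timed out")
    return "ok"

# A no-op sleep keeps the example fast.
result = call_with_retry(flaky_lookup, sleep=lambda seconds: None)
print(result, attempts)
```

Both failure paths end in a tool result rather than an exception, so the model always receives a message describing what went wrong.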

Key Takeaway

Function calling is the structured API mechanism that makes tool use possible in AI systems. Well-crafted function definitions with precise descriptions and clear parameter schemas are among the most important factors in achieving reliable, accurate tool invocations across any model provider.