Function Calling: Giving LLMs Hands
"Language models are confined to their training data cutoff. Function Calling breaks them out of this sandbox, allowing them to fetch live data, send emails, or query databases by interacting with external APIs."
The Problem with Raw Text
Historically, if you asked an LLM to "turn on the lights," you would have to write a complex regex parser to extract intent from its conversational response ("Sure! I will turn on the lights now..."). This was brittle and error-prone.
The Solution: Structured Tool Use
Function Calling allows you to provide an array of tool definitions along with the user prompt. Each tool contains a JSON Schema describing one function's name and parameters. When the model determines a tool is needed, it stops producing conversational text and instead emits a strictly formatted JSON object: the function's name plus the arguments to call it with.
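To make this concrete, here is a minimal sketch of a tool definition in the OpenAI-style format. The `get_weather` function and its parameters are hypothetical examples, not part of any real API:

```python
# A tools array in the OpenAI-style format. Each entry pairs a function
# name with a JSON Schema describing its parameters.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Tokyo",
                    },
                    "celsius": {
                        "type": "boolean",
                        "description": "Return the temperature in Celsius",
                    },
                },
                "required": ["location"],
            },
        },
    }
]
```

You pass this array alongside the messages in your chat completion request; the model decides whether (and how) to use it.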
[Diagram: the Function Calling loop — User -> LLM -> your code calls the API -> result back to LLM -> final answer]
Deterministic Outputs
By enforcing `strict: true` (supported on models such as OpenAI's `gpt-4o`), the model is constrained to adhere exactly to the schema you provide. It will not hallucinate parameter types or omit required fields, so your backend application won't crash when parsing the payload.
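A sketch of what opting into strict mode looks like on a tool definition, again using the hypothetical `get_weather` function. Note that OpenAI's strict mode expects every property to be listed in `required` and `additionalProperties` to be `false`:

```python
tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "strict": True,  # opt in to exact schema adherence
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "celsius": {"type": "boolean"},
            },
            # Strict mode: list every property and forbid extras.
            "required": ["location", "celsius"],
            "additionalProperties": False,
        },
    },
}
```

With this flag set, the arguments the model emits are guaranteed to parse against the schema, so you can skip defensive type checks on the payload.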
🤖 Gen AI Developer FAQ
Does the LLM actually run my code?
No. The LLM has no execution environment for your local APIs. It merely "plans" the execution by returning a JSON object containing the function name and arguments. Your application code must catch this JSON, execute the HTTP request or local function, and then hand the result back to the LLM.
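The round trip can be sketched as follows. The tool-call payload below is simulated (no API call is made), and `get_weather` is a hypothetical local function:

```python
import json

# Hypothetical local implementation. The LLM "plans" this call
# but never executes it itself.
def get_weather(location: str, celsius: bool = True) -> dict:
    return {"location": location, "temp": 21 if celsius else 70}

FUNCTIONS = {"get_weather": get_weather}

# A simulated tool call, shaped like what the model returns:
# a function name plus JSON-encoded arguments.
tool_call = {
    "id": "call_123",
    "function": {"name": "get_weather", "arguments": '{"location": "Tokyo"}'},
}

# Your application code parses the JSON and dispatches the call.
fn = FUNCTIONS[tool_call["function"]["name"]]
args = json.loads(tool_call["function"]["arguments"])
result = fn(**args)

# The result is handed back to the LLM as a "tool" role message.
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],
    "content": json.dumps(result),
}
```

The model then reads the tool message and composes its final natural-language answer.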
What is JSON Schema and why is it required?
JSON Schema is a vocabulary that allows you to annotate and validate JSON documents. In Function Calling, it tells the model exactly what parameters your function accepts (e.g., `location` is a `string`, `celsius` is a `boolean`). Without it, the model wouldn't know how to format its output.
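The example below shows such a schema for the hypothetical `get_weather` parameters, with a deliberately tiny validator to illustrate what "validate" means here (production code would use a library such as `jsonschema`):

```python
# A minimal JSON Schema for the hypothetical get_weather parameters.
schema = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "celsius": {"type": "boolean"},
    },
    "required": ["location"],
}

# Toy validator for illustration only: checks required keys and types.
def matches(schema: dict, args: dict) -> bool:
    types = {"string": str, "boolean": bool}
    if any(key not in args for key in schema["required"]):
        return False
    return all(
        isinstance(args[k], types[prop["type"]])
        for k, prop in schema["properties"].items()
        if k in args
    )

print(matches(schema, {"location": "Paris", "celsius": True}))  # True
print(matches(schema, {"celsius": "yes"}))                      # False (missing location)
```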
Can an LLM call multiple functions at once?
Yes. Modern models support Parallel Function Calling. If a user asks "What's the weather in Tokyo and Paris?", the model can return an array of multiple `tool_calls` in a single response, which you can execute concurrently in your backend.
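A sketch of executing parallel tool calls concurrently, using simulated tool-call payloads and a hypothetical `get_weather` stand-in for a real API request:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real (I/O-bound) weather API call.
    return {"location": location, "temp": 21}

FUNCTIONS = {"get_weather": get_weather}

# Two tool calls, shaped like what a model might return for
# "What's the weather in Tokyo and Paris?".
tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather",
                                  "arguments": '{"location": "Tokyo"}'}},
    {"id": "call_2", "function": {"name": "get_weather",
                                  "arguments": '{"location": "Paris"}'}},
]

def run(call: dict) -> dict:
    fn = FUNCTIONS[call["function"]["name"]]
    return fn(**json.loads(call["function"]["arguments"]))

# Threads suit I/O-bound API calls; pool.map preserves call order.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run, tool_calls))

print([r["location"] for r in results])  # ['Tokyo', 'Paris']
```

Each result is then returned as its own `tool` message, matched to the original call by `tool_call_id`.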