Function calling, tool use, and structured outputs are not the same thing
Three LLM mechanisms that look alike but work differently, and why choosing wrong costs you at production scale
Three different names, three major providers, one persistent confusion. Most teams building on LLM APIs in 2026 reach for function calling when they need reliable JSON back from a model. It works well enough in testing. At production volume, it becomes the source of intermittent validation failures that are hard to reproduce and expensive to debug.
The terminology does not help. OpenAI shipped 'function calling,' then renamed it to 'tools.' Anthropic calls the same concept 'tool use.' Both providers now support 'structured outputs' as a separate mechanism. Gemini added its own variant. A developer reading the docs could reasonably conclude these are all different names for the same thing.
They are not. The three mechanisms have different internal implementations, different reliability profiles, different latency costs, and different semantic meanings. Picking the wrong one at architecture time creates compounding problems at scale.
Why the confusion exists
Function calling and structured outputs both produce JSON. Both accept a JSON schema to describe the output shape. The API surface looks similar: you pass a schema, you get back structured data. The difference is what happens inside the model.
Structured outputs work through constrained decoding. The model's sampling process is constrained at the token level so it can only produce tokens that are valid continuations of the target schema. Invalid tokens are masked out before sampling occurs. This is enforced at inference time, not learned through training.
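The masking step is simple enough to sketch. The toy below is illustrative only, not any provider's implementation: assume a schema-driven state machine (not shown) supplies the set of tokens that keep the partial output valid, and everything else is zeroed out before sampling.

```python
import math
import random

def constrained_next_token(logits: list[float], vocab: list[str],
                           allowed: set[str]) -> str:
    # `allowed` comes from a schema-driven state machine that knows which
    # tokens keep the partial output parseable. Everything else gets a
    # logit of -inf, i.e. zero probability after softmax.
    # (Toy code: assumes at least one token in `allowed` is in `vocab`.)
    masked = [l if tok in allowed else -math.inf
              for l, tok in zip(logits, vocab)]
    peak = max(masked)
    weights = [math.exp(l - peak) for l in masked]
    return random.choices(vocab, weights=weights, k=1)[0]

# With a field constrained to ["pending", "approved", "rejected"], only
# tokens continuing one of those strings are ever in `allowed`, so a
# value like "Approved" is unreachable rather than merely unlikely.
```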
Function calling works differently. It is a trained behaviour: the model learned during fine-tuning to produce structured output when given a function schema. There is no token-level enforcement. The model is more likely to produce valid JSON, but it can still hallucinate field names, produce values outside a specified enum, or omit required fields. The schema is a strong prior, not a guarantee.
What function calling actually does
Function calling has one semantic purpose: it lets the model signal that it wants to hand control to external code. The model decides whether to call a function, which function to call, and what arguments to pass. Your application runs the function and optionally returns the result. The model reads that result and decides what to do next.
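A minimal sketch of that round trip against OpenAI's chat completions API. The get_account_tier tool, its schema, and the model name are placeholders for illustration; the tools array and tool_calls shapes follow the documented API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: in a real system this would query your own backend.
def get_account_tier(customer_id: str) -> str:
    return "enterprise"

tools = [{
    "type": "function",
    "function": {
        "name": "get_account_tier",
        "description": "Look up a customer's account tier.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Should ticket 4812 be escalated?"}]
response = client.chat.completions.create(
    model="gpt-4o",  # substitute a current model ID
    messages=messages,
    tools=tools,
)
msg = response.choices[0].message

# The model may decide to call the tool, or it may answer directly.
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)  # not guaranteed valid JSON
    result = get_account_tier(**args)
    messages.append(msg)
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": result,
    })
    # Second round trip: the model reads the result and decides what's next.
    followup = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=tools
    )
```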
This is agency. The model is deciding what to do, not just formatting an answer you already know the shape of. That is appropriate when you need a model to choose between multiple tools, decide whether to ask a clarifying question, or sequence a multi-step operation where each step depends on the result of the last.
It is not appropriate when you already know exactly what you want the model to produce and are using function calling purely to get reliable JSON. In that case, you are paying for the agency semantics without getting anything back for the overhead.
Structured outputs: constrained decoding, not a training artefact
OpenAI shipped structured outputs as a distinct mechanism from function calling in mid-2024. Anthropic added native support in early 2026. The mechanism: you pass a JSON schema via a response_format parameter, and the API guarantees the response matches that schema. Not 'usually matches.' Guarantees.
The guarantee holds because constrained decoding is enforced mechanically. At each token, the sampling process considers only tokens that keep the response parseable against the schema. If a field is an enum of ["pending", "approved", "rejected"], the model cannot produce "Approved" or "approved by finance" — those token sequences are masked before sampling occurs. The model has no path to a schema-invalid output.
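In code, the difference from function calling is a single parameter. A minimal sketch using OpenAI's response_format, with a hypothetical invoice schema; the strict json_schema shape follows the documented API.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical extraction schema: the enum is enforced at the token level,
# so "Approved" or "approved by finance" cannot be produced.
schema = {
    "type": "object",
    "properties": {
        "invoice_id": {"type": "string"},
        "status": {"type": "string",
                   "enum": ["pending", "approved", "rejected"]},
    },
    "required": ["invoice_id", "status"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o",  # substitute a current model ID
    messages=[{"role": "user",
               "content": "Extract: 'INV-2209 was approved on 3 May.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "invoice", "schema": schema, "strict": True},
    },
)
record = json.loads(response.choices[0].message.content)  # matches schema
```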
For extraction tasks (pulling structured information from a document, classifying input into a fixed category set, converting unstructured notes into a typed record), structured outputs are the correct choice. The model reads input and produces output in a format you specified. There is no external function call, no tool invocation, no agency decision.
The reliability improvement over function calling for pure extraction is measurable. Teams running high-volume extraction pipelines consistently report a 3 to 7 per cent reduction in output validation failures after switching from function calling to structured outputs. At any meaningful scale, that is thousands fewer error-handling cycles per week.
Tool use: the agentic primitive
'Tool use' in Anthropic's API and 'function calling' in OpenAI's are functionally the same concept. The model signals which tool to call and with what arguments, your code executes the tool, and you optionally return the result. The name difference is historical: 'function calling' was OpenAI's original term and leaked into the API surface. OpenAI has moved largely to the 'tools' terminology in recent versions.
Tool use is appropriate when the model needs to make decisions, not just format something it already knows. Specifically, when:
- The model must choose between multiple possible actions based on context
- The action involves an external system: a database read, an API call, a file operation
- The result of the action changes what the model does next
- The correct response might be to ask a clarifying question rather than take immediate action
An agent that looks up a customer record, checks their account tier, and decides whether to escalate a support ticket is using tool use. The model is deciding what to do and in what sequence.
A pipeline that reads a contract and extracts the effective date, parties, and governing law into a typed record is using structured outputs. The model is not deciding what to do; it is doing what you told it to, in a format you specified.
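The customer-record agent above, sketched against Anthropic's tool_use content blocks. The tool and model ID are placeholders; the request and response shapes follow Anthropic's documented API.

```python
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_account_tier",
    "description": "Look up a customer's account tier.",
    "input_schema": {
        "type": "object",
        "properties": {"customer_id": {"type": "string"}},
        "required": ["customer_id"],
    },
}]

messages = [{"role": "user", "content": "Should ticket 4812 be escalated?"}]
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute a current model ID
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

if response.stop_reason == "tool_use":
    block = next(b for b in response.content if b.type == "tool_use")
    result = "enterprise"  # your code runs the actual lookup here
    messages.append({"role": "assistant", "content": response.content})
    messages.append({
        "role": "user",
        "content": [{"type": "tool_result",
                     "tool_use_id": block.id,
                     "content": result}],
    })
    # The model reads the tier and decides whether to escalate.
    followup = client.messages.create(
        model="claude-sonnet-4-20250514", max_tokens=1024,
        tools=tools, messages=messages,
    )
```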
| Mechanism | How it works | Use when | Avoid when |
|---|---|---|---|
| Structured outputs | Constrained decoding; schema enforced at inference time, not by training | Extraction, classification, formatting with a known output shape | The model needs to decide what to do next |
| Tool use / Function calling | Trained behaviour; model signals which external function to invoke | Agentic flows, multi-step decisions, external system interaction | You just need valid JSON and there is no real function to call |
| JSON mode (legacy) | Instruction-based; model told to output JSON, no schema enforcement | Legacy integrations only; not recommended in production | Any extraction or agentic use case at scale |
The production cost of choosing wrong
The most common mistake over the past two years was using function calling for extraction at scale. Teams that built this way typically did not notice in development, because development datasets are clean and unrepresentative of production traffic. The failure mode is distribution shift: documents that look different from training data, unusual formatting, non-English input, tables instead of prose.
Function calling's reliability is a function of what the model saw during fine-tuning. When your input drifts from that distribution, output quality degrades. Structured outputs do not have this property. The constraint is mechanical and applies equally to unusual input.
The second common mistake runs the other direction: using structured outputs for a task where the model genuinely needs agency. The symptom is a model that produces correctly shaped JSON filled with hallucinated values, because it has no way to express uncertainty or request more information. The schema forces a complete, valid response even when the right answer is 'I do not have enough information.'
If your extraction results are valid JSON but the values are wrong, the model is guessing at fields it cannot determine from the input. You likely need tool use with an explicit 'I cannot determine this' path in the schema, or a redesigned flow that allows a clarifying question before committing to a structured response.
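One way to build that path in, sketched as a hypothetical per-field wrapper: each extracted value carries a status the model can set when the input does not support an answer, so a forced-valid response no longer has to guess.

```python
# Hypothetical schema fragment for a single extracted field. The value is
# nullable and the status enum gives the model an explicit way to say
# "not determinable from the input" while still satisfying the schema.
governing_law_schema = {
    "type": "object",
    "properties": {
        "value": {"type": ["string", "null"]},
        "status": {"type": "string",
                   "enum": ["found", "not_in_document", "ambiguous"]},
    },
    "required": ["value", "status"],
    "additionalProperties": False,
}
```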
What the 2026 provider landscape looks like
| Provider | Structured outputs | Tool use | Notes |
|---|---|---|---|
| OpenAI | response_format: json_schema; mature, stable | tools array; generally available, function_call param deprecated | Most ecosystem tooling (LangChain, LlamaIndex) built against OpenAI's API shape |
| Anthropic | Native structured output added early 2026 (via betas parameter) | tool_use content blocks; full multi-turn support | Pre-2026 extraction pipelines often used a single extraction tool as a workaround; migration path now exists |
| Google Gemini | response_schema field; generally available | functionDeclarations; generally available | Ecosystem tooling less tested against Gemini than OpenAI; API surface evolving faster |
If you built extraction pipelines against Anthropic's API before early 2026 using a single extraction tool as a structured-output workaround, the migration to native structured outputs is worth benchmarking on your specific data. Constrained decoding changes error characteristics, particularly for fields with strict enum constraints, in ways that matter for downstream validation.
One question to answer before picking a mechanism
Is the model deciding what to do, or are you deciding and the model is executing?
If the model is deciding what to call, what action to take, or whether to ask a follow-up question, use tool use. The model needs agency: the ability to invoke external code, receive results, and decide what happens next.
If you are deciding what the output should look like (extract these fields, classify this input, convert this text to a typed record), use structured outputs. You know the output shape; the model fills it reliably.
The two compose naturally. An agentic workflow can use tool use at the outer level (the model decides to call an extraction subtask) and structured outputs at the inner level (the extraction itself produces guaranteed-valid JSON). You do not have to choose one mechanism for the whole system.
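A minimal sketch of that composition, with hypothetical names throughout: the outer loop exposes extraction as an ordinary tool, and the tool's implementation is an inner structured-outputs call.

```python
import json
from openai import OpenAI

client = OpenAI()

CONTRACT_SCHEMA = {
    "type": "object",
    "properties": {
        "effective_date": {"type": ["string", "null"]},
        "governing_law": {"type": ["string", "null"]},
    },
    "required": ["effective_date", "governing_law"],
    "additionalProperties": False,
}

def extract_contract(text: str) -> dict:
    # Inner level: structured outputs. No agency here; the shape is fixed
    # and the response is guaranteed to match CONTRACT_SCHEMA.
    response = client.chat.completions.create(
        model="gpt-4o",  # substitute a current model ID
        messages=[{"role": "user", "content": f"Extract terms:\n{text}"}],
        response_format={"type": "json_schema",
                         "json_schema": {"name": "contract",
                                         "schema": CONTRACT_SCHEMA,
                                         "strict": True}},
    )
    return json.loads(response.choices[0].message.content)

# Outer level: the agent decides *whether* extraction is the right step.
# The structured-outputs call above is exposed to it as an ordinary tool.
tools = [{
    "type": "function",
    "function": {
        "name": "extract_contract",
        "description": "Extract structured terms from a contract.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
    },
}]
```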
Function calling was the first mechanism that let models interact with external systems in a structured way, and it got used for extraction by teams who needed reliable JSON and had no better option. Structured outputs closed that gap in 2024. In 2026, there is no good reason to use function calling when you just need valid JSON from a model that is not making any decisions. The mechanism that gives you a schema guarantee is available on all major providers. Use it where it fits.