- What is Function Calling?
- OpenAI Function Calling
- Anthropic Tool Use
- Error Handling Patterns
- Structured Output and Function Calling
- Function Calling with Open-Source Models
- Production Design Patterns
- Benchmark: Function Calling Performance Comparison
## What is Function Calling?
Function Calling (Tool Use) is a mechanism that enables LLMs to invoke external functions and APIs. The LLM does not execute code itself — instead, it decides which function to call with which arguments, and the application handles the actual execution.
### Execution Flow

```
User: "Tell me the weather in Seoul"
        |
LLM: tool_call(get_weather, city="Seoul")
        |
App: execute get_weather("Seoul") -> {"temp": 5, "condition": "clear"}
        |
LLM: "The current weather in Seoul is 5 degrees C and clear."
```
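The loop above can be sketched end to end with a stubbed model decision. All names here are illustrative stand-ins; a real system would call an LLM API at step 1:

```python
import json

# Hypothetical stand-in: a real system would ask an LLM to choose here.
def fake_llm_decide(user_message: str) -> dict:
    """Pretend the model chose a tool call for a weather question."""
    return {"name": "get_weather", "arguments": json.dumps({"city": "Seoul"})}

def get_weather(city: str) -> dict:
    # Stubbed tool; a real one would hit a weather API.
    return {"temp": 5, "condition": "clear"}

TOOLS = {"get_weather": get_weather}

def handle(user_message: str) -> dict:
    call = fake_llm_decide(user_message)   # 1. model picks a tool and arguments
    args = json.loads(call["arguments"])   # 2. app parses the JSON arguments
    result = TOOLS[call["name"]](**args)   # 3. app executes the function
    return result                          # 4. result would go back to the model

print(handle("Tell me the weather in Seoul"))  # {'temp': 5, 'condition': 'clear'}
```

The key point the sketch makes concrete: the model only ever produces the `call` dict; everything after that is ordinary application code.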
## OpenAI Function Calling

### Basic Usage

```python
from openai import OpenAI
import json

client = OpenAI()

# Tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieves current weather information for a specific city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., Seoul, Tokyo, New York)"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Searches for products",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_price": {"type": "number", "description": "Maximum price"},
                    "sort_by": {"type": "string", "enum": ["price", "rating", "newest"]}
                },
                "required": ["query"]
            }
        }
    }
]

# First call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"}],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
print(f"Tool calls: {len(message.tool_calls)}")

# Execute tools and pass results
messages = [
    {"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"},
    message  # assistant's tool_call message
]

# Add results for each tool_call
for tool_call in message.tool_calls:
    func_name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    if func_name == "get_weather":
        result = {"temp": 5, "condition": "cloudy", "humidity": 65}
    elif func_name == "search_products":
        result = [
            {"name": "Folding Umbrella", "price": 15000, "rating": 4.5},
            {"name": "Automatic Umbrella", "price": 25000, "rating": 4.8}
        ]
    else:
        result = {"error": "Unknown function"}
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result, ensure_ascii=False)
    })

# Generate final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)
```
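As the number of tools grows, the if/elif dispatch above becomes awkward. A dictionary-based dispatch table is a common alternative; this sketch uses illustrative local handlers in place of real API-backed implementations:

```python
import json

# Illustrative local handlers standing in for real API-backed implementations.
def get_weather(city: str, unit: str = "celsius") -> dict:
    return {"temp": 5, "condition": "cloudy", "humidity": 65}

def search_products(query: str, **filters) -> list:
    return [{"name": "Folding Umbrella", "price": 15000, "rating": 4.5}]

# Dispatch table: tool name -> callable. Unknown names become an error payload.
DISPATCH = {"get_weather": get_weather, "search_products": search_products}

def run_tool(name: str, arguments: str):
    func = DISPATCH.get(name)
    if func is None:
        return {"error": f"Unknown function: {name}"}
    return func(**json.loads(arguments))

print(run_tool("get_weather", '{"city": "Seoul"}'))
print(run_tool("teleport", "{}"))  # {'error': 'Unknown function: teleport'}
```

Registering a new tool is then a one-line change to `DISPATCH` rather than another branch in the loop.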
### Parallel Tool Calls

GPT-4o can request multiple function calls in a single response:

```python
# "Tell me Seoul's weather and search for umbrellas" -> 2 tool_calls returned at once
# tool_calls = [
#     {"function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'}},
#     {"function": {"name": "search_products", "arguments": '{"query": "umbrella"}'}}
# ]
import asyncio
import json

async def execute_tool_calls(tool_calls: list) -> list:
    """Execute tool calls in parallel."""
    async def execute_one(tc):
        func_name = tc.function.name
        args = json.loads(tc.function.arguments)
        # In practice, these would be async API calls
        if func_name == "get_weather":
            return await get_weather_async(**args)
        elif func_name == "search_products":
            return await search_products_async(**args)
        raise ValueError(f"Unknown tool: {func_name}")

    results = await asyncio.gather(*[execute_one(tc) for tc in tool_calls])
    return results
```
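Since `get_weather_async` and `search_products_async` are left undefined above, here is a self-contained, runnable version with stubbed async tools; the `SimpleNamespace` objects imitate the attribute layout of the SDK's tool-call objects:

```python
import asyncio
import json
from types import SimpleNamespace

# Stubbed async tools; real versions would await external APIs.
async def get_weather_async(city: str) -> dict:
    await asyncio.sleep(0)  # simulate I/O
    return {"city": city, "temp": 5}

async def search_products_async(query: str) -> list:
    await asyncio.sleep(0)
    return [{"name": "Folding Umbrella"}]

ASYNC_TOOLS = {"get_weather": get_weather_async, "search_products": search_products_async}

async def execute_tool_calls(tool_calls: list) -> list:
    async def execute_one(tc):
        args = json.loads(tc.function.arguments)
        return await ASYNC_TOOLS[tc.function.name](**args)
    # gather() runs the coroutines concurrently, so total latency is
    # roughly the slowest tool's latency, not the sum of all of them.
    return await asyncio.gather(*[execute_one(tc) for tc in tool_calls])

# SimpleNamespace imitates the attribute access of SDK tool_call objects.
calls = [
    SimpleNamespace(function=SimpleNamespace(name="get_weather", arguments='{"city": "Seoul"}')),
    SimpleNamespace(function=SimpleNamespace(name="search_products", arguments='{"query": "umbrella"}')),
]
print(asyncio.run(execute_tool_calls(calls)))
```

`gather` also preserves order, so results line up index-for-index with the `tool_call_id`s you need to report back.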
## Anthropic Tool Use

Anthropic's (Claude) Tool Use API differs in a few ways: tool schemas use a flat `input_schema` field rather than OpenAI's nested `function.parameters` wrapper, and tool results are sent back as `tool_result` content blocks inside a user message:

```python
import json

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Retrieves current weather information for a specific city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "How's the weather in Seoul?"}]
)

# When stop_reason is "tool_use"
if response.stop_reason == "tool_use":
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )

    # Execute the tool (get_weather is the application's own implementation)
    result = get_weather(city=tool_use_block.input["city"])

    # Pass the result back as a tool_result content block
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "How's the weather in Seoul?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": json.dumps(result, ensure_ascii=False)
                }]
            }
        ]
    )
```
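The result-message structure is easy to get wrong, so it can help to centralize it in a small helper. `make_tool_result_message` is a hypothetical name, not part of the Anthropic SDK:

```python
import json

def make_tool_result_message(tool_use_id: str, result) -> dict:
    """Hypothetical helper: wrap a tool result in the user-role
    content block shape (type "tool_result") that Claude expects."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps(result, ensure_ascii=False),
        }],
    }

msg = make_tool_result_message("toolu_123", {"temp": 5, "condition": "clear"})
print(msg["content"][0]["type"])  # tool_result
```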
## Error Handling Patterns

### Robust Function Execution Loop

```python
import asyncio
import json
from typing import Callable

class ToolExecutor:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.max_retries = 3

    def register(self, name: str, func: Callable):
        self.tools[name] = func

    async def execute(self, tool_call) -> dict:
        func_name = tool_call.function.name
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            return {"error": f"Invalid JSON arguments: {tool_call.function.arguments}"}
        if func_name not in self.tools:
            return {"error": f"Unknown function: {func_name}"}
        for attempt in range(self.max_retries):
            try:
                result = await self.tools[func_name](**args)
                return {"success": True, "data": result}
            except TypeError as e:
                # Bad argument names/types: retrying cannot help, fail fast
                return {"error": f"Invalid arguments: {str(e)}"}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    return {"error": f"Failed after {self.max_retries} retries: {str(e)}"}
                await asyncio.sleep(2 ** attempt)  # exponential backoff

    async def run_conversation(self, client, messages, tools_spec):
        """Run the conversation loop."""
        while True:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=tools_spec
            )
            choice = response.choices[0]
            if choice.finish_reason == "stop":
                return choice.message.content
            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)
                for tc in choice.message.tool_calls:
                    result = await self.execute(tc)
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps(result, ensure_ascii=False)
                    })
```
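The retry-with-backoff logic in `execute` can also be isolated into a standalone wrapper, which makes it easy to test against a deliberately flaky tool. A sketch (`with_retries` and `flaky_tool` are illustrative names, not library functions):

```python
import asyncio

async def with_retries(func, *args, max_retries: int = 3, base_delay: float = 0.01, **kwargs):
    """Retry an async tool with exponential backoff; surface the last error."""
    for attempt in range(max_retries):
        try:
            return {"success": True, "data": await func(*args, **kwargs)}
        except Exception as e:
            if attempt == max_retries - 1:
                return {"error": f"Failed after {max_retries} retries: {e}"}
            await asyncio.sleep(base_delay * 2 ** attempt)  # exponential backoff

# A tool that fails twice before succeeding, to exercise the retry path.
attempts = {"n": 0}
async def flaky_tool():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(asyncio.run(with_retries(flaky_tool)))  # {'success': True, 'data': 'ok'}
```

Returning an error dict rather than raising keeps the conversation loop alive: the model sees the failure as a tool result and can explain it or try a different tool.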
## Structured Output and Function Calling

A Pydantic model can serve as the single source of truth for a tool's schema: define the fields once, then export them as JSON Schema for the `tools` parameter.

```python
from pydantic import BaseModel, Field

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Convert a Pydantic model to an OpenAI-style tool definition
def model_to_tool(model_class, name: str, description: str) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model_class.model_json_schema()
        }
    }

weather_tool = model_to_tool(
    WeatherResponse,
    "format_weather",
    "Returns weather information in a structured format"
)
```
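`model_to_tool` relies on Pydantic for schema generation. To illustrate the validation side of the same idea without any dependency, a minimal required-fields-and-types check against a JSON Schema fragment might look like this (a simplified sketch, not a full JSON Schema validator):

```python
# Map JSON Schema type names to the Python types they correspond to.
JSON_TYPES = {"string": str, "number": (int, float), "object": dict,
              "array": list, "boolean": bool}

def check_arguments(schema: dict, args: dict) -> list:
    """Return a list of problems; an empty list means the arguments pass."""
    problems = [f"missing required field: {k}"
                for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        prop = schema.get("properties", {}).get(key)
        if prop is None:
            problems.append(f"unexpected field: {key}")
        elif not isinstance(value, JSON_TYPES.get(prop.get("type"), object)):
            problems.append(f"wrong type for {key}")
    return problems

schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "temperature": {"type": "number"}},
    "required": ["city"],
}
print(check_arguments(schema, {"city": "Seoul", "temperature": 5}))  # []
print(check_arguments(schema, {"temperature": "warm"}))
```

Running a check like this before executing a tool turns malformed model output into a structured error you can feed back to the LLM, instead of a `TypeError` deep inside the tool.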
## Function Calling with Open-Source Models

### Ollama + Tool Use

```python
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Tell me the weather in Seoul'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Retrieve weather information',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {'type': 'string', 'description': 'City name'}
                },
                'required': ['city']
            }
        }
    }]
)

if response['message'].get('tool_calls'):
    for tool_call in response['message']['tool_calls']:
        print(f"Function: {tool_call['function']['name']}")
        print(f"Args: {tool_call['function']['arguments']}")
```
### Serving Tool Use with vLLM

```bash
# Start the vLLM server with tool-call parsing enabled
vllm serve meta-llama/Llama-3.1-8B-Instruct \
    --enable-auto-tool-choice \
    --tool-call-parser hermes
```

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me the weather in Seoul"}],
    tools=tools,  # same tool spec as in the OpenAI section
    tool_choice="auto"
)
```
## Production Design Patterns

### Tool Selection Control

```python
# Force a specific function call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

# Disable function calling
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="none"
)
```
### Permission-Based Tool Filtering

```python
from typing import Callable

class ToolRegistry:
    def __init__(self):
        self.tools = {}        # name -> callable
        self.specs = {}        # name -> JSON-schema function spec
        self.permissions = {}  # name -> minimum role required

    def register(self, name: str, func: Callable, spec: dict, required_role: str = "user"):
        self.tools[name] = func
        self.specs[name] = spec
        self.permissions[name] = required_role

    def get_tools_for_role(self, role: str) -> list:
        role_hierarchy = {"admin": 3, "operator": 2, "user": 1}
        user_level = role_hierarchy.get(role, 0)
        # Expose only the tools whose required level the caller meets
        return [
            {"type": "function", "function": spec}
            for name, spec in self.specs.items()
            if role_hierarchy.get(self.permissions[name], 0) <= user_level
        ]
```
### Token Optimization

```python
import json

def optimize_tool_result(result, max_chars: int = 2000) -> str:
    """Convert tool results in a token-efficient manner."""
    result_str = json.dumps(result, ensure_ascii=False)
    if len(result_str) <= max_chars:
        return result_str
    # Summarize large list results instead of blindly truncating
    if isinstance(result, list) and len(result) > 10:
        return json.dumps({
            "total_count": len(result),
            "showing": "first 10",
            "items": result[:10],
            "note": f"Showing top 10 out of {len(result)} total"
        }, ensure_ascii=False)
    return result_str[:max_chars] + "... (truncated)"
```
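Truncation and top-N summarization work well for lists. For records with many verbose fields, another option is to whitelist only the fields the model actually needs before serializing; the field names below are illustrative:

```python
import json

def project_fields(items: list, keep: list) -> list:
    """Keep only the whitelisted keys of each item before sending to the LLM."""
    return [{k: item[k] for k in keep if k in item} for item in items]

products = [
    {"name": "Folding Umbrella", "price": 15000, "rating": 4.5,
     "sku": "UMB-001", "warehouse_location": "A-12-3", "supplier_id": 992},
    {"name": "Automatic Umbrella", "price": 25000, "rating": 4.8,
     "sku": "UMB-002", "warehouse_location": "B-07-1", "supplier_id": 431},
]

# Internal fields like sku and warehouse_location add tokens without
# helping the model answer the user's question.
slim = project_fields(products, keep=["name", "price", "rating"])
print(len(json.dumps(slim)) < len(json.dumps(products)))  # True
```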
## Benchmark: Function Calling Performance Comparison
| Model | Single Call Accuracy | Parallel Call Accuracy | Argument Parsing Accuracy |
|---|---|---|---|
| GPT-4o | 97.2% | 94.5% | 98.1% |
| Claude 3.5 Sonnet | 96.8% | 93.2% | 97.5% |
| Llama 3.1 70B | 91.5% | 85.3% | 93.2% |
| Llama 3.1 8B | 84.2% | 72.1% | 88.7% |
## Review Quiz (6 Questions)

**Q1. What is the role of the LLM in Function Calling?**
The LLM decides which function to call with which arguments; the actual execution is handled by the application.

**Q2. What is the biggest difference between the OpenAI and Anthropic Function Calling APIs?**
OpenAI uses `tool_calls` on the assistant message and `tool`-role messages for results, while Anthropic uses `tool_use`/`tool_result` content blocks inside regular messages.

**Q3. When are Parallel Tool Calls useful?**
When a request involves multiple independent tasks (e.g., a weather lookup plus a product search); executing them concurrently reduces latency.

**Q4. What are the three values for the `tool_choice` parameter and their meanings?**
`auto` (the model decides whether to call a tool), `none` (function calling disabled), and a specific function object (forces that function to be invoked).

**Q5. What configuration is needed for Function Calling with open-source models?**
Enable the `--enable-auto-tool-choice` and `--tool-call-parser` options in vLLM, or pass the `tools` parameter in Ollama.

**Q6. What is the token optimization strategy when tool results are too long?**
Truncate the results, or for lists, return only the top N items and include the total count as metadata.