Complete Guide to LLM Function Calling: From Tool Use Patterns to Production Design


What is Function Calling?

Function Calling (Tool Use) is a mechanism that enables LLMs to invoke external functions and APIs. The LLM does not execute code itself — instead, it decides which function to call with which arguments, and the application handles the actual execution.

Execution Flow

User: "Tell me the weather in Seoul"
    |
LLM: tool_call(get_weather, city="Seoul")
    |
App: execute get_weather("Seoul") -> {"temp": 5, "condition": "clear"}
    |
LLM: "The current weather in Seoul is 5 degrees C and clear."

OpenAI Function Calling

Basic Usage

from openai import OpenAI
import json

client = OpenAI()

# Tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieves current weather information for a specific city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., Seoul, Tokyo, New York)"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Searches for products",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_price": {"type": "number", "description": "Maximum price"},
                    "sort_by": {"type": "string", "enum": ["price", "rating", "newest"]}
                },
                "required": ["query"]
            }
        }
    }
]

# First call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"}],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
print(f"Tool calls: {len(message.tool_calls or [])}")  # tool_calls is None if the model answered directly

# Execute tools and pass results
messages = [
    {"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"},
    message  # assistant's tool_call message
]

# Add results for each tool_call
for tool_call in message.tool_calls:
    func_name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    if func_name == "get_weather":
        result = {"temp": 5, "condition": "cloudy", "humidity": 65}
    elif func_name == "search_products":
        result = [
            {"name": "Folding Umbrella", "price": 15000, "rating": 4.5},
            {"name": "Automatic Umbrella", "price": 25000, "rating": 4.8}
        ]
    else:
        result = {"error": "Unknown function"}

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result, ensure_ascii=False)
    })

# Generate final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

print(final_response.choices[0].message.content)

Parallel Tool Calls

When a request contains several independent tasks, GPT-4o can return multiple tool_calls in a single response:

# "Tell me Seoul's weather and search for umbrellas" -> 2 tool_calls returned at once
# tool_calls = [
#   {"function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'}},
#   {"function": {"name": "search_products", "arguments": '{"query": "umbrella"}'}}
# ]

import asyncio

async def execute_tool_calls(tool_calls: list) -> list:
    """Execute tool calls in parallel."""
    async def execute_one(tc):
        func_name = tc.function.name
        args = json.loads(tc.function.arguments)

        # In practice, these would be async API calls
        if func_name == "get_weather":
            return await get_weather_async(**args)
        elif func_name == "search_products":
            return await search_products_async(**args)
        return {"error": f"Unknown function: {func_name}"}

    results = await asyncio.gather(*[execute_one(tc) for tc in tool_calls])
    return results
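get_weather_async and search_products_async above are assumed to exist. A self-contained version, with hypothetical mock implementations and fake tool-call objects that mimic the SDK's shape, can be exercised like this:

```python
import asyncio
import json
from types import SimpleNamespace

# Hypothetical mock implementations standing in for real async API calls
async def get_weather_async(city: str, unit: str = "celsius") -> dict:
    return {"city": city, "temp": 5, "condition": "clear"}

async def search_products_async(query: str, **kwargs) -> list:
    return [{"name": f"{query} item", "price": 10000}]

TOOL_HANDLERS = {
    "get_weather": get_weather_async,
    "search_products": search_products_async,
}

async def execute_tool_calls(tool_calls: list) -> list:
    async def execute_one(tc):
        handler = TOOL_HANDLERS.get(tc.function.name)
        if handler is None:
            return {"error": f"Unknown function: {tc.function.name}"}
        return await handler(**json.loads(tc.function.arguments))

    # gather preserves input order, so results[i] matches tool_calls[i]
    return await asyncio.gather(*[execute_one(tc) for tc in tool_calls])

# Fake tool_call objects mimicking the OpenAI SDK's attribute structure
calls = [
    SimpleNamespace(function=SimpleNamespace(
        name="get_weather", arguments='{"city": "Seoul"}')),
    SimpleNamespace(function=SimpleNamespace(
        name="search_products", arguments='{"query": "umbrella"}')),
]
results = asyncio.run(execute_tool_calls(calls))
print(results)
```

A dispatch dict keyed by function name scales better than an if/elif chain as the tool count grows.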

Anthropic Tool Use

Anthropic's Claude uses a slightly different API format for tool use:

import json

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Retrieves current weather information for a specific city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "How's the weather in Seoul?"}]
)

# When stop_reason is "tool_use"
if response.stop_reason == "tool_use":
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )

    # Execute the tool
    result = get_weather(city=tool_use_block.input["city"])

    # Pass the result
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "How's the weather in Seoul?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": json.dumps(result, ensure_ascii=False)
                }]
            }
        ]
    )

Error Handling Patterns

Robust Function Execution Loop

import asyncio
import json
from typing import Callable

class ToolExecutor:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.max_retries = 3

    def register(self, name: str, func: Callable):
        self.tools[name] = func

    async def execute(self, tool_call) -> dict:
        func_name = tool_call.function.name
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            return {"error": f"Invalid JSON arguments: {tool_call.function.arguments}"}

        if func_name not in self.tools:
            return {"error": f"Unknown function: {func_name}"}

        for attempt in range(self.max_retries):
            try:
                result = await self.tools[func_name](**args)
                return {"success": True, "data": result}
            except TypeError as e:
                return {"error": f"Invalid arguments: {str(e)}"}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    return {"error": f"Failed after {self.max_retries} retries: {str(e)}"}
                await asyncio.sleep(2 ** attempt)

    async def run_conversation(self, client, messages, tools_spec):
        """Run the conversation loop."""
        while True:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=tools_spec
            )

            choice = response.choices[0]

            if choice.finish_reason == "stop":
                return choice.message.content

            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)

                for tc in choice.message.tool_calls:
                    result = await self.execute(tc)
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps(result, ensure_ascii=False)
                    })
            else:
                # e.g. finish_reason == "length": return instead of looping forever
                return choice.message.content
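The retry-with-backoff logic in execute can be exercised in isolation. A minimal sketch, where flaky_weather is a made-up tool that fails once and then succeeds:

```python
import asyncio

async def retry_tool(func, args: dict, max_retries: int = 3) -> dict:
    # Retry transient failures with exponential backoff, as in ToolExecutor.execute
    for attempt in range(max_retries):
        try:
            return {"success": True, "data": await func(**args)}
        except TypeError as e:
            return {"error": f"Invalid arguments: {e}"}  # bad args won't fix themselves: don't retry
        except Exception as e:
            if attempt == max_retries - 1:
                return {"error": f"Failed after {max_retries} retries: {e}"}
            await asyncio.sleep(0.01 * 2 ** attempt)  # shortened delay for the demo

attempts = 0

async def flaky_weather(city: str) -> dict:
    # Hypothetical tool: raises a transient error on the first call
    global attempts
    attempts += 1
    if attempts < 2:
        raise ConnectionError("transient network error")
    return {"city": city, "temp": 5}

result = asyncio.run(retry_tool(flaky_weather, {"city": "Seoul"}))
print(result)
```

Note that TypeError (wrong arguments from the LLM) returns immediately, since retrying with identical arguments cannot succeed; only runtime failures get backoff.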

Structured Output and Function Calling

from pydantic import BaseModel, Field

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Convert Pydantic model to JSON Schema
def model_to_tool(model_class, name: str, description: str) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model_class.model_json_schema()
        }
    }

weather_tool = model_to_tool(
    WeatherResponse,
    "format_weather",
    "Returns weather information in a structured format"
)
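The payoff of defining the schema with Pydantic comes on the way back: the arguments string the model returns for format_weather can be validated into a typed object in one step. A sketch, where raw_arguments stands in for an assumed model response:

```python
from pydantic import BaseModel, Field

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# An assumed example of the arguments string a model might return
raw_arguments = (
    '{"city": "Seoul", "temperature": 5.0, '
    '"condition": "clear", "recommendation": "Wear a warm coat"}'
)

# Pydantic checks types and required fields in a single call
weather = WeatherResponse.model_validate_json(raw_arguments)
print(weather.city, weather.temperature)
```

If the model omits a required field or returns the wrong type, model_validate_json raises a ValidationError, which can be fed back to the model as an error tool result.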

Function Calling with Open-Source Models

Ollama + Tool Use

import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Tell me the weather in Seoul'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Retrieve weather information',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {'type': 'string', 'description': 'City name'}
                },
                'required': ['city']
            }
        }
    }]
)

if response['message'].get('tool_calls'):
    for tool_call in response['message']['tool_calls']:
        print(f"Function: {tool_call['function']['name']}")
        print(f"Args: {tool_call['function']['arguments']}")

Serving Tool Use with vLLM

# Start vLLM server
# vllm serve meta-llama/Llama-3.1-8B-Instruct \
#   --enable-auto-tool-choice \
#   --tool-call-parser hermes

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me the weather in Seoul"}],
    tools=tools,
    tool_choice="auto"
)

Production Design Patterns

Tool Selection Control

# Force a specific function call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

# Disable function calling
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="none"
)

Permission-Based Tool Filtering

from typing import Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Callable] = {}       # name -> callable
        self.specs: dict[str, dict] = {}           # name -> JSON Schema spec
        self.permissions: dict[str, str] = {}      # name -> required role

    def register(self, name: str, func: Callable, spec: dict, required_role: str = "user"):
        self.tools[name] = func
        self.specs[name] = spec
        self.permissions[name] = required_role

    def get_tools_for_role(self, role: str) -> list:
        role_hierarchy = {"admin": 3, "operator": 2, "user": 1}
        user_level = role_hierarchy.get(role, 0)

        return [
            {"type": "function", "function": spec}
            for name, spec in self.specs.items()
            if role_hierarchy.get(self.permissions[name], 0) <= user_level
        ]
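The role filter in get_tools_for_role can be checked standalone; a minimal sketch with hypothetical tool names and abbreviated specs:

```python
ROLE_LEVELS = {"admin": 3, "operator": 2, "user": 1}

def tools_for_role(specs: dict, permissions: dict, role: str) -> list:
    # Include a tool only if the caller's level meets its required level
    level = ROLE_LEVELS.get(role, 0)
    return [
        {"type": "function", "function": spec}
        for name, spec in specs.items()
        if ROLE_LEVELS.get(permissions[name], 0) <= level
    ]

# Hypothetical tools: everyone may read weather, only admins may delete users
specs = {
    "get_weather": {"name": "get_weather"},
    "delete_user": {"name": "delete_user"},
}
permissions = {"get_weather": "user", "delete_user": "admin"}

user_tools = [t["function"]["name"] for t in tools_for_role(specs, permissions, "user")]
admin_tools = [t["function"]["name"] for t in tools_for_role(specs, permissions, "admin")]
print(user_tools, admin_tools)
```

Filtering the tool list before the API call is safer than checking permissions after the model has already chosen a tool: the model never even sees tools the user cannot invoke.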

Token Optimization

def optimize_tool_result(result: dict | list, max_chars: int = 2000) -> str:
    """Serialize tool results in a token-efficient way."""
    result_str = json.dumps(result, ensure_ascii=False)

    if len(result_str) <= max_chars:
        return result_str

    # Summarize large results
    if isinstance(result, list) and len(result) > 10:
        return json.dumps({
            "total_count": len(result),
            "showing": "first 10",
            "items": result[:10],
            "note": f"Showing top 10 out of {len(result)} total"
        }, ensure_ascii=False)

    return result_str[:max_chars] + "... (truncated)"
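To see the summarization branch fire, run the function on an oversized list. The function is restated compactly so the snippet is self-contained, and the data is fabricated for the demo:

```python
import json

def optimize_tool_result(result, max_chars: int = 2000) -> str:
    result_str = json.dumps(result, ensure_ascii=False)
    if len(result_str) <= max_chars:
        return result_str
    # Large lists: keep the first 10 items plus metadata instead of truncating mid-JSON
    if isinstance(result, list) and len(result) > 10:
        return json.dumps({
            "total_count": len(result),
            "items": result[:10],
            "note": f"Showing top 10 out of {len(result)} total"
        }, ensure_ascii=False)
    return result_str[:max_chars] + "... (truncated)"

# 500 fake search hits: far too long to feed back to the model verbatim
big_result = [{"name": f"Umbrella {i}", "price": 10000 + i} for i in range(500)]
summary = json.loads(optimize_tool_result(big_result))
print(summary["total_count"], len(summary["items"]))
```

The list branch matters because naive character truncation would usually cut the JSON mid-object, leaving the model with an unparseable fragment; summarizing keeps the result valid and tells the model how much was omitted.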

Benchmark: Function Calling Performance Comparison

| Model             | Single Call Accuracy | Parallel Call Accuracy | Argument Parsing Accuracy |
|-------------------|----------------------|------------------------|---------------------------|
| GPT-4o            | 97.2%                | 94.5%                  | 98.1%                     |
| Claude 3.5 Sonnet | 96.8%                | 93.2%                  | 97.5%                     |
| Llama 3.1 70B     | 91.5%                | 85.3%                  | 93.2%                     |
| Llama 3.1 8B      | 84.2%                | 72.1%                  | 88.7%                     |

Review Quiz (6 Questions)

Q1. What is the role of the LLM in Function Calling?

The LLM decides which function to call with which arguments. The actual execution is handled by the application.

Q2. What is the biggest difference between OpenAI and Anthropic Function Calling APIs?

OpenAI uses tool_calls/tool messages, while Anthropic uses tool_use/tool_result types within content blocks.

Q3. When are Parallel Tool Calls useful?

When requesting multiple independent tasks simultaneously (e.g., weather lookup + product search), reducing latency.

Q4. What are the three values for the tool_choice parameter and their meanings?

auto (LLM decides), none (disable function calling), specific function (forced invocation)

Q5. What configuration is needed for Function Calling with open-source models?

Enable --enable-auto-tool-choice and --tool-call-parser options in vLLM, or use the tools parameter in Ollama.

Q6. What is the token optimization strategy when tool results are too long?

Truncate the results, or in the case of lists, return only the top N items and include the total count as metadata.