Complete Guide to LLM Function Calling: From Tool Use Patterns to Production Design
- What is Function Calling?
- OpenAI Function Calling
- Anthropic Tool Use
- Error Handling Patterns
- Structured Output and Function Calling
- Function Calling with Open-Source Models
- Production Design Patterns
- Benchmark: Function Calling Performance Comparison
- Quiz
What is Function Calling?
Function Calling (Tool Use) is a mechanism that enables LLMs to invoke external functions and APIs. The LLM does not execute code itself — instead, it decides which function to call with which arguments, and the application handles the actual execution.
Execution Flow
User: "Tell me the weather in Seoul"
|
LLM: tool_call(get_weather, city="Seoul")
|
App: execute get_weather("Seoul") -> {"temp": 5, "condition": "clear"}
|
LLM: "The current weather in Seoul is 5 degrees C and clear."
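The "App: execute" step in this flow is ordinary dispatch code: look up the function the LLM named, decode its JSON arguments, and call it. A minimal sketch, where `get_weather` is a hypothetical stub standing in for a real weather API:

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub: a real implementation would call a weather API here
    return {"temp": 5, "condition": "clear", "unit": unit}

# Registry mapping tool names (as the LLM sees them) to Python callables
TOOLS = {"get_weather": get_weather}

def dispatch(name: str, arguments: str) -> dict:
    """Execute the function the LLM selected, given its JSON-encoded arguments."""
    args = json.loads(arguments)
    return TOOLS[name](**args)

print(dispatch("get_weather", '{"city": "Seoul"}'))
# -> {'temp': 5, 'condition': 'clear', 'unit': 'celsius'}
```

The result of `dispatch` is what gets serialized and sent back to the model in the next turn.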
OpenAI Function Calling
Basic Usage
from openai import OpenAI
import json

client = OpenAI()

# Tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieves current weather information for a specific city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., Seoul, Tokyo, New York)"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Searches for products",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_price": {"type": "number", "description": "Maximum price (KRW)"},
                    "sort_by": {"type": "string", "enum": ["price", "rating", "newest"]}
                },
                "required": ["query"]
            }
        }
    }
]
# First call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"}],
    tools=tools,
    tool_choice="auto"
)
message = response.choices[0].message
print(f"Tool calls: {len(message.tool_calls)}")

# Execute tools and pass results
messages = [
    {"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"},
    message  # assistant's tool_call message
]

# Add results for each tool_call
for tool_call in message.tool_calls:
    func_name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    if func_name == "get_weather":
        result = {"temp": 5, "condition": "cloudy", "humidity": 65}
    elif func_name == "search_products":
        result = [
            {"name": "Folding Umbrella", "price": 15000, "rating": 4.5},
            {"name": "Automatic Umbrella", "price": 25000, "rating": 4.8}
        ]
    else:
        result = {"error": "Unknown function"}
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result, ensure_ascii=False)
    })

# Generate final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)
Parallel Tool Calls
GPT-4o can call multiple functions simultaneously:
# "Tell me Seoul's weather and search for umbrellas" -> 2 tool_calls returned at once
# tool_calls = [
#     {"function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'}},
#     {"function": {"name": "search_products", "arguments": '{"query": "umbrella"}'}}
# ]
import asyncio

async def execute_tool_calls(tool_calls: list) -> list:
    """Execute tool calls in parallel."""
    async def execute_one(tc):
        func_name = tc.function.name
        args = json.loads(tc.function.arguments)
        # In practice, these would be async API calls
        if func_name == "get_weather":
            return await get_weather_async(**args)
        elif func_name == "search_products":
            return await search_products_async(**args)
    results = await asyncio.gather(*[execute_one(tc) for tc in tool_calls])
    return results
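The sketch above leaves `get_weather_async` and `search_products_async` undefined. A self-contained, runnable variant with stubbed async tools (the stub bodies and latencies are illustrative) makes the `asyncio.gather` pattern concrete:

```python
import asyncio

# Stub async tools standing in for real API calls (illustrative only)
async def get_weather_async(city: str) -> dict:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"city": city, "temp": 5}

async def search_products_async(query: str) -> list:
    await asyncio.sleep(0.01)
    return [{"name": f"{query} A"}, {"name": f"{query} B"}]

TOOLS = {"get_weather": get_weather_async, "search_products": search_products_async}

async def execute_parallel(calls: list[tuple[str, dict]]) -> list:
    """Run independent tool calls concurrently; results keep the input order."""
    return await asyncio.gather(*[TOOLS[name](**args) for name, args in calls])

results = asyncio.run(execute_parallel([
    ("get_weather", {"city": "Seoul"}),
    ("search_products", {"query": "umbrella"}),
]))
print(results)
```

Because both calls sleep concurrently, total latency is roughly the slower call rather than the sum of both.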
Anthropic Tool Use
Anthropic (Claude) Tool Use uses a slightly different API format:
import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Retrieves current weather information for a specific city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "How's the weather in Seoul?"}]
)

# When stop_reason is "tool_use"
if response.stop_reason == "tool_use":
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )
    # Execute the tool
    result = get_weather(city=tool_use_block.input["city"])
    # Pass the result
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "How's the weather in Seoul?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": json.dumps(result, ensure_ascii=False)
                }]
            }
        ]
    )
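The key structural point above is easy to miss inside the nested API calls: the tool result goes back as a `user` turn containing a `tool_result` content block, keyed by the `tool_use` block's id. Here is that message shape as plain data (the `tool_use_id` value is hypothetical):

```python
import json

# Hypothetical id; in a real run it comes from the tool_use content block
tool_use_id = "toolu_01A"
result = {"temp": 5, "condition": "clear"}

# Tool results are sent back as a *user* turn containing a tool_result block
tool_result_message = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result, ensure_ascii=False),
    }],
}
print(tool_result_message["content"][0]["type"])  # -> tool_result
```

Contrast this with OpenAI, where the result is a dedicated `"role": "tool"` message keyed by `tool_call_id`.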
Error Handling Patterns
Robust Function Execution Loop
import asyncio
import json
from typing import Callable

class ToolExecutor:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.max_retries = 3

    def register(self, name: str, func: Callable):
        self.tools[name] = func

    async def execute(self, tool_call) -> dict:
        func_name = tool_call.function.name
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            return {"error": f"Invalid JSON arguments: {tool_call.function.arguments}"}
        if func_name not in self.tools:
            return {"error": f"Unknown function: {func_name}"}
        for attempt in range(self.max_retries):
            try:
                result = await self.tools[func_name](**args)
                return {"success": True, "data": result}
            except TypeError as e:
                # Bad arguments will not fix themselves -- fail fast, no retry
                return {"error": f"Invalid arguments: {str(e)}"}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    return {"error": f"Failed after {self.max_retries} retries: {str(e)}"}
                await asyncio.sleep(2 ** attempt)  # exponential backoff

    async def run_conversation(self, client, messages, tools_spec):
        """Run the conversation loop."""
        while True:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=tools_spec
            )
            choice = response.choices[0]
            if choice.finish_reason == "stop":
                return choice.message.content
            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)
                for tc in choice.message.tool_calls:
                    result = await self.execute(tc)
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps(result, ensure_ascii=False)
                    })
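The retry-with-backoff logic inside `execute` can also be factored into a reusable helper. A minimal sketch (the helper name and delay constants are illustrative, not from the original):

```python
import asyncio

async def with_retries(func, *args, max_retries: int = 3, base_delay: float = 0.01, **kwargs):
    """Retry an async callable with exponential backoff; re-raise on final failure."""
    for attempt in range(max_retries):
        try:
            return await func(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

# Usage: a flaky stub that fails twice, then succeeds on the third attempt
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(asyncio.run(with_retries(flaky)))  # -> ok
```

Factoring the backoff out keeps `execute` focused on argument parsing and error shaping, and lets the same policy apply to any tool.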
Structured Output and Function Calling
from pydantic import BaseModel, Field

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Convert Pydantic model to JSON Schema
def model_to_tool(model_class, name: str, description: str) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model_class.model_json_schema()
        }
    }

weather_tool = model_to_tool(
    WeatherResponse,
    "format_weather",
    "Returns weather information in a structured format"
)
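One payoff of defining the schema as a Pydantic model is that the arguments the model returns can be validated directly, instead of being trusted as-is. A sketch restating the model so the snippet is self-contained, with a hand-written string standing in for `tool_call.function.arguments`:

```python
from pydantic import BaseModel, Field, ValidationError

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Stand-in for tool_call.function.arguments returned by the model
arguments = '{"city": "Seoul", "temperature": 5, "condition": "clear", "recommendation": "Wear a coat"}'

try:
    parsed = WeatherResponse.model_validate_json(arguments)
    print(parsed.city, parsed.temperature)  # -> Seoul 5.0
except ValidationError as e:
    # On failure, e.errors() can be sent back to the model as a tool error
    # so it can retry with corrected arguments
    print(e.errors())
```

Validation failures make a natural tool-error payload, closing the loop with the retry patterns from the error-handling section.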
Function Calling with Open-Source Models
Ollama + Tool Use
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Tell me the weather in Seoul'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Retrieve weather information',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {'type': 'string', 'description': 'City name'}
                },
                'required': ['city']
            }
        }
    }]
)

if response['message'].get('tool_calls'):
    for tool_call in response['message']['tool_calls']:
        print(f"Function: {tool_call['function']['name']}")
        print(f"Args: {tool_call['function']['arguments']}")
Serving Tool Use with vLLM
# Start vLLM server:
# vllm serve meta-llama/Llama-3.1-8B-Instruct \
#     --enable-auto-tool-choice \
#     --tool-call-parser hermes
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me the weather in Seoul"}],
    tools=tools,
    tool_choice="auto"
)
Production Design Patterns
Tool Selection Control
# Force a specific function call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

# Disable function calling
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="none"
)
Permission-Based Tool Filtering
from typing import Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.specs: dict[str, dict] = {}
        self.permissions: dict[str, str] = {}

    def register(self, name: str, func: Callable, spec: dict, required_role: str = "user"):
        self.tools[name] = func
        self.specs[name] = spec  # keep the JSON Schema spec alongside the callable
        self.permissions[name] = required_role

    def get_tools_for_role(self, role: str) -> list:
        role_hierarchy = {"admin": 3, "operator": 2, "user": 1}
        user_level = role_hierarchy.get(role, 0)
        return [
            {"type": "function", "function": spec}
            for name, spec in self.specs.items()
            if role_hierarchy.get(self.permissions[name], 0) <= user_level
        ]
Token Optimization
def optimize_tool_result(result: dict | list, max_chars: int = 2000) -> str:
    """Convert a tool result into a token-efficient string."""
    result_str = json.dumps(result, ensure_ascii=False)
    if len(result_str) <= max_chars:
        return result_str
    # Summarize large list results instead of sending everything
    if isinstance(result, list) and len(result) > 10:
        return json.dumps({
            "total_count": len(result),
            "showing": "first 10",
            "items": result[:10],
            "note": f"Showing top 10 out of {len(result)} total"
        }, ensure_ascii=False)
    return result_str[:max_chars] + "... (truncated)"
Benchmark: Function Calling Performance Comparison
| Model | Single Call Accuracy | Parallel Call Accuracy | Argument Parsing Accuracy |
|---|---|---|---|
| GPT-4o | 97.2% | 94.5% | 98.1% |
| Claude 3.5 Sonnet | 96.8% | 93.2% | 97.5% |
| Llama 3.1 70B | 91.5% | 85.3% | 93.2% |
| Llama 3.1 8B | 84.2% | 72.1% | 88.7% |
Review Quiz (6 Questions)
Q1. What is the role of the LLM in Function Calling?
The LLM decides which function to call with which arguments. The actual execution is handled by the application.
Q2. What is the biggest difference between OpenAI and Anthropic Function Calling APIs?
OpenAI uses tool_calls/tool messages, while Anthropic uses tool_use/tool_result types within content blocks.
Q3. When are Parallel Tool Calls useful?
When requesting multiple independent tasks simultaneously (e.g., weather lookup + product search), reducing latency.
Q4. What are the three values for the tool_choice parameter and their meanings?
auto (LLM decides), none (disable function calling), specific function (forced invocation)
Q5. What configuration is needed for Function Calling with open-source models?
Enable --enable-auto-tool-choice and --tool-call-parser options in vLLM, or use the tools parameter in Ollama.
Q6. What is the token optimization strategy when tool results are too long?
Truncate the results, or in the case of lists, return only the top N items and include the total count as metadata.