Complete Guide to LLM Function Calling: From Tool Use Patterns to Production Design
- What is Function Calling?
- OpenAI Function Calling
- Anthropic Tool Use
- Error Handling Patterns
- Structured Output and Function Calling
- Function Calling with Open-Source Models
- Production Design Patterns
- Benchmark: Function Calling Performance Comparison
- Quiz
What is Function Calling?
Function Calling (Tool Use) is a mechanism that enables LLMs to invoke external functions and APIs. The LLM does not execute code itself — instead, it decides which function to call with which arguments, and the application handles the actual execution.
Execution Flow
User: "Tell me the weather in Seoul"
|
LLM: tool_call(get_weather, city="Seoul")
|
App: execute get_weather("Seoul") -> {"temp": 5, "condition": "clear"}
|
LLM: "The current weather in Seoul is 5 degrees C and clear."
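The "App: execute" step in this flow is ordinary dispatch code: look up the function the LLM named, decode its JSON arguments, and call it. A minimal sketch, where `get_weather` is a hypothetical stub standing in for a real weather API:

```python
import json

def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stub: a real implementation would call a weather API here
    return {"temp": 5, "condition": "clear", "unit": unit}

# Registry mapping tool names (as the LLM sees them) to Python callables
TOOLS = {"get_weather": get_weather}

def dispatch(name: str, arguments: str) -> dict:
    """Execute the function the LLM selected, given its JSON-encoded arguments."""
    args = json.loads(arguments)
    return TOOLS[name](**args)

print(dispatch("get_weather", '{"city": "Seoul"}'))
# -> {'temp': 5, 'condition': 'clear', 'unit': 'celsius'}
```

The result of `dispatch` is what gets serialized and sent back to the model in the next turn.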
OpenAI Function Calling
Basic Usage
from openai import OpenAI
import json

client = OpenAI()

# Tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Retrieves current weather information for a specific city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., Seoul, Tokyo, New York)"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Searches for products",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_price": {"type": "number", "description": "Maximum price (KRW)"},
                    "sort_by": {"type": "string", "enum": ["price", "rating", "newest"]}
                },
                "required": ["query"]
            }
        }
    }
]
# First call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"}],
    tools=tools,
    tool_choice="auto"
)
message = response.choices[0].message
print(f"Tool calls: {len(message.tool_calls)}")

# Execute tools and pass results
messages = [
    {"role": "user", "content": "Tell me Seoul's weather and search for umbrellas"},
    message  # assistant's tool_call message
]

# Add results for each tool_call
for tool_call in message.tool_calls:
    func_name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)
    if func_name == "get_weather":
        result = {"temp": 5, "condition": "cloudy", "humidity": 65}
    elif func_name == "search_products":
        result = [
            {"name": "Folding Umbrella", "price": 15000, "rating": 4.5},
            {"name": "Automatic Umbrella", "price": 25000, "rating": 4.8}
        ]
    else:
        result = {"error": "Unknown function"}
    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result, ensure_ascii=False)
    })

# Generate final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)
Parallel Tool Calls
GPT-4o can call multiple functions simultaneously:
# "Tell me Seoul's weather and search for umbrellas" -> 2 tool_calls returned at once
# tool_calls = [
#     {"function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'}},
#     {"function": {"name": "search_products", "arguments": '{"query": "umbrella"}'}}
# ]
import asyncio

async def execute_tool_calls(tool_calls: list) -> list:
    """Execute tool calls in parallel."""
    async def execute_one(tc):
        func_name = tc.function.name
        args = json.loads(tc.function.arguments)
        # In practice, these would be async API calls
        if func_name == "get_weather":
            return await get_weather_async(**args)
        elif func_name == "search_products":
            return await search_products_async(**args)
    results = await asyncio.gather(*[execute_one(tc) for tc in tool_calls])
    return results
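The sketch above leaves `get_weather_async` and `search_products_async` undefined. A self-contained, runnable variant with stubbed async tools (the stub bodies and latencies are illustrative) makes the `asyncio.gather` pattern concrete:

```python
import asyncio

# Stub async tools standing in for real API calls (illustrative only)
async def get_weather_async(city: str) -> dict:
    await asyncio.sleep(0.01)  # simulated network latency
    return {"city": city, "temp": 5}

async def search_products_async(query: str) -> list:
    await asyncio.sleep(0.01)
    return [{"name": f"{query} A"}, {"name": f"{query} B"}]

TOOLS = {"get_weather": get_weather_async, "search_products": search_products_async}

async def execute_parallel(calls: list[tuple[str, dict]]) -> list:
    """Run independent tool calls concurrently; results keep the input order."""
    return await asyncio.gather(*[TOOLS[name](**args) for name, args in calls])

results = asyncio.run(execute_parallel([
    ("get_weather", {"city": "Seoul"}),
    ("search_products", {"query": "umbrella"}),
]))
print(results)
```

Because both calls sleep concurrently, total latency is roughly the slower call rather than the sum of both.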
Anthropic Tool Use
Anthropic (Claude) Tool Use uses a slightly different API format:
import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Retrieves current weather information for a specific city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "How's the weather in Seoul?"}]
)

# When stop_reason is "tool_use"
if response.stop_reason == "tool_use":
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )
    # Execute the tool
    result = get_weather(city=tool_use_block.input["city"])
    # Pass the result
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "How's the weather in Seoul?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": json.dumps(result, ensure_ascii=False)
                }]
            }
        ]
    )
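The key structural point above is easy to miss inside the nested API calls: the tool result goes back as a `user` turn containing a `tool_result` content block, keyed by the `tool_use` block's id. Here is that message shape as plain data (the `tool_use_id` value is hypothetical):

```python
import json

# Hypothetical id; in a real run it comes from the tool_use content block
tool_use_id = "toolu_01A"
result = {"temp": 5, "condition": "clear"}

# Tool results are sent back as a *user* turn containing a tool_result block
tool_result_message = {
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(result, ensure_ascii=False),
    }],
}
print(tool_result_message["content"][0]["type"])  # -> tool_result
```

Contrast this with OpenAI, where the result is a dedicated `"role": "tool"` message keyed by `tool_call_id`.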
Error Handling Patterns
Robust Function Execution Loop
import asyncio
import json
from typing import Callable

class ToolExecutor:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.max_retries = 3

    def register(self, name: str, func: Callable):
        self.tools[name] = func

    async def execute(self, tool_call) -> dict:
        func_name = tool_call.function.name
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            return {"error": f"Invalid JSON arguments: {tool_call.function.arguments}"}
        if func_name not in self.tools:
            return {"error": f"Unknown function: {func_name}"}
        for attempt in range(self.max_retries):
            try:
                result = await self.tools[func_name](**args)
                return {"success": True, "data": result}
            except TypeError as e:
                # Bad arguments will not fix themselves -- fail fast, no retry
                return {"error": f"Invalid arguments: {str(e)}"}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    return {"error": f"Failed after {self.max_retries} retries: {str(e)}"}
                await asyncio.sleep(2 ** attempt)  # exponential backoff

    async def run_conversation(self, client, messages, tools_spec):
        """Run the conversation loop."""
        while True:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=tools_spec
            )
            choice = response.choices[0]
            if choice.finish_reason == "stop":
                return choice.message.content
            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)
                for tc in choice.message.tool_calls:
                    result = await self.execute(tc)
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps(result, ensure_ascii=False)
                    })
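The retry-with-backoff logic inside `execute` can also be factored into a reusable helper. A minimal sketch (the helper name and delay constants are illustrative, not from the original):

```python
import asyncio

async def with_retries(func, *args, max_retries: int = 3, base_delay: float = 0.01, **kwargs):
    """Retry an async callable with exponential backoff; re-raise on final failure."""
    for attempt in range(max_retries):
        try:
            return await func(*args, **kwargs)
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

# Usage: a flaky stub that fails twice, then succeeds on the third attempt
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"

print(asyncio.run(with_retries(flaky)))  # -> ok
```

Factoring the backoff out keeps `execute` focused on argument parsing and error shaping, and lets the same policy apply to any tool.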
Structured Output and Function Calling
from pydantic import BaseModel, Field

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Convert Pydantic model to JSON Schema
def model_to_tool(model_class, name: str, description: str) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model_class.model_json_schema()
        }
    }

weather_tool = model_to_tool(
    WeatherResponse,
    "format_weather",
    "Returns weather information in a structured format"
)
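One payoff of defining the schema as a Pydantic model is that the arguments the model returns can be validated directly, instead of being trusted as-is. A sketch restating the model so the snippet is self-contained, with a hand-written string standing in for `tool_call.function.arguments`:

```python
from pydantic import BaseModel, Field, ValidationError

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Stand-in for tool_call.function.arguments returned by the model
arguments = '{"city": "Seoul", "temperature": 5, "condition": "clear", "recommendation": "Wear a coat"}'

try:
    parsed = WeatherResponse.model_validate_json(arguments)
    print(parsed.city, parsed.temperature)  # -> Seoul 5.0
except ValidationError as e:
    # On failure, e.errors() can be sent back to the model as a tool error
    # so it can retry with corrected arguments
    print(e.errors())
```

Validation failures make a natural tool-error payload, closing the loop with the retry patterns from the error-handling section.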
Function Calling with Open-Source Models
Ollama + Tool Use
import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Tell me the weather in Seoul'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Retrieve weather information',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {'type': 'string', 'description': 'City name'}
                },
                'required': ['city']
            }
        }
    }]
)

if response['message'].get('tool_calls'):
    for tool_call in response['message']['tool_calls']:
        print(f"Function: {tool_call['function']['name']}")
        print(f"Args: {tool_call['function']['arguments']}")
Serving Tool Use with vLLM
# Start vLLM server:
# vllm serve meta-llama/Llama-3.1-8B-Instruct \
#     --enable-auto-tool-choice \
#     --tool-call-parser hermes
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me the weather in Seoul"}],
    tools=tools,
    tool_choice="auto"
)
Production Design Patterns
Tool Selection Control
# Force a specific function call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

# Disable function calling
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="none"
)
Permission-Based Tool Filtering
from typing import Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.specs: dict[str, dict] = {}
        self.permissions: dict[str, str] = {}

    def register(self, name: str, func: Callable, spec: dict, required_role: str = "user"):
        self.tools[name] = func
        self.specs[name] = spec  # keep the JSON Schema spec alongside the callable
        self.permissions[name] = required_role

    def get_tools_for_role(self, role: str) -> list:
        role_hierarchy = {"admin": 3, "operator": 2, "user": 1}
        user_level = role_hierarchy.get(role, 0)
        return [
            {"type": "function", "function": spec}
            for name, spec in self.specs.items()
            if role_hierarchy.get(self.permissions[name], 0) <= user_level
        ]
Token Optimization
def optimize_tool_result(result: dict | list, max_chars: int = 2000) -> str:
    """Convert a tool result into a token-efficient string."""
    result_str = json.dumps(result, ensure_ascii=False)
    if len(result_str) <= max_chars:
        return result_str
    # Summarize large list results instead of sending everything
    if isinstance(result, list) and len(result) > 10:
        return json.dumps({
            "total_count": len(result),
            "showing": "first 10",
            "items": result[:10],
            "note": f"Showing top 10 out of {len(result)} total"
        }, ensure_ascii=False)
    return result_str[:max_chars] + "... (truncated)"
Benchmark: Function Calling Performance Comparison
| Model | Single Call Accuracy | Parallel Call Accuracy | Argument Parsing Accuracy |
|---|---|---|---|
| GPT-4o | 97.2% | 94.5% | 98.1% |
| Claude 3.5 Sonnet | 96.8% | 93.2% | 97.5% |
| Llama 3.1 70B | 91.5% | 85.3% | 93.2% |
| Llama 3.1 8B | 84.2% | 72.1% | 88.7% |
Review Quiz (6 Questions)
Q1. What is the role of the LLM in Function Calling?
The LLM decides which function to call with which arguments. The actual execution is handled by the application.
Q2. What is the biggest difference between OpenAI and Anthropic Function Calling APIs?
OpenAI uses tool_calls/tool messages, while Anthropic uses tool_use/tool_result types within content blocks.
Q3. When are Parallel Tool Calls useful?
When requesting multiple independent tasks simultaneously (e.g., weather lookup + product search), reducing latency.
Q4. What are the three values for the tool_choice parameter and their meanings?
auto (LLM decides), none (disable function calling), specific function (forced invocation)
Q5. What configuration is needed for Function Calling with open-source models?
Enable --enable-auto-tool-choice and --tool-call-parser options in vLLM, or use the tools parameter in Ollama.
Q6. What is the token optimization strategy when tool results are too long?
Truncate the results, or in the case of lists, return only the top N items and include the total count as metadata.