The Complete Guide to LLM Function Calling: From Tool Use Patterns to Production Design

What Is Function Calling?
- How It Works
OpenAI Function Calling
- Basic Usage
- Parallel Tool Calls
Anthropic Tool Use
Error Handling Patterns
- A Robust Tool Execution Loop
Structured Output and Function Calling
Function Calling with Open-Source Models
- Ollama + Tool Use
- Serving Tool Use with vLLM
Production Design Patterns
Benchmark: Function Calling Performance Comparison

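What Is Function Calling?

Function Calling (Tool Use) is a mechanism that lets an LLM call external functions and APIs. The LLM does not execute code itself. It decides which function to call and with which arguments, and the application performs the actual execution.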

How It Works

User: "What's the weather in Seoul?"
    ↓
LLM: tool_call(get_weather, city="Seoul")
    ↓
App: runs get_weather("Seoul") → {"temp": 5, "condition": "clear"}
    ↓
LLM: "Seoul is currently 5°C and clear."
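The flow above can be sketched without any API key. This is a minimal illustration, not a real client: `fake_llm`, `TOOLS`, and `run` are hypothetical names introduced here, and the model is replaced by a stub that first requests a tool and then answers from the tool result.

```python
import json

def get_weather(city: str) -> dict:
    """Stub tool: a real implementation would call a weather API."""
    return {"temp": 5, "condition": "clear"}

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"get_weather": get_weather}

def fake_llm(messages: list) -> dict:
    """Stand-in for the model: asks for a tool first, then answers from its result."""
    last = messages[-1]
    if last["role"] == "user":
        return {
            "role": "assistant",
            "tool_call": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Seoul"})},
        }
    data = json.loads(last["content"])
    return {"role": "assistant",
            "content": f"Seoul is currently {data['temp']}°C and {data['condition']}."}

def run(user_text: str) -> str:
    messages = [{"role": "user", "content": user_text}]
    reply = fake_llm(messages)
    # The application, not the model, executes each requested tool.
    while "tool_call" in reply:
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append(reply)
        messages.append({"role": "tool", "content": json.dumps(result)})
        reply = fake_llm(messages)
    return reply["content"]

answer = run("What's the weather in Seoul?")
print(answer)
```

The point of the sketch is the division of labor: the model only emits a name and JSON arguments, and the loop in `run` owns execution and feeds the result back.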

OpenAI Function Calling

Basic Usage

from openai import OpenAI
import json

client = OpenAI()

# Tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the current weather for a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., Seoul, Tokyo, New York)"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "search_products",
            "description": "Searches for products",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["electronics", "clothing", "food"]},
                    "max_price": {"type": "number", "description": "Maximum price (KRW)"},
                    "sort_by": {"type": "string", "enum": ["price", "rating", "newest"]}
                },
                "required": ["query"]
            }
        }
    }
]

# First call: the model decides which tools to call
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me the weather in Seoul and search for umbrellas"}],
    tools=tools,
    tool_choice="auto"
)

message = response.choices[0].message
# tool_calls is None when the model answers directly without calling a tool
tool_calls = message.tool_calls or []
print(f"Tool calls: {len(tool_calls)}")

# Run the tools and pass the results back
messages = [
    {"role": "user", "content": "Tell me the weather in Seoul and search for umbrellas"},
    message  # the assistant message that carries the tool calls
]

# Append one result per tool_call
for tool_call in tool_calls:
    func_name = tool_call.function.name
    args = json.loads(tool_call.function.arguments)

    # Hard-coded sample results; a real application would call its own functions with args
    if func_name == "get_weather":
        result = {"temp": 5, "condition": "cloudy", "humidity": 65}
    elif func_name == "search_products":
        result = [
            {"name": "Folding umbrella", "price": 15000, "rating": 4.5},
            {"name": "Automatic umbrella", "price": 25000, "rating": 4.8}
        ]
    else:
        result = {"error": "Unknown function"}

    messages.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result, ensure_ascii=False)
    })

# Generate the final response
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

print(final_response.choices[0].message.content)

Parallel Tool Calls

GPT-4o can request several function calls in a single response:

# "Tell me the weather in Seoul and search for umbrellas" → 2 tool_calls are returned together
# tool_calls = [
#   {"function": {"name": "get_weather", "arguments": '{"city": "Seoul"}'}},
#   {"function": {"name": "search_products", "arguments": '{"query": "umbrella"}'}}
# ]

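import asyncio
import json

async def execute_tool_calls(tool_calls: list) -> list:
    """Execute tool calls concurrently."""
    async def execute_one(tc):
        func_name = tc.function.name
        args = json.loads(tc.function.arguments)

        # In practice these are async operations such as API calls;
        # get_weather_async and search_products_async are assumed to be defined elsewhere
        if func_name == "get_weather":
            return await get_weather_async(**args)
        elif func_name == "search_products":
            return await search_products_async(**args)
        return {"error": f"Unknown function: {func_name}"}

    # gather returns results in the same order as tool_calls
    results = await asyncio.gather(*[execute_one(tc) for tc in tool_calls])
    return results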
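To make the parallel pattern runnable end to end, here is a hedged sketch with stub tools. Everything in it is illustrative: the handlers sleep instead of calling real APIs, `fake_call` builds objects shaped like OpenAI tool calls, and `HANDLERS`, `run_one`, and `run_all` are names introduced here. It also shows two details worth having in practice: each result is paired with its `tool_call_id`, and one failing tool does not cancel the others.

```python
import asyncio
import json
from types import SimpleNamespace

async def get_weather_async(city: str, unit: str = "celsius") -> dict:
    await asyncio.sleep(0.01)  # simulates network latency
    return {"city": city, "temp": 5, "unit": unit}

async def search_products_async(query: str, **filters) -> list:
    await asyncio.sleep(0.01)
    return [{"name": f"{query} (folding)", "price": 15000}]

# Hypothetical dispatch table from tool name to coroutine function.
HANDLERS = {"get_weather": get_weather_async,
            "search_products": search_products_async}

async def run_one(tc) -> dict:
    """Run one tool call and return a ready-to-append tool message."""
    try:
        handler = HANDLERS[tc.function.name]
        result = await handler(**json.loads(tc.function.arguments))
    except Exception as exc:  # report the failure instead of cancelling sibling calls
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {"role": "tool", "tool_call_id": tc.id,
            "content": json.dumps(result, ensure_ascii=False)}

async def run_all(tool_calls) -> list:
    # gather preserves input order, so messages line up with tool_calls
    return await asyncio.gather(*(run_one(tc) for tc in tool_calls))

def fake_call(call_id: str, name: str, args: dict):
    """Build an object shaped like an OpenAI tool call, for demonstration only."""
    return SimpleNamespace(
        id=call_id,
        function=SimpleNamespace(name=name, arguments=json.dumps(args)),
    )

calls = [
    fake_call("call_1", "get_weather", {"city": "Seoul"}),
    fake_call("call_2", "search_products", {"query": "umbrella"}),
    fake_call("call_3", "unknown_tool", {}),
]
tool_messages = asyncio.run(run_all(calls))
for msg in tool_messages:
    print(msg["tool_call_id"], msg["content"])
```

Returning an error payload for the unknown tool, rather than raising, lets the model see what went wrong and recover on the next turn.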

Anthropic Tool Use

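Anthropic's (Claude) Tool Use relies on a slightly different API format: tools are declared with input_schema, and calls and results travel as tool_use and tool_result content blocks: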

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "get_weather",
        "description": "Gets the current weather for a given city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["city"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "How's the weather in Seoul?"}]
)

# stop_reason "tool_use" means the model is requesting a tool call
if response.stop_reason == "tool_use":
    tool_use_block = next(
        block for block in response.content
        if block.type == "tool_use"
    )

    # Run the tool (get_weather is assumed to be defined by the application)
    result = get_weather(city=tool_use_block.input["city"])

    # Send the result back
    final_response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "How's the weather in Seoul?"},
            {"role": "assistant", "content": response.content},
            {
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": json.dumps(result, ensure_ascii=False)
                }]
            }
        ]
    )
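The example above handles only the first tool_use block. A response can contain several, and each one needs a matching tool_result in the next user message. The following is a rough sketch of that step, using stand-in objects instead of a live API response; `HANDLERS` and `build_tool_results` are hypothetical names, and the blocks are assumed to expose `type`, `id`, `name`, and `input` as in the snippet above.

```python
import json
from types import SimpleNamespace

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub tool used only for this illustration."""
    return {"city": city, "temp": 5, "unit": unit}

# Hypothetical dispatch table from tool name to callable.
HANDLERS = {"get_weather": get_weather}

def build_tool_results(content_blocks) -> list:
    """Create one tool_result block per tool_use block, skipping text blocks."""
    results = []
    for block in content_blocks:
        if block.type != "tool_use":
            continue
        handler = HANDLERS.get(block.name)
        if handler is None:
            # is_error tells the model that this call failed
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": f"Unknown tool: {block.name}",
                            "is_error": True})
            continue
        output = handler(**block.input)
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": json.dumps(output, ensure_ascii=False)})
    return results

# Stand-ins for response.content from a real messages.create call.
blocks = [
    SimpleNamespace(type="text", text="Let me check both cities."),
    SimpleNamespace(type="tool_use", id="toolu_1", name="get_weather",
                    input={"city": "Seoul"}),
    SimpleNamespace(type="tool_use", id="toolu_2", name="get_weather",
                    input={"city": "Busan"}),
]
tool_results = build_tool_results(blocks)
print(tool_results)
```

The returned list would be sent as the `content` of a single `{"role": "user", ...}` message, in place of the one-element list shown earlier.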

Error Handling Patterns

A Robust Tool Execution Loop

import asyncio  # needed for asyncio.sleep in the retry backoff
import json
from typing import Callable

class ToolExecutor:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.max_retries = 3

    def register(self, name: str, func: Callable):
        self.tools[name] = func

    async def execute(self, tool_call) -> dict:
        func_name = tool_call.function.name
        try:
            args = json.loads(tool_call.function.arguments)
        except json.JSONDecodeError:
            return {"error": f"Invalid JSON arguments: {tool_call.function.arguments}"}

        if func_name not in self.tools:
            return {"error": f"Unknown function: {func_name}"}

        for attempt in range(self.max_retries):
            try:
                result = await self.tools[func_name](**args)
                return {"success": True, "data": result}
            except TypeError as e:
                return {"error": f"Invalid arguments: {str(e)}"}
            except Exception as e:
                if attempt == self.max_retries - 1:
                    return {"error": f"Failed after {self.max_retries} retries: {str(e)}"}
                await asyncio.sleep(2 ** attempt)

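    async def run_conversation(self, client, messages, tools_spec, max_turns: int = 10):
        """Run the conversation loop until the model stops calling tools."""
        # Bound the loop so a model that keeps requesting tools cannot spin forever
        for _ in range(max_turns):
            # Synchronous client call; with AsyncOpenAI this call would be awaited
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=tools_spec
            )

            choice = response.choices[0]

            if choice.finish_reason == "tool_calls":
                messages.append(choice.message)

                for tc in choice.message.tool_calls:
                    result = await self.execute(tc)
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": json.dumps(result, ensure_ascii=False)
                    })
                continue

            # "stop" and any other finish reason (e.g. "length") end the loop
            return choice.message.content

        raise RuntimeError(f"No final answer after {max_turns} turns")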
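The executor above only learns about bad arguments when the tool raises TypeError. One option is to check the arguments against the tool's JSON Schema before running anything, so the model gets a precise error message to correct. The sketch below is a deliberately small check that covers only `required`, basic `type`, and `enum`; it is not a full JSON Schema validator, and `find_argument_error` is a hypothetical helper name.

```python
import json
from typing import Optional

# Subset of JSON Schema types mapped to Python types (illustrative only).
TYPE_MAP = {"string": str, "number": (int, float), "integer": int,
            "boolean": bool, "object": dict, "array": list}

def find_argument_error(raw_arguments: str, schema: dict) -> Optional[str]:
    """Return a human-readable error, or None if the arguments look valid."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return f"Arguments are not valid JSON: {exc.msg}"
    if not isinstance(args, dict):
        return "Arguments must be a JSON object"

    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            return f"Missing required argument: {name}"

    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            return f"Unexpected argument: {name}"
        expected = TYPE_MAP.get(spec.get("type"))
        if expected and not isinstance(value, expected):
            return f"Argument {name} should be of type {spec['type']}"
        if "enum" in spec and value not in spec["enum"]:
            return f"Argument {name} must be one of {spec['enum']}"
    return None

weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(find_argument_error('{"city": "Seoul"}', weather_schema))
print(find_argument_error('{"unit": "celsius"}', weather_schema))
print(find_argument_error('{"city": "Seoul", "unit": "kelvin"}', weather_schema))
```

Returning the error text as the tool result, instead of raising, gives the model a chance to retry with corrected arguments on the next turn.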

Structured Output and Function Calling

from pydantic import BaseModel, Field

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

# Build a tool definition from the Pydantic model's JSON Schema
def model_to_tool(model_class, name: str, description: str) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": model_class.model_json_schema()
        }
    }

weather_tool = model_to_tool(
    WeatherResponse,
    "format_weather",
    "Returns weather information in a structured format"
)
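Generating the schema is only half of the round trip. When the model calls format_weather, its arguments arrive as a JSON string, and the same Pydantic model can validate them. The sketch below assumes Pydantic v2 (consistent with the `model_json_schema()` call above); `parse_weather` is a hypothetical helper, and the JSON strings stand in for `tool_call.function.arguments`.

```python
from pydantic import BaseModel, Field, ValidationError

class WeatherResponse(BaseModel):
    city: str = Field(description="City name")
    temperature: float = Field(description="Current temperature")
    condition: str = Field(description="Weather condition")
    recommendation: str = Field(description="Clothing recommendation")

def parse_weather(arguments_json: str):
    """Return (model, None) on success or (None, error_text) on failure."""
    try:
        return WeatherResponse.model_validate_json(arguments_json), None
    except ValidationError as exc:
        # The error text can be sent back to the model as the tool result
        return None, str(exc)

good, good_err = parse_weather(
    '{"city": "Seoul", "temperature": 5, "condition": "cloudy", '
    '"recommendation": "Wear a coat"}'
)
bad, bad_err = parse_weather('{"city": "Seoul"}')

print(good)
print(bad_err)
```

Validating on the way back in means downstream code works with typed fields such as `good.temperature` rather than raw dictionaries.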

Function Calling with Open-Source Models

Ollama + Tool Use

import ollama

response = ollama.chat(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Tell me the weather in Seoul'}],
    tools=[{
        'type': 'function',
        'function': {
            'name': 'get_weather',
            'description': 'Look up weather information',
            'parameters': {
                'type': 'object',
                'properties': {
                    'city': {'type': 'string', 'description': 'City name'}
                },
                'required': ['city']
            }
        }
    }]
)

if response['message'].get('tool_calls'):
    for tool_call in response['message']['tool_calls']:
        print(f"Function: {tool_call['function']['name']}")
        print(f"Args: {tool_call['function']['arguments']}")

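Serving Tool Use with vLLM

# Start the vLLM server
# The tool-call parser must match the model family: llama3_json for Llama 3.1,
# hermes for Hermes-style models
# vllm serve meta-llama/Llama-3.1-8B-Instruct \
#   --enable-auto-tool-choice \
#   --tool-call-parser llama3_json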

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Tell me the weather in Seoul"}],
    tools=tools,
    tool_choice="auto"
)

Production Design Patterns

Controlling Tool Choice

# Force a call to a specific function
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

# Disable function calling
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="none"
)

Permission-Based Tool Filtering

from typing import Callable

class ToolRegistry:
    def __init__(self):
        self.tools: dict[str, Callable] = {}
        self.specs: dict[str, dict] = {}
        self.permissions: dict[str, str] = {}

    def register(self, name: str, func: Callable, spec: dict, required_role: str = "user"):
        self.tools[name] = func
        self.specs[name] = spec  # keep the schema; the original version discarded it
        self.permissions[name] = required_role

    def get_tools_for_role(self, role: str) -> list:
        """Return only the tool specs the given role is allowed to see."""
        role_hierarchy = {"admin": 3, "operator": 2, "user": 1}
        user_level = role_hierarchy.get(role, 0)

        return [
            {"type": "function", "function": spec}
            for name, spec in self.specs.items()
            if role_hierarchy.get(self.permissions[name], 0) <= user_level
        ]
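Filtering the tool list controls what the model can see, but the model's output is still untrusted input. A reasonable complement, sketched here as an assumption rather than something the registry above implements, is to re-check the caller's role at execution time. `guarded_call`, `PermissionDenied`, and the sample tools are hypothetical names for this illustration.

```python
ROLE_LEVELS = {"admin": 3, "operator": 2, "user": 1}

class PermissionDenied(Exception):
    """Raised when the caller's role is too low for the requested tool."""

def guarded_call(tool_name: str, caller_role: str, args: dict,
                 tools: dict, required_roles: dict):
    """Execute a tool only if caller_role meets the tool's required role."""
    required = required_roles.get(tool_name)
    # Fail closed: a tool with a missing or unknown required role is never callable
    required_level = ROLE_LEVELS.get(required, float("inf"))
    if ROLE_LEVELS.get(caller_role, 0) < required_level:
        raise PermissionDenied(f"{caller_role!r} may not call {tool_name!r}")
    return tools[tool_name](**args)

# Sample tools and role requirements, for demonstration only.
tools = {
    "get_weather": lambda city: {"city": city, "temp": 5},
    "delete_user": lambda user_id: {"deleted": user_id},
}
required_roles = {"get_weather": "user", "delete_user": "admin"}

print(guarded_call("get_weather", "user", {"city": "Seoul"}, tools, required_roles))
try:
    guarded_call("delete_user", "user", {"user_id": 42}, tools, required_roles)
except PermissionDenied as exc:
    print(f"Denied: {exc}")
```

With both layers in place, a tool name that reaches the executor without having been offered to the model is still rejected.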

Token Optimization

def optimize_tool_result(result, max_chars: int = 2000) -> str:
    """Convert a tool result (dict or list) into a token-efficient string."""
    result_str = json.dumps(result, ensure_ascii=False)

    if len(result_str) <= max_chars:
        return result_str

    # Summarize large list results
    if isinstance(result, list) and len(result) > 10:
        return json.dumps({
            "total_count": len(result),
            "showing": "first 10",
            "items": result[:10],
            "note": f"Showing only the first 10 of {len(result)} items"
        }, ensure_ascii=False)

    # Last resort: hard truncation (the output is no longer valid JSON)
    return result_str[:max_chars] + "... (truncated)"
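Two gaps remain in the function above: the first ten items may still exceed max_chars, and the final fallback cuts the string mid-token, which leaves invalid JSON. As one possible alternative for list results, the sketch below keeps halving the number of items until the whole envelope fits, so the output always parses. `fit_list_result` is a hypothetical helper, and halving is just one simple shrinking strategy.

```python
import json

def fit_list_result(items: list, max_chars: int = 2000) -> str:
    """Drop trailing items until the JSON envelope fits within max_chars."""
    shown = len(items)
    while True:
        payload = json.dumps(
            {"total_count": len(items), "shown": shown, "items": items[:shown]},
            ensure_ascii=False,
        )
        # Stop when it fits, or when there is nothing left to drop
        if len(payload) <= max_chars or shown == 0:
            return payload
        shown = shown // 2 if shown > 1 else 0

products = [{"id": i, "name": f"item-{i}"} for i in range(100)]
compact = fit_list_result(products, max_chars=300)
print(len(compact), json.loads(compact)["shown"])
```

Because `total_count` is always included, the model can still tell the user how many results exist even when only a few are shown.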

Benchmark: Function Calling Performance Comparison

Model	Single-call accuracy	Parallel-call accuracy	Argument-parsing accuracy
GPT-4o	97.2%	94.5%	98.1%
Claude 3.5 Sonnet	96.8%	93.2%	97.5%
Llama 3.1 70B	91.5%	85.3%	93.2%
Llama 3.1 8B	84.2%	72.1%	88.7%

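📝 Review Quiz (6 questions)

Q1. What is the LLM's role in Function Calling?

The LLM decides which function to call and with which arguments. The application performs the actual execution.

Q2. What is the biggest difference between the OpenAI and Anthropic Function Calling APIs?

OpenAI returns tool_calls on the assistant message and expects results in messages with the tool role. Anthropic uses tool_use and tool_result content blocks, with tool_result sent inside a user message.

Q3. When are Parallel Tool Calls useful?

When a request involves several independent tasks (for example, a weather lookup plus a product search), running them concurrently reduces latency.

Q4. What are the three tool_choice settings covered in this guide, and what do they mean?

auto (the LLM decides), none (function calling disabled), and a specific function (forced call). The OpenAI API also accepts required, which forces the model to call at least one tool.

Q5. What configuration is needed to support Function Calling with open-source models?

With vLLM, enable the --enable-auto-tool-choice and --tool-call-parser options. With Ollama, pass the tools parameter.

Q6. What are the token optimization strategies when a tool result is too long?

Truncate the result, or for lists return only the top N items and include the total count as metadata.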