The Complete Guide to AI Agent Development 2025: Tool Calling, ReAct, Multi-Agent, and MCP

Introduction
1. What Is an AI Agent?
2. Tool Calling / Function Calling in Depth
3. Optimizing Tool Calling Performance (Key Section)
4. Implementing the ReAct Pattern
5. Agent Framework Comparison
6. Multi-Agent Architecture
7. MCP (Model Context Protocol)
8. Memory Systems
9. Production Deployment
10. Quiz
11. References

Introduction

In 2025 the AI development paradigm is shifting fast. The move from chatbots that only generate text to autonomous AI agents that call external tools, make plans, and cooperate with other agents is well underway.

OpenAI's Function Calling, Anthropic's Tool Use, and Google's Function Calling have all matured, and a standard tool protocol, MCP (Model Context Protocol), has emerged. Multi-agent frameworks such as LangGraph, CrewAI, and AutoGen are now being used in production.

This guide covers what you need to build AI agents: how Tool Calling works, the ReAct pattern, multi-agent architectures, MCP, and production deployment strategies, each with working code.

1. What Is an AI Agent?

1.1 Definition of an Agent

An AI agent is an autonomous system built from four core components: LLM + Tools + Memory + Planning. It goes beyond generating text: it interacts with the outside world to achieve a goal.

┌─────────────────────────────────────────┐
│              AI Agent                    │
│                                          │
│  ┌──────────┐  ┌──────────┐             │
│  │   LLM    │  │ Planning │             │
│  │ (Brain)  │──│ (Goals)  │             │
│  └────┬─────┘  └──────────┘             │
│       │                                  │
│  ┌────┴─────┐  ┌──────────┐             │
│  │  Tools   │  │  Memory  │             │
│  │ (Actions)│  │ (Context)│             │
│  └──────────┘  └──────────┘             │
└─────────────────────────────────────────┘

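The four core components:

Component	Role	Examples
LLM	Reasoning, decision making	GPT-4o, Claude 3.5, Gemini 2.0
Tools	Interacting with the outside world	API calls, DB queries, file reads
Memory	Keeping context	Conversation history, vector DB, summaries
Planning	Planning how to reach the goal	Task decomposition, prioritization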

1.2 Agent vs Chatbot vs RAG Pipeline

┌──────────────────────────────────────────────────────┐
│  Level 0: Chatbot                                    │
│  Input → LLM → Text Output                           │
│  (plain text generation)                             │
├──────────────────────────────────────────────────────┤
│  Level 1: RAG Pipeline                               │
│  Input → Retrieve docs → LLM → Text Output           │
│  (retrieval-augmented generation)                    │
├──────────────────────────────────────────────────────┤
│  Level 2: Tool-Calling Agent                         │
│  Input → LLM → Tool Call → Result → LLM → Output     │
│  (uses external tools)                               │
├──────────────────────────────────────────────────────┤
│  Level 3: Planning Agent                             │
│  Input → Plan → [Tool Call → Observe]* → Output      │
│  (planning + iterative execution)                    │
├──────────────────────────────────────────────────────┤
│  Level 4: Multi-Agent System                         │
│  Input → Coordinator → Agent1 + Agent2 → Output      │
│  (multiple specialist agents cooperating)            │
└──────────────────────────────────────────────────────┘

1.3 Agent Capabilities Pyramid

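An agent's capabilities stack up like a pyramid, from the base upward:

Text generation: basic LLM abilities (question answering, summarization, translation)
Tool use: calling external APIs/functions (search, calculation, data lookup)
Planning: breaking a complex task into steps
Multi-step execution: repeating the Thought-Action-Observation loop
Multi-agent cooperation: role division and feedback among specialist agents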

2. Tool Calling / Function Calling in Depth

2.1 How Tool Calling Works

Tool Calling is the mechanism that lets an LLM call external functions. The core flow is:

User: "What's the weather in Seoul?"
    ↓
LLM: "I should call the get_weather function"
    ↓ (structured JSON output)
System: runs get_weather(location="Seoul")
    ↓ (result returned)
LLM: "It is currently 18°C and sunny in Seoul"

Key point: the LLM does not execute the function itself. It outputs, as JSON, which function to call and with which arguments; the application code performs the actual call.
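
To make that division of labor concrete, here is a minimal sketch of the application side, with no model call involved. The registry name `TOOL_REGISTRY`, the helper `dispatch_tool_call`, and the stubbed `get_weather` are illustrative assumptions, not part of any SDK.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in for a real weather lookup; returns fixed data."""
    return {"location": location, "temperature": 18, "unit": unit, "condition": "sunny"}


# Maps tool names (as the model sees them) to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run the function the model asked for and return the result as a JSON string."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature
        return json.dumps({"error": str(exc)})


# The model only produces data shaped like this; nothing has been executed yet.
model_output = {"name": "get_weather", "arguments": '{"location": "Seoul"}'}
result = dispatch_tool_call(model_output["name"], model_output["arguments"])
print(result)
```

Errors are returned as data rather than raised, so the caller can pass them back to the model as a tool result and let it correct the call.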

2.2 OpenAI Function Calling Format

import openai

client = openai.OpenAI()

# Tool definitions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Seoul, Tokyo"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Tool Calling request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What's the weather in Seoul?"}
    ],
    tools=tools,
    tool_choice="auto"  # "auto", "none", "required", or a dict forcing a specific function
)

# Extract the tool call from the response
# (tool_calls is None when the model answers in plain text, so check before indexing in real code)
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "get_weather"
print(tool_call.function.arguments)  # e.g. '{"location": "Seoul", "unit": "celsius"}'

# After running the function, pass the result back to the LLM
messages = [
    {"role": "user", "content": "What's the weather in Seoul?"},
    response.choices[0].message,
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": '{"temperature": 18, "condition": "sunny", "humidity": 45}'
    }
]

final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)
# e.g. "It is currently 18°C and sunny in Seoul, with 45% humidity."

2.3 Anthropic Tool Use Format

import anthropic

client = anthropic.Anthropic()

# Anthropic tool definitions
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. Seoul, Tokyo"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]

# Tool Use request
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Seoul?"}
    ]
)

# Handle the response
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")         # "get_weather"
        print(f"Input: {block.input}")       # {"location": "Seoul"}
        print(f"Tool ID: {block.id}")

        # After running the tool, send the result back
        result_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Seoul?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": '{"temperature": 18, "condition": "sunny"}'
                        }
                    ]
                }
            ]
        )

2.4 Google Gemini Function Calling

import google.generativeai as genai

# Gemini function declaration
get_weather_func = genai.protos.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a location",
    parameters=genai.protos.Schema(
        type=genai.protos.Type.OBJECT,
        properties={
            "location": genai.protos.Schema(
                type=genai.protos.Type.STRING,
                description="City name"
            ),
        },
        required=["location"]
    )
)

tool = genai.protos.Tool(function_declarations=[get_weather_func])

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash",
    tools=[tool]
)

chat = model.start_chat()
response = chat.send_message("What's the weather in Seoul?")

# Handle the function call
for part in response.parts:
    if fn := part.function_call:
        print(f"Function: {fn.name}")
        print(f"Args: {dict(fn.args)}")

2.5 Defining a Tool Schema (JSON Schema)

How well you write the tool schema directly affects tool calling performance:

{
  "name": "search_products",
  "description": "Search for products in the e-commerce catalog. Returns matching products with price, rating, and availability. Use this when the user wants to find or browse products.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query string. Can include product names, categories, or features. Example: 'wireless bluetooth headphones'"
      },
      "category": {
        "type": "string",
        "enum": ["electronics", "clothing", "home", "sports", "books"],
        "description": "Product category to filter results"
      },
      "min_price": {
        "type": "number",
        "description": "Minimum price in USD"
      },
      "max_price": {
        "type": "number",
        "description": "Maximum price in USD"
      },
      "sort_by": {
        "type": "string",
        "enum": ["relevance", "price_asc", "price_desc", "rating", "newest"],
        "description": "Sort order for results. Default: relevance"
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of results to return (1-50). Default: 10"
      }
    },
    "required": ["query"]
  }
}
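
A schema only describes what the model should send; the application still has to check what actually arrives. The following is a rough sketch of such a check using only the standard library. The function `validate_arguments` is a hypothetical helper that covers just `required`, primitive `type`, and `enum`; a full JSON Schema validator does much more.

```python
def validate_arguments(schema: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    type_map = {"string": str, "number": (int, float), "integer": int, "boolean": bool}
    errors = []
    props = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required parameter: {name}")

    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unexpected parameter: {name}")
            continue
        expected = type_map.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for non-boolean types
        wrong_bool = isinstance(value, bool) and spec.get("type") != "boolean"
        if expected and (not isinstance(value, expected) or wrong_bool):
            errors.append(f"{name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{name}: must be one of {spec['enum']}")
    return errors


# A trimmed version of the search_products schema above.
schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {"type": "string", "enum": ["electronics", "clothing", "home", "sports", "books"]},
        "max_price": {"type": "number"},
        "limit": {"type": "integer"},
    },
    "required": ["query"],
}

ok = validate_arguments(schema, {"query": "wireless headphones", "limit": 5})
bad = validate_arguments(schema, {"category": "toys", "limit": "ten"})
print(ok)
print(bad)
```

Returning the error list to the model as the tool result usually gives it enough information to retry with corrected arguments.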

2.6 Parallel Tool Calls

Parallel Tool Calling lets the model request several tool calls in a single turn:

import json

# Handling parallel tool calls with OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Compare the weather in Seoul and Tokyo"}
    ],
    tools=tools,
    parallel_tool_calls=True  # default: True
)

# Several tool calls come back
tool_calls = response.choices[0].message.tool_calls
# tool_calls[0]: get_weather(location="Seoul")
# tool_calls[1]: get_weather(location="Tokyo")

# Return all results at once
# (execute_function below is the application's own dispatcher, not an SDK call)
messages = [
    {"role": "user", "content": "Compare the weather in Seoul and Tokyo"},
    response.choices[0].message,
]

for tc in tool_calls:
    result = execute_function(tc.function.name, tc.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": json.dumps(result)
    })

final = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

2.7 Forced Tool Use vs Auto

# Auto: the LLM decides whether to use a tool
tool_choice = "auto"

# Required: the model must call at least one tool
tool_choice = "required"

# None: tools are disabled (text only)
tool_choice = "none"

# Force a call to a specific function
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

2.8 Tool Calling with Streaming

# Handling tool calls while streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
    tools=tools,
    stream=True
)

tool_calls_buffer = {}
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls_buffer:
                tool_calls_buffer[idx] = {
                    "id": tc.id,
                    "function": {"name": "", "arguments": ""}
                }
            if tc.function.name:
                tool_calls_buffer[idx]["function"]["name"] += tc.function.name
            if tc.function.arguments:
                tool_calls_buffer[idx]["function"]["arguments"] += tc.function.arguments

# Run the tool calls once streaming has finished
for idx, tc in tool_calls_buffer.items():
    result = execute_function(tc["function"]["name"], tc["function"]["arguments"])
    print(f"Tool: {tc['function']['name']}, Result: {result}")
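
The buffering logic is easier to see without a live stream. The sketch below feeds hand-written fragments, shaped like the streamed deltas but not real API output, through a hypothetical helper `accumulate_tool_calls` and shows how the argument string is reassembled before it can be parsed.

```python
import json
from types import SimpleNamespace as NS


def accumulate_tool_calls(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    buffer = {}
    for tc in deltas:
        entry = buffer.setdefault(tc.index, {"id": None, "name": "", "arguments": ""})
        if tc.id:
            entry["id"] = tc.id  # the id normally arrives only on the first fragment
        if tc.function and tc.function.name:
            entry["name"] += tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments
    return [buffer[i] for i in sorted(buffer)]


# Hand-written fragments for illustration only.
fake_deltas = [
    NS(index=0, id="call_1", function=NS(name="get_weather", arguments="")),
    NS(index=0, id=None, function=NS(name=None, arguments='{"loca')),
    NS(index=0, id=None, function=NS(name=None, arguments='tion": "Seoul"}')),
]

calls = accumulate_tool_calls(fake_deltas)
parsed_args = json.loads(calls[0]["arguments"])  # only valid JSON once every fragment is in
print(calls[0]["name"], parsed_args)
```

The arguments string is not valid JSON until the last fragment arrives, which is why parsing has to wait until the stream ends.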

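3. Optimizing Tool Calling Performance (Key Section)

3.1 Tool Description Engineering

The tool description has an outsized effect on tool calling quality; as a rule of thumb, treat it as roughly 80% of the result. Principles for writing a good description:

# BAD: too vague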
{
    "name": "search",
    "description": "Search for things"
}

# GOOD: clear and specific
{
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base for technical documentation and troubleshooting guides. Returns relevant articles ranked by relevance score. Use this tool when the user asks about product features, technical specifications, or needs help resolving technical issues. Do NOT use this for general web search or current events."
}

# BEST: description + usage scenario + caveats
{
    "name": "create_calendar_event",
    "description": "Create a new calendar event. Required fields: title, start_time. Optional: end_time (defaults to 1 hour after start), attendees (email list), location, description. Use this when the user wants to schedule a meeting or event. Returns the created event ID and a confirmation link. Note: Times must be in ISO 8601 format (YYYY-MM-DDTHH:MM:SS). If the user specifies a relative time like 'tomorrow at 3pm', convert it to the absolute format first."
}

3.2 Parameter Description Quality

# BAD: no parameter description
"properties": {
    "date": {"type": "string"}
}

# GOOD: includes format, example, and default
"properties": {
    "date": {
        "type": "string",
        "description": "Date in YYYY-MM-DD format. Example: '2025-03-15'. Defaults to today if not specified."
    }
}

3.3 Reducing the Number of Tools

# BAD: 10 tools with related functionality scattered across them
tools = [
    "get_user_name", "get_user_email", "get_user_phone",
    "get_user_address", "get_user_preferences",
    "update_user_name", "update_user_email", "update_user_phone",
    "update_user_address", "update_user_preferences"
]

# GOOD: 2 consolidated tools
tools = [
    {
        "name": "get_user_info",
        "description": "Get user information. Specify which fields to retrieve.",
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "string"},
                "fields": {
                    "type": "array",
                    "items": {"type": "string", "enum": ["name", "email", "phone", "address", "preferences"]},
                    "description": "List of fields to retrieve. If empty, returns all fields."
                }
            },
            "required": ["user_id"]
        }
    },
    {
        "name": "update_user_info",
        "description": "Update user information.",
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {"type": "string"},
                "updates": {
                    "type": "object",
                    "description": "Key-value pairs of fields to update"
                }
            },
            "required": ["user_id", "updates"]
        }
    }
]
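
Consolidating tools moves the field selection from the tool name into the arguments, so the handlers have to enforce it. A possible shape for the two handlers is sketched below; the in-memory `USERS` store and the `ALLOWED_FIELDS` tuple are placeholders for a real data layer.

```python
# Hypothetical in-memory store standing in for a real user database.
USERS = {
    "u-1": {
        "name": "Kim",
        "email": "kim@example.com",
        "phone": "010-0000-0000",
        "address": "Seoul",
        "preferences": {"language": "ko"},
    }
}
ALLOWED_FIELDS = ("name", "email", "phone", "address", "preferences")


def get_user_info(user_id, fields=None):
    """Return only the requested fields; an empty or missing list means all fields."""
    user = USERS.get(user_id)
    if user is None:
        return {"error": f"unknown user: {user_id}"}
    selected = list(fields) if fields else list(ALLOWED_FIELDS)
    unknown = [f for f in selected if f not in ALLOWED_FIELDS]
    if unknown:
        return {"error": f"unknown fields: {unknown}"}
    return {f: user[f] for f in selected}


def update_user_info(user_id, updates):
    """Apply key-value updates, rejecting keys outside the allowed set."""
    user = USERS.get(user_id)
    if user is None:
        return {"error": f"unknown user: {user_id}"}
    unknown = [k for k in updates if k not in ALLOWED_FIELDS]
    if unknown:
        return {"error": f"unknown fields: {unknown}"}
    user.update(updates)
    return {"updated": sorted(updates)}


print(get_user_info("u-1", ["name", "email"]))
print(update_user_info("u-1", {"phone": "010-1234-5678"}))
```

Because `updates` is a free-form object in the schema, the allow-list check in the handler is what keeps the model from writing arbitrary keys.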

3.4 Few-shot Examples in System Prompt

system_prompt = """You are a helpful assistant with access to tools.

Here are examples of how to use tools correctly:

User: "What's the weather like in Paris?"
Tool call: get_weather(location="Paris", unit="celsius")

User: "Find cheap flights from Seoul to Tokyo next Friday"
Tool call: search_flights(origin="ICN", destination="NRT", date="2025-03-28", sort_by="price_asc")

User: "How are you today?"
No tool call needed - just respond conversationally.
"""

3.5 Simplifying the Schema

# BAD: deeply nested structure
{
    "parameters": {
        "type": "object",
        "properties": {
            "filter": {
                "type": "object",
                "properties": {
                    "conditions": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "field": {"type": "string"},
                                "operator": {"type": "string"},
                                "value": {"type": "object"}  # yet another level of nesting
                            }
                        }
                    }
                }
            }
        }
    }
}

# GOOD: flattened structure
{
    "parameters": {
        "type": "object",
        "properties": {
            "filter_field": {"type": "string", "description": "Field to filter on"},
            "filter_operator": {"type": "string", "enum": ["eq", "gt", "lt", "contains"]},
            "filter_value": {"type": "string", "description": "Value to filter by (as string)"}
        }
    }
}
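
Flattening the schema does not mean the backend has to change. The application can rebuild the nested structure from the flat arguments, as in this sketch. The helper name `build_filter` and the choice to treat `gt`/`lt` values as numbers are assumptions made for illustration.

```python
ALLOWED_OPERATORS = {"eq", "gt", "lt", "contains"}


def build_filter(filter_field: str, filter_operator: str, filter_value: str) -> dict:
    """Turn the flat tool arguments back into the nested filter the backend expects."""
    if filter_operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {filter_operator}")
    value = filter_value
    if filter_operator in {"gt", "lt"}:
        # The schema passes every value as a string; comparisons need a number
        value = float(filter_value)
    return {
        "conditions": [
            {"field": filter_field, "operator": filter_operator, "value": value}
        ]
    }


nested = build_filter("price", "lt", "100")
print(nested)
```

The model sees three simple string parameters, while the nesting stays an implementation detail of the application.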

3.6 Error Handling and Retry Logic

import json
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


class TransientToolError(Exception):
    """Illustrative exception: tools raise it for failures worth retrying (timeouts, rate limits)."""


class ToolExecutor:
    def __init__(self):
        self.tool_registry = {}

    def register(self, name, func):
        self.tool_registry[name] = func

    # Retry only transient failures. If the retried function caught every
    # exception itself, tenacity would never see an error and never retry.
    @retry(
        retry=retry_if_exception_type(TransientToolError),
        stop=stop_after_attempt(3),
        wait=wait_exponential(min=1, max=10),
        reraise=True,
    )
    def _call(self, func, args):
        return func(**args)

    def execute(self, tool_name, arguments_str):
        """Run a tool with error handling and retries."""
        try:
            args = json.loads(arguments_str)
        except json.JSONDecodeError as e:
            return {"error": f"Invalid JSON arguments: {e}"}

        func = self.tool_registry.get(tool_name)
        if not func:
            return {"error": f"Unknown tool: {tool_name}"}

        try:
            return {"success": True, "data": self._call(func, args)}
        except TypeError as e:
            return {"error": f"Invalid parameters: {e}"}
        except Exception as e:  # includes TransientToolError after the last attempt
            return {"error": f"Tool execution failed: {e}"}

    def execute_with_fallback(self, tool_name, arguments_str, fallback_message):
        """Return a fallback message on failure."""
        result = self.execute(tool_name, arguments_str)
        if "error" in result:
            return {"data": fallback_message, "was_fallback": True}
        return result

3.7 Caching Tool Results

import hashlib
import json
from datetime import datetime, timedelta

class ToolCache:
    def __init__(self, ttl_seconds=300):
        self.cache = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    def _make_key(self, tool_name, arguments):
        raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, tool_name, arguments):
        key = self._make_key(tool_name, arguments)
        if key in self.cache:
            entry = self.cache[key]
            if datetime.now() - entry["timestamp"] < self.ttl:
                return entry["result"]
            del self.cache[key]
        return None

    def set(self, tool_name, arguments, result):
        key = self._make_key(tool_name, arguments)
        self.cache[key] = {
            "result": result,
            "timestamp": datetime.now()
        }

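# Usage example
cache = ToolCache(ttl_seconds=60)

def execute_with_cache(tool_name, arguments):
    cached = cache.get(tool_name, arguments)
    if cached is not None:  # a falsy result such as 0 or {} is still a valid cache hit
        return cached
    result = execute_tool(tool_name, arguments)  # execute_tool: the application's own tool runner
    cache.set(tool_name, arguments, result)
    return result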

3.8 Latency Optimization

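import asyncio
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()  # the async client is needed for `await` and `async for` below

# async_execute_tool and collect_tool_call_chunks are application helpers assumed to exist
async def parallel_tool_execution(tool_calls):
    """Run several tools concurrently with asyncio."""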
    async def execute_one(tc):
        tool_name = tc["function"]["name"]
        args = json.loads(tc["function"]["arguments"])
        return {
            "tool_call_id": tc["id"],
            "result": await async_execute_tool(tool_name, args)
        }

    results = await asyncio.gather(
        *[execute_one(tc) for tc in tool_calls],
        return_exceptions=True
    )
    return results

# Combining streaming with Tool Calling
async def stream_with_tools(messages, tools):
    """Stream text as it arrives, then run the collected tool calls after the stream ends."""
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        stream=True
    )

    text_buffer = ""
    tool_calls = []

    async for chunk in stream:
        delta = chunk.choices[0].delta
        if delta.content:
            text_buffer += delta.content
            yield {"type": "text", "content": delta.content}
        if delta.tool_calls:
            # Collect tool call chunks
            collect_tool_call_chunks(tool_calls, delta.tool_calls)

    # Run the tool calls once streaming has finished
    if tool_calls:
        results = await parallel_tool_execution(tool_calls)
        yield {"type": "tool_results", "results": results}
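
One detail worth spelling out: with `return_exceptions=True`, `asyncio.gather` returns exception objects in place of results, so each outcome has to be checked before it is sent back to the model. A self-contained sketch follows; the stubbed `async_execute_tool` and the helper `run_tool_calls` are hypothetical.

```python
import asyncio
import json


async def async_execute_tool(name, args):
    """Hypothetical async tool runner: 'get_weather' works, anything else fails."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    if name != "get_weather":
        raise RuntimeError(f"unknown tool: {name}")
    return {"location": args["location"], "temperature": 18}


async def run_tool_calls(tool_calls):
    """Run all tool calls concurrently and build one tool message per call."""

    async def run_one(tc):
        args = json.loads(tc["function"]["arguments"])
        return await async_execute_tool(tc["function"]["name"], args)

    # gather preserves input order, so outcomes line up with tool_calls
    outcomes = await asyncio.gather(
        *(run_one(tc) for tc in tool_calls), return_exceptions=True
    )

    messages = []
    for tc, outcome in zip(tool_calls, outcomes):
        if isinstance(outcome, Exception):
            content = {"error": str(outcome)}  # a failed call becomes an error result
        else:
            content = outcome
        messages.append(
            {"role": "tool", "tool_call_id": tc["id"], "content": json.dumps(content)}
        )
    return messages


tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather", "arguments": '{"location": "Seoul"}'}},
    {"id": "call_2", "function": {"name": "get_stock", "arguments": '{"symbol": "AAPL"}'}},
]
tool_messages = asyncio.run(run_tool_calls(tool_calls))
print(tool_messages)
```

Every tool call gets a matching tool message even when it fails, so one bad call does not discard the results of the others.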

3.9 Fine-tuning for Tool Calling

# Fine-tuning for Tool Calling with Unsloth
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.1-8B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Tool Calling dataset format
training_data = [
    {
        "messages": [
            {
                "role": "system",
                "content": "You have access to the following tools: ..."
            },
            {
                "role": "user",
                "content": "What is the stock price of AAPL?"
            },
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "get_stock_price",
                            "arguments": "{\"symbol\": \"AAPL\"}"
                        }
                    }
                ]
            }
        ]
    }
]

# Attach LoRA adapters for fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)

4. Implementing the ReAct Pattern

4.1 The Thought-Action-Observation Loop

ReAct (Reasoning + Acting) is a pattern in which the LLM solves a problem by repeating a Thought - Action - Observation loop.

Question: Is Seoul's population larger than Tokyo's in 2025?

Thought 1: I need to look up the current populations of Seoul and Tokyo.
Action 1: search("Seoul population 2025")
Observation 1: Seoul (city proper): ~9.7 million

Thought 2: I also need Tokyo's population.
Action 2: search("Tokyo population 2025")
Observation 2: Tokyo Metropolis: ~13.9 million

Thought 3: Seoul (9.7M) < Tokyo (13.9M), so Seoul's population is smaller than Tokyo's.
Action 3: finish("No. Seoul (about 9.7 million) has fewer people than Tokyo (about 13.9 million).")

4.2 Python Implementation

import re

import openai

class ReActAgent:
    def __init__(self, llm_client, tools, max_iterations=10):
        self.llm = llm_client
        self.tools = {t["name"]: t for t in tools}
        self.tool_functions = {}
        self.max_iterations = max_iterations

    def register_function(self, name, func):
        self.tool_functions[name] = func

    def _build_system_prompt(self):
        tool_descriptions = "\n".join([
            f"- {t['name']}: {t['description']}"
            for t in self.tools.values()
        ])

        return f"""You are a helpful assistant that solves problems step by step.

Available tools:
{tool_descriptions}

For each step, you MUST output in this exact format:
Thought: [your reasoning about what to do next]
Action: [tool_name(param1="value1", param2="value2")]

After receiving an observation, continue with the next Thought.
When you have the final answer, use:
Thought: [your final reasoning]
Action: finish(answer="[your final answer]")
"""

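    def _parse_action(self, text):
        """Extract the function name and arguments from the Action string."""
        # Greedy match up to the last ")" so parentheses inside an argument value do not cut the match short
        match = re.search(r'Action:\s*(\w+)\((.*)\)', text, re.DOTALL)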
        if not match:
            return None, None

        func_name = match.group(1)
        args_str = match.group(2)

        # Simple argument parsing (values must be double-quoted and contain no escaped quotes)
        args = {}
        for arg_match in re.finditer(r'(\w+)="([^"]*)"', args_str):
            args[arg_match.group(1)] = arg_match.group(2)

        return func_name, args

    def run(self, user_query):
        """Run the ReAct loop."""
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": user_query}
        ]

        for i in range(self.max_iterations):
            # Ask the LLM for the next Thought + Action
            response = self.llm.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0
            )

            assistant_message = response.choices[0].message.content
            messages.append({"role": "assistant", "content": assistant_message})

            print(f"\n--- Step {i+1} ---")
            print(assistant_message)

            # Parse the Action
            func_name, args = self._parse_action(assistant_message)

            if func_name == "finish":
                return args.get("answer", "No answer provided")

            if func_name and func_name in self.tool_functions:
                # Run the tool
                try:
                    result = self.tool_functions[func_name](**args)
                    observation = f"Observation: {result}"
                except Exception as e:
                    observation = f"Observation: Error - {str(e)}"
            else:
                observation = f"Observation: Unknown tool '{func_name}'"

            print(observation)
            messages.append({"role": "user", "content": observation})

        return "Max iterations reached without a final answer."

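# Usage example
agent = ReActAgent(llm_client=openai.OpenAI(), tools=[...])
agent.register_function("search", lambda query: web_search(query))  # web_search: your own search helper
# eval() runs arbitrary code; acceptable for a demo only, never with untrusted input
agent.register_function("calculate", lambda expression: eval(expression))

answer = agent.run("In what year did India's GDP overtake Japan's?")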
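
Action parsing is the fragile part of a text-based ReAct loop, especially when an argument value itself contains parentheses. The standalone sketch below uses a greedy match so the argument list extends to the last closing parenthesis. The function name `parse_action` and the sample step are illustrative, and the parser still assumes double-quoted values without escaped quotes.

```python
import re


def parse_action(text):
    """Extract (function_name, args) from a ReAct step, or (None, None) if absent."""
    # Greedy (.*) with DOTALL runs to the LAST ")" in the text, so parentheses
    # inside a quoted value do not end the argument list early.
    match = re.search(r'Action:\s*(\w+)\((.*)\)', text, re.DOTALL)
    if not match:
        return None, None
    args = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
    return match.group(1), args


step = (
    "Thought: Tokyo is larger.\n"
    'Action: finish(answer="No. Seoul (about 9.7M) is smaller than Tokyo (about 13.9M).")'
)
name, args = parse_action(step)
print(name, args)
```

A non-greedy pattern would stop at the first closing parenthesis and truncate the answer after "9.7M". For anything beyond a demo, asking the model for JSON arguments is more robust than regex parsing.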

4.3 ReAct vs Simple Tool Calling

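Situation	Simple Tool Calling	ReAct Pattern
Simple lookup (weather, stock price)	Good fit	Overkill
Single calculation	Good fit	Overkill
Multi-step reasoning	Poor fit	Good fit
Using intermediate results	Poor fit	Good fit
Complex research	Poor fit	Good fit
Conditional branching	Difficult	Good fit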

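4.4 Limitations and Mitigations

Limitations:

The loop can run indefinitely (cap it with max iterations)
A Thought can head in the wrong direction (needs self-correction)
Slow responses (one LLM call per step)

Mitigations:

Set a maximum number of iterations
Include error messages in the Observation so the model can correct itself
Speed things up with caching and parallel execution
A better approach: graph-based control flow with LangGraph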
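
An iteration cap stops a runaway loop, but an agent can still waste its whole budget repeating the same action. One way to catch that earlier is to count identical actions, as in this sketch. The `LoopGuard` class and its `max_repeats` parameter are hypothetical additions, not part of the agent shown earlier.

```python
import json
from collections import Counter


class LoopGuard:
    """Stops a ReAct loop on too many steps or on the same action repeated too often."""

    def __init__(self, max_iterations=10, max_repeats=2):
        self.max_iterations = max_iterations
        self.max_repeats = max_repeats
        self.steps = 0
        self.seen = Counter()

    def check(self, func_name, args):
        """Return a reason string if the loop should stop, otherwise None."""
        self.steps += 1
        if self.steps > self.max_iterations:
            return "max iterations reached"
        # Serialize the arguments so identical calls map to the same key
        key = (func_name, json.dumps(args, sort_keys=True))
        self.seen[key] += 1
        if self.seen[key] > self.max_repeats:
            return f"action repeated more than {self.max_repeats} times: {func_name}"
        return None


guard = LoopGuard(max_iterations=5, max_repeats=2)
reasons = [guard.check("search", {"query": "Seoul population"}) for _ in range(3)]
print(reasons)
```

The returned reason can be fed back as an Observation, which gives the model a chance to change strategy before the loop is aborted.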

5. Agent Framework Comparison

5.1 LangChain / LangGraph

LangGraph is a framework for building graph-based agent workflows:

from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Search results for: {query}"

@tool
def calculate(expression: str) -> str:
    """Calculate a mathematical expression."""
    # eval() executes arbitrary code; fine for a demo, unsafe with untrusted input
    return str(eval(expression))

# Model setup
llm = ChatOpenAI(model="gpt-4o").bind_tools([search_web, calculate])

def call_model(state: MessagesState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# Build the graph
graph = StateGraph(MessagesState)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode(tools=[search_web, calculate]))

graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", tools_condition)
graph.add_edge("tools", "agent")

app = graph.compile()

# Run
result = app.invoke({
    "messages": [("user", "What is 15% of the population of France?")]
})

5.2 CrewAI (Role-Based Multi-Agent)

from crewai import Agent, Task, Crew, Process

# Agent definitions (search_tool and scrape_tool are created as in section 6.4)
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments in AI",
    backstory="You are an expert research analyst at a leading tech think tank.",
    verbose=True,
    allow_delegation=False,
    tools=[search_tool, scrape_tool]
)

writer = Agent(
    role="Tech Content Writer",
    goal="Write engaging blog posts about AI developments",
    backstory="You are a renowned content strategist known for insightful articles.",
    verbose=True,
    allow_delegation=False
)

editor = Agent(
    role="Chief Editor",
    goal="Ensure high quality and accuracy of the final content",
    backstory="You are a meticulous editor with decades of publishing experience.",
    verbose=True
)

# Task definitions
research_task = Task(
    description="Research the latest AI agent frameworks released in 2025.",
    expected_output="A comprehensive report on latest AI agent frameworks.",
    agent=researcher
)

writing_task = Task(
    description="Write a blog post based on the research findings.",
    expected_output="A polished blog post of 1000+ words.",
    agent=writer
)

editing_task = Task(
    description="Review and edit the blog post for quality and accuracy.",
    expected_output="A final, publication-ready blog post.",
    agent=editor
)

# Build and run the Crew
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff()

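5.3 AutoGen (Microsoft, Conversation-Based)

# Note: this is the AutoGen 0.2-style API; later releases changed the package layout
from autogen import AssistantAgent, UserProxyAgent

# Assistant agent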
assistant = AssistantAgent(
    name="assistant",
    llm_config={
        "model": "gpt-4o",
        "temperature": 0,
    },
    system_message="You are a helpful AI assistant. Solve tasks step by step."
)

# User proxy (code execution environment)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,
    }
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Analyze the top 10 AI papers from arXiv this week and create a summary."
)

5.4 Smolagents (HuggingFace)

from smolagents import CodeAgent, ToolCallingAgent, HfApiModel, tool

@tool
def get_weather(location: str) -> str:
    """Get weather for a location.

    Args:
        location: The city name to get weather for.
    """
    return f"Weather in {location}: 22C, sunny"

# Code Agent: generates and executes Python code
agent = CodeAgent(
    tools=[get_weather],
    model=HfApiModel("Qwen/Qwen2.5-72B-Instruct"),
)

result = agent.run("What's the weather in Seoul and Tokyo?")

5.5 Claude Agent SDK (Anthropic)

import anthropic

client = anthropic.Anthropic()

# An agent loop for Claude, written by hand on the Messages API of the `anthropic` SDK
tools = [
    {
        "name": "execute_python",
        "description": "Execute Python code and return the output",
        "input_schema": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python code to execute"}
            },
            "required": ["code"]
        }
    }
]

def agent_loop(user_message, max_steps=10):
    # execute_tool is the application's own tool runner (assumed to exist)
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_steps):  # bound the loop instead of `while True`
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages
        )

        # Any stop reason other than tool_use (end_turn, max_tokens, ...) ends the loop;
        # otherwise the same request would be sent again forever
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")

        # Handle tool use
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(result)
                })
        messages.append({"role": "user", "content": tool_results})

    return "Stopped: max_steps reached without a final answer."

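5.6 Framework Comparison Table

Feature	LangGraph	CrewAI	AutoGen	Smolagents	Claude SDK
Developer	LangChain	CrewAI	Microsoft	HuggingFace	Anthropic
Paradigm	Graph-based	Role-based	Conversation-based	Code generation	Loop-based
Multi-agent	Strong	Core feature	Core feature	Basic	Manual implementation
Learning curve	High	Low	Medium	Low	Low
Customization	Very high	Medium	High	Medium	Very high
Production stability	High	Medium	Medium	Early stage	High
State management	Built-in (checkpoints)	Basic	Basic	None	Manual
Streaming	Supported	Not supported	Not supported	Not supported	Supported
Human-in-the-loop	Built-in	Not supported	Supported	Not supported	Manual
Open source	Yes	Yes	Yes	Yes	SDK only
LLM support	Broad	Broad	Broad	HF-model-centric	Claude only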

6. Multi-Agent Architecture

6.1 Supervisor Pattern

A single coordinator (the Supervisor) manages a set of specialist agents:

from langgraph.graph import StateGraph, MessagesState, START, END

# llm, researcher_llm, coder_llm, and writer_llm are chat models assumed to be
# created elsewhere (for example, ChatOpenAI instances).

class SupervisorState(MessagesState):
    # MessagesState only declares "messages"; the routing decision needs its own key
    next: str

def supervisor_node(state: SupervisorState):
    """Analyze the task and delegate it to the appropriate agent."""
    system_msg = """You are a supervisor managing a team of specialists:
    - researcher: for finding information
    - coder: for writing and reviewing code
    - writer: for creating content

    Based on the user request, decide which agent should handle it.
    Respond with the agent name: 'researcher', 'coder', or 'writer'.
    If the task is complete, respond with 'FINISH'."""

    response = llm.invoke([
        {"role": "system", "content": system_msg},
        *state["messages"]
    ])
    return {"next": response.content.strip()}

def researcher_node(state: SupervisorState):
    response = researcher_llm.invoke(state["messages"])
    return {"messages": [{"role": "assistant", "content": f"[Researcher] {response.content}"}]}

def coder_node(state: SupervisorState):
    response = coder_llm.invoke(state["messages"])
    return {"messages": [{"role": "assistant", "content": f"[Coder] {response.content}"}]}

def writer_node(state: SupervisorState):
    response = writer_llm.invoke(state["messages"])
    return {"messages": [{"role": "assistant", "content": f"[Writer] {response.content}"}]}

# Build the graph
graph = StateGraph(SupervisorState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("researcher", researcher_node)
graph.add_node("coder", coder_node)
graph.add_node("writer", writer_node)

graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", lambda s: s["next"], {
    "researcher": "researcher",
    "coder": "coder",
    "writer": "writer",
    "FINISH": END
})
graph.add_edge("researcher", "supervisor")
graph.add_edge("coder", "supervisor")
graph.add_edge("writer", "supervisor")

app = graph.compile()

6.2 Peer-to-Peer Pattern

Agents communicate with each other directly, with no central coordinator:

┌──────────────┐    message    ┌──────────────┐
│   Agent A    │──────────────▶│   Agent B    │
│ (Researcher) │               │  (Analyst)   │
│              │◀──────────────│              │
└──────┬───────┘    response   └──────┬───────┘
       │                              │
       │           message            │
       └──────────────┬───────────────┘
                      ▼
                ┌──────────┐
                │ Agent C  │
                │ (Writer) │
                └──────────┘
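
The diagram can be sketched without any framework. The following is a minimal, hypothetical illustration: the `PeerAgent` class, its `send` and `receive` methods, and the lambda handlers are assumptions made for this example, and plain functions stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PeerAgent:
    """A peer that can message any other peer directly (no supervisor)."""
    name: str
    handler: Callable[[str], str]  # stand-in for an LLM call
    inbox: list = field(default_factory=list)

    def receive(self, sender: str, content: str) -> str:
        # Record the incoming message, then produce a reply.
        self.inbox.append((sender, content))
        return self.handler(content)

    def send(self, peer: "PeerAgent", content: str) -> str:
        # Direct agent-to-agent call: the reply goes straight back to the sender.
        return peer.receive(self.name, content)

researcher = PeerAgent("researcher", lambda msg: f"findings for: {msg}")
analyst = PeerAgent("analyst", lambda msg: f"analysis of ({msg})")
writer = PeerAgent("writer", lambda msg: f"draft based on [{msg}]")

# The researcher consults the analyst, and the analyst hands the result to the writer.
findings = researcher.handler("agent frameworks")
analysis = researcher.send(analyst, findings)
draft = analyst.send(writer, analysis)
```

Because no single node sees every message, each peer keeps its own inbox; in a real system that log is what you would inspect when debugging who told whom what.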

6.3 Hierarchical Pattern

A director agent delegates to team leads, and each team lead delegates to its own workers:

         ┌────────────┐
         │  Director   │
         │  Agent      │
         └──────┬─────┘
        ┌───────┴───────┐
   ┌────▼────┐    ┌─────▼────┐
   │ Team     │    │ Team     │
   │ Lead A   │    │ Lead B   │
   └────┬─────┘    └────┬─────┘
   ┌────┴────┐    ┌─────┴────┐
┌──▼──┐ ┌──▼──┐ ┌──▼──┐ ┌──▼──┐
│Wkr 1│ │Wkr 2│ │Wkr 3│ │Wkr 4│
└─────┘ └─────┘ └─────┘ └─────┘
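
A rough sketch of this delegation chain in plain Python follows. The `Director`, `TeamLead`, and `Worker` classes and the way a task is split into parts are assumptions for illustration only; in practice each `run` or `handle` call would wrap an LLM invocation.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Worker:
    name: str
    run: Callable[[str], str]  # stand-in for an LLM-backed worker

@dataclass
class TeamLead:
    name: str
    workers: list = field(default_factory=list)

    def handle(self, task: str) -> str:
        # Split the task across workers, then merge their outputs.
        parts = [w.run(f"{task} (part {i + 1})") for i, w in enumerate(self.workers)]
        return f"{self.name}: " + "; ".join(parts)

@dataclass
class Director:
    leads: dict = field(default_factory=dict)

    def handle(self, goal: dict) -> list:
        # `goal` maps a team name to the sub-goal assigned to that team.
        return [self.leads[team].handle(sub_goal) for team, sub_goal in goal.items()]

def make_worker(name: str) -> Worker:
    return Worker(name, lambda task, n=name: f"{n} did '{task}'")

director = Director(leads={
    "A": TeamLead("Lead A", [make_worker("Wkr 1"), make_worker("Wkr 2")]),
    "B": TeamLead("Lead B", [make_worker("Wkr 3"), make_worker("Wkr 4")]),
})
report = director.handle({"A": "research", "B": "write"})
```

Each level only talks to the level directly below it, which keeps any one agent's context small at the cost of extra hops.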

6.4 CrewAI in Practice

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, ScrapeWebsiteTool

# Tool setup (SerperDevTool expects SERPER_API_KEY in the environment)
search_tool = SerperDevTool()
scrape_tool = ScrapeWebsiteTool()

# Research agent
research_agent = Agent(
    role="AI Technology Researcher",
    goal="Find the most recent and relevant information about AI agent frameworks",
    backstory="""You are a senior AI researcher specializing in LLM applications.
    You excel at finding credible sources and synthesizing complex technical information.""",
    tools=[search_tool, scrape_tool],
    verbose=True
)

# Analysis agent
analysis_agent = Agent(
    role="Technology Analyst",
    goal="Analyze and compare different AI agent frameworks objectively",
    backstory="""You are a technology analyst with deep expertise in software architecture.
    You provide balanced, data-driven comparisons.""",
    verbose=True
)

# Writing agent
writing_agent = Agent(
    role="Technical Content Writer",
    goal="Create an engaging, well-structured technical blog post",
    backstory="""You are an award-winning technical writer who makes complex topics
    accessible to developers of all levels.""",
    verbose=True
)

# Task chain (context passes earlier task outputs to later tasks)
research_task = Task(
    description="""Research the top 5 AI agent frameworks in 2025.
    For each framework, find: key features, pros/cons, community size, recent updates.""",
    expected_output="Detailed research report with sources",
    agent=research_agent
)

analysis_task = Task(
    description="Create a comparative analysis of the frameworks based on the research.",
    expected_output="Structured comparison with scoring matrix",
    agent=analysis_agent,
    context=[research_task]
)

writing_task = Task(
    description="Write a 2000-word blog post based on the research and analysis.",
    expected_output="Publication-ready blog post in markdown format",
    agent=writing_agent,
    context=[research_task, analysis_task]
)

# Run
crew = Crew(
    agents=[research_agent, analysis_agent, writing_agent],
    tasks=[research_task, analysis_task, writing_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff()
print(result)

6.5 LangGraph Multi-Agent Workflow

from langgraph.graph import StateGraph, START, END
from typing import TypedDict, Literal, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    current_agent: str
    research_complete: bool
    draft_complete: bool

def route_to_agent(state: AgentState) -> Literal["researcher", "writer", "reviewer"]:
    # Route by progress flags: research first, then drafting, then review
    if not state.get("research_complete"):
        return "researcher"
    elif not state.get("draft_complete"):
        return "writer"
    else:
        return "reviewer"

def researcher(state: AgentState):
    # Run research (research_llm is assumed to be a chat model defined elsewhere)
    result = research_llm.invoke(state["messages"])
    return {
        "messages": [result],
        "current_agent": "researcher",
        "research_complete": True
    }

def writer(state: AgentState):
    # Write the draft (writer_llm is assumed to be defined elsewhere)
    result = writer_llm.invoke(state["messages"])
    return {
        "messages": [result],
        "current_agent": "writer",
        "draft_complete": True
    }

def reviewer(state: AgentState):
    # Review the draft (reviewer_llm is assumed to be defined elsewhere)
    result = reviewer_llm.invoke(state["messages"])
    return {
        "messages": [result],
        "current_agent": "reviewer"
    }

graph = StateGraph(AgentState)
graph.add_node("researcher", researcher)
graph.add_node("writer", writer)
graph.add_node("reviewer", reviewer)

graph.add_conditional_edges(START, route_to_agent)
graph.add_conditional_edges("researcher", route_to_agent)
graph.add_conditional_edges("writer", route_to_agent)
graph.add_edge("reviewer", END)

app = graph.compile()

7. MCP (Model Context Protocol)

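7.1 What Is MCP?

MCP (Model Context Protocol) is an open protocol, introduced by Anthropic, that standardizes communication between agents and external tools. Just as USB connects many kinds of devices through one standardized interface, MCP lets an AI Agent talk to many external tools in a standardized way.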

┌─────────────┐     MCP Protocol     ┌─────────────────┐
│  AI Agent    │◀───────────────────▶│  MCP Server      │
│  (Client)    │    JSON-RPC 2.0     │  (Tool Provider) │
│              │                      │                  │
│  - Claude    │    ┌──────────┐     │  - Filesystem    │
│  - GPT-4     │    │ Resources│     │  - Database      │
│  - Custom    │    │ Tools    │     │  - GitHub        │
│              │    │ Prompts  │     │  - Slack         │
└─────────────┘    └──────────┘     └─────────────────┘
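
To make the "JSON-RPC 2.0" label in the diagram concrete, here is a sketch of what a tool invocation looks like on the wire. The message shapes follow the `tools/call` method as described in the MCP specification, but treat the field details as a sketch and check the spec version you target. The helper function names and the sample tool are hypothetical; in practice an MCP SDK builds these messages for you.

```python
import json

def make_tool_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request asking an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

def make_tool_call_result(request_id: int, text: str, is_error: bool = False) -> dict:
    """Build the matching JSON-RPC 2.0 response carrying a text result."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,  # must echo the id of the request it answers
        "result": {
            "content": [{"type": "text", "text": text}],
            "isError": is_error,
        },
    }

request = make_tool_call_request(1, "search_database", {"query": "laptop"})
response = make_tool_call_result(1, '[{"id": 42, "name": "Laptop"}]')
wire = json.dumps(request)  # what actually travels over stdio or HTTP
```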

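7.2 Why MCP?

Problems with the previous approach:

Each LLM provider uses a different Tool Calling format
Every new tool requires custom integration code
No interoperability between tools
Access to context (files, database contents, and so on) is also handled differently everywhere

What MCP solves:

A standardized tool interface (any LLM can use the same MCP server)
A rich ecosystem (build once, use from every client)
Resources, Tools, and Prompts unified under a single protocol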
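
One way to see the "any LLM can use the same MCP server" point: an MCP tool definition carries a name, a description, and a JSON Schema (`inputSchema`), and a client can map that single definition onto each provider's tool format. The two converter functions below are hypothetical helpers written for this illustration, and the sample tool is made up.

```python
def mcp_tool_to_anthropic(tool: dict) -> dict:
    """Map an MCP tool definition to the Anthropic Messages API tool format."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }

def mcp_tool_to_openai(tool: dict) -> dict:
    """Map an MCP tool definition to the OpenAI Chat Completions tool format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }

# A sample MCP tool definition (illustrative values).
mcp_tool = {
    "name": "search_database",
    "description": "Search the database for records matching the query.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
anthropic_tool = mcp_tool_to_anthropic(mcp_tool)
openai_tool = mcp_tool_to_openai(mcp_tool)
```

The server author writes the schema once; only the thin client-side mapping differs per provider.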

7.3 Building an MCP Server (Python)

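import json

from mcp.server.fastmcp import FastMCP

# Assumption: db, email_service, and app_config are application objects defined elsewhere.

# Create the MCP server
mcp = FastMCP("My Tool Server")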

@mcp.tool()
def search_database(query: str, table: str = "products") -> str:
    """Search the database for records matching the query.

    Args:
        query: Search query string
        table: Database table to search (default: products)
    """
    # Actual database search logic
    results = db.search(table, query)
    return json.dumps(results)

@mcp.tool()
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to the specified recipient.

    Args:
        to: Recipient email address
        subject: Email subject line
        body: Email body content
    """
    # Email sending logic
    email_service.send(to=to, subject=subject, body=body)
    return f"Email sent successfully to {to}"

@mcp.resource("config://app")
def get_app_config() -> str:
    """Get the current application configuration."""
    return json.dumps(app_config)

@mcp.prompt()
def analyze_data(dataset_name: str) -> str:
    """Generate a prompt for data analysis."""
    return f"Please analyze the '{dataset_name}' dataset. Focus on trends, outliers, and actionable insights."

# Run the server
if __name__ == "__main__":
    mcp.run(transport="stdio")

7.4 Building an MCP Server (TypeScript)

import fs from "node:fs/promises";
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

// Assumption: searchFiles(query, dir) is an application helper defined elsewhere.

const server = new McpServer({
  name: "my-tool-server",
  version: "1.0.0"
});

// Register a tool
server.tool(
  "search_files",
  {
    query: z.string().describe("Search query"),
    path: z.string().optional().describe("Directory path to search in")
  },
  async ({ query, path }) => {
    const results = await searchFiles(query, path || ".");
    return {
      content: [{ type: "text", text: JSON.stringify(results) }]
    };
  }
);

// Register a resource. A URI containing a {variable} must be wrapped in
// ResourceTemplate; a plain string is treated as a fixed URI.
server.resource(
  "file",
  new ResourceTemplate("file://{path}", { list: undefined }),
  async (uri) => {
    const content = await fs.readFile(uri.pathname, "utf-8");
    return {
      contents: [{ uri: uri.href, text: content, mimeType: "text/plain" }]
    };
  }
);

// Start the server
const transport = new StdioServerTransport();
await server.connect(transport);

7.5 Available MCP Servers

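MCP server	Function	Main tools
filesystem	File system access	Read, write, search, directory traversal
postgres	PostgreSQL database	Queries, schema inspection
sqlite	SQLite database	Queries, table management
github	GitHub integration	PRs, issues, code search
slack	Slack integration	Sending messages, channel management
brave-search	Web search	Brave Search API
puppeteer	Browser automation	Screenshots, navigation
memory	Persistent memory	Knowledge graph storage and retrieval
google-drive	Google Drive	File search, reading
notion	Notion integration	Pages, databases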

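7.6 MCP vs. Custom Tool Implementation

Aspect	MCP	Custom implementation
Standardization	Standard protocol	Self-defined
Reusability	High (shared ecosystem)	Low (per project)
Initial setup	Some boilerplate	Unconstrained
LLM compatibility	Any MCP client	May be tied to a specific provider
Security	Consent and permission handling is left to the host and server; the protocol itself does not sandbox tools	Must be implemented yourself
Debugging	Standard tooling (e.g. MCP Inspector)	Requires custom logging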

8. Memory System

8.1 Short-term Memory (Conversation Context)

class ConversationMemory:
    def __init__(self, max_messages=50):
        self.messages = []
        self.max_messages = max_messages

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
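        if len(self.messages) > self.max_messages:
            # Drop the oldest messages but always keep system messages
            system = [m for m in self.messages if m["role"] == "system"]
            non_system = [m for m in self.messages if m["role"] != "system"]
            # Reserve room for system messages so the total stays within max_messages
            keep = max(self.max_messages - len(system), 0)
            self.messages = system + (non_system[-keep:] if keep else [])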

    def get_messages(self):
        return self.messages.copy()

8.2 Long-term Memory (Vector DB)

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class LongTermMemory:
    def __init__(self, collection_name="agent_memory"):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            collection_name=collection_name,
            embedding_function=self.embeddings,
            persist_directory="./memory_db"
        )

    def store(self, text, metadata=None):
        """Store important information in long-term memory."""
        self.vectorstore.add_texts(
            texts=[text],
            metadatas=[metadata or {}]
        )

    def recall(self, query, k=5):
        """Retrieve memories relevant to the query."""
        results = self.vectorstore.similarity_search(query, k=k)
        return [doc.page_content for doc in results]

    def summarize_and_store(self, conversation, llm):
        """Summarize a conversation and store the summary in long-term memory."""
        summary = llm.invoke(
            f"Summarize the key facts from this conversation:\n{conversation}"
        )
        self.store(summary.content, metadata={"type": "conversation_summary"})

8.3 Episodic Memory (Past Interactions)

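import json
from datetime import datetime

class EpisodicMemory:
    def __init__(self, vectorstore):
        # Assumption: vectorstore exposes add_texts() and similarity_search(),
        # like the Chroma store used in LongTermMemory above.
        self.vectorstore = vectorstore
        self.episodes = []

    def record_episode(self, task, actions, outcome, learnings):
        """Record a past task experience and index it for later retrieval."""
        episode = {
            "task": task,
            "actions": actions,
            "outcome": outcome,
            "learnings": learnings,
            "timestamp": datetime.now().isoformat()
        }
        self.episodes.append(episode)
        # Index by task description; keep the full episode in metadata as JSON
        self.vectorstore.add_texts(
            texts=[task],
            metadatas=[{"episode": json.dumps(episode)}]
        )

    def find_similar_episodes(self, current_task, k=3):
        """Find past experiences similar to the current task."""
        # Embedding-based similarity search
        docs = self.vectorstore.similarity_search(current_task, k=k)
        return [json.loads(doc.metadata["episode"]) for doc in docs]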

8.4 Working Memory (Scratchpad)

class WorkingMemory:
    def __init__(self):
        self.scratchpad = {}
        self.current_plan = []
        self.intermediate_results = []

    def set(self, key, value):
        """Store temporary data while a task is in progress."""
        self.scratchpad[key] = value

    def get(self, key):
        return self.scratchpad.get(key)

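    def add_plan_step(self, description):
        # Store each step as a dict so its status can be tracked and rendered later
        self.current_plan.append({"description": description, "status": "pending"})

    def add_result(self, result):
        self.intermediate_results.append(result)

    def mark_step_complete(self, step_index):
        if 0 <= step_index < len(self.current_plan):
            self.current_plan[step_index]["status"] = "complete"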

    def get_context_string(self):
        """Render the current working memory as a string for the LLM prompt."""
        parts = []
        if self.current_plan:
            parts.append("Current Plan:")
            for i, step in enumerate(self.current_plan):
                status = step.get("status", "pending")
                parts.append(f"  {i+1}. [{status}] {step['description']}")
        if self.intermediate_results:
            parts.append("\nIntermediate Results:")
            for result in self.intermediate_results[-5:]:
                parts.append(f"  - {result}")
        return "\n".join(parts)

9. Production Deployment

9.1 Safety

import json
import re
from datetime import datetime

class SecurityError(Exception):
    """Raised when a tool call violates the safety policy."""

class RateLimitError(Exception):
    """Raised when a tool exceeds its per-minute call limit."""

class SafetyGuard:
    def __init__(self):
        self.allowed_tools = set()
        self.rate_limits = {}
        self.blocked_patterns = []

    def whitelist_tool(self, tool_name, max_calls_per_minute=10):
        """Register an allowed tool."""
        self.allowed_tools.add(tool_name)
        self.rate_limits[tool_name] = {
            "max_per_minute": max_calls_per_minute,
            "calls": []
        }

    def check_tool_call(self, tool_name, arguments):
        """Run safety checks before a tool call."""
        # 1. Allowlist check
        if tool_name not in self.allowed_tools:
            raise SecurityError(f"Tool '{tool_name}' is not whitelisted")

        # 2. Rate limiting (total_seconds() avoids the wrap-around of timedelta.seconds)
        now = datetime.now()
        calls = self.rate_limits[tool_name]["calls"]
        calls = [c for c in calls if (now - c).total_seconds() < 60]
        if len(calls) >= self.rate_limits[tool_name]["max_per_minute"]:
            raise RateLimitError(f"Rate limit exceeded for '{tool_name}'")
        self.rate_limits[tool_name]["calls"] = calls + [now]

        # 3. Dangerous pattern check
        args_str = json.dumps(arguments)
        for pattern in self.blocked_patterns:
            if re.search(pattern, args_str):
                raise SecurityError("Blocked pattern detected in arguments")

        return True

    def human_in_the_loop(self, tool_name, arguments):
        """Ask a human for approval before a dangerous tool call."""
        dangerous_tools = {"delete_file", "send_email", "execute_code"}
        if tool_name in dangerous_tools:
            print(f"\n[APPROVAL REQUIRED] Tool: {tool_name}")
            print(f"Arguments: {json.dumps(arguments, indent=2)}")
            approval = input("Approve? (yes/no): ")
            return approval.strip().lower() == "yes"
        return True
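
The rate-limiting step can also be isolated into a small sliding-window limiter. The sketch below is one possible shape, not the guide's own implementation: the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are assumptions chosen so the behavior can be checked without real waiting.

```python
from collections import deque
import time

class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `window_seconds` span."""

    def __init__(self, max_calls: int, window_seconds: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so tests need not sleep
        self.calls = deque()    # timestamps of accepted calls, oldest first

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_seconds:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True

# Usage with a fake clock (hypothetical values for illustration).
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_calls=2, window_seconds=60.0, clock=lambda: fake_now[0])
first, second, third = limiter.allow(), limiter.allow(), limiter.allow()
fake_now[0] = 61.0  # jump past the window
after_window = limiter.allow()
```

A monotonic clock is used by default because wall-clock time can jump backwards (for example on NTP adjustments), which would otherwise confuse the window arithmetic.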

9.2 Observability

# LangSmith integration
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-agent-project"

# Langfuse integration
from langfuse import Langfuse
from langfuse.decorators import observe

langfuse = Langfuse()

@observe()
def agent_step(messages, tools):
    """Observable agent step (client is assumed to be an OpenAI client created elsewhere)."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return response

# Custom metric logging
from datetime import datetime
class AgentMetrics:
    def __init__(self):
        self.tool_calls = []
        self.latencies = []
        self.errors = []
        self.token_usage = []

    def log_tool_call(self, tool_name, duration_ms, success, tokens_used):
        self.tool_calls.append({
            "tool": tool_name,
            "duration_ms": duration_ms,
            "success": success,
            "timestamp": datetime.now().isoformat()
        })
        self.latencies.append(duration_ms)
        self.token_usage.append(tokens_used)
        if not success:
            self.errors.append(tool_name)

    def get_summary(self):
        return {
            "total_calls": len(self.tool_calls),
            "avg_latency_ms": sum(self.latencies) / len(self.latencies) if self.latencies else 0,
            "error_rate": len(self.errors) / len(self.tool_calls) if self.tool_calls else 0,
            "total_tokens": sum(self.token_usage)
        }
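
Measuring `duration_ms` and `success` by hand at every call site is easy to get wrong, so a context manager can capture both in one place. This is a minimal sketch under stated assumptions: `timed_tool_call` and the plain `records` list are hypothetical names, and in practice you would forward each record to a metrics object such as the one above.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_tool_call(records: list, tool_name: str):
    """Append one record with the call's duration and success flag."""
    start = time.perf_counter()
    success = True
    try:
        yield
    except Exception:
        success = False
        raise  # re-raise so the caller still sees the failure
    finally:
        duration_ms = (time.perf_counter() - start) * 1000
        records.append({"tool": tool_name, "duration_ms": duration_ms, "success": success})

records = []
with timed_tool_call(records, "search"):
    time.sleep(0.01)  # stand-in for a real tool call

try:
    with timed_tool_call(records, "flaky_tool"):
        raise TimeoutError("simulated failure")
except TimeoutError:
    pass
```

The `finally` block guarantees a record is written for failed calls too, which is what makes the error rate meaningful.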

9.3 Cost Management

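class BudgetExceededError(Exception):
    """Raised when cumulative spend passes the configured budget."""

class CostManager:
    # Price per model (USD per 1M tokens, as of 2025; check current provider pricing)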
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
        "claude-3.5-haiku": {"input": 0.80, "output": 4.00},
    }

    def __init__(self, budget_limit_usd=100.0):
        self.total_cost = 0.0
        self.budget_limit = budget_limit_usd
        self.usage_log = []

    def track_usage(self, model, input_tokens, output_tokens):
        pricing = self.PRICING.get(model, {"input": 0, "output": 0})
        cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
        self.total_cost += cost
        self.usage_log.append({
            "model": model,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "cost": cost
        })

        if self.total_cost > self.budget_limit:
            raise BudgetExceededError(
                f"Budget limit exceeded: ${self.total_cost:.2f} > ${self.budget_limit:.2f}"
            )

    def optimize_model_selection(self, task_complexity):
        """Pick a model automatically based on task complexity."""
        if task_complexity == "simple":
            return "gpt-4o-mini"  # cheapest model
        elif task_complexity == "moderate":
            return "claude-3.5-haiku"
        else:
            return "gpt-4o"  # high-capability model
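
As a worked example of the cost formula in `track_usage`, the arithmetic for a single agent run can be spelled out directly. The token counts and the number of calls below are made-up illustration values; the prices are the gpt-4o row from the `PRICING` table above.

```python
# gpt-4o prices from the PRICING table (USD per 1M tokens).
PRICE_PER_MILLION = {"input": 2.50, "output": 10.00}

def request_cost(input_tokens: int, output_tokens: int, price: dict) -> float:
    """Cost in USD for one request, given per-1M-token prices."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

# One LLM call with 10,000 input tokens and 2,000 output tokens:
#   input:  10,000 * 2.50  / 1,000,000 = 0.025 USD
#   output:  2,000 * 10.00 / 1,000,000 = 0.020 USD
per_call = request_cost(10_000, 2_000, PRICE_PER_MILLION)  # about 0.045 USD

# An agent run that makes 5 such calls:
per_run = 5 * per_call  # about 0.225 USD
```

In a real agent loop the input side usually grows from step to step, since the accumulated message history and tool results are sent again on every call, so a flat per-call estimate like this one tends to be a lower bound.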

9.4 Error Recovery Patterns

import asyncio

class AgentError(Exception):
    """Base class for the agent errors handled below."""

class ToolTimeoutError(AgentError): pass
class ToolNotFoundError(AgentError): pass
class InvalidArgumentsError(AgentError): pass
class RateLimitError(AgentError): pass
class AgentFailedError(AgentError): pass

class AgentErrorRecovery:
    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.error_handlers = {}

    def register_handler(self, error_type, handler):
        self.error_handlers[error_type] = handler

    async def execute_with_recovery(self, agent_step, context):
        """Run an agent step with error recovery."""
        last_error = None

        for attempt in range(self.max_retries):
            try:
                return await agent_step(context)
            except ToolTimeoutError as e:
                last_error = e
                # Timeout: retry with a longer timeout (or switch to an alternative tool)
                context["timeout_multiplier"] = 2 ** attempt
            except ToolNotFoundError as e:
                last_error = e
                # Tool missing: use a fallback tool
                context["use_fallback"] = True
            except InvalidArgumentsError as e:
                last_error = e
                # Invalid arguments: feed the error back to the LLM
                context["error_feedback"] = str(e)
            except RateLimitError as e:
                last_error = e
                # Rate limit: wait, then retry
                await asyncio.sleep(2 ** attempt)

        raise AgentFailedError(f"Agent failed after {self.max_retries} attempts: {last_error}")
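
The fixed `2 ** attempt` wait can cause many agents to retry at the same moment. A common refinement is exponential backoff with jitter, sketched below as a standalone helper. The function name `retry_with_backoff`, its parameters, and the injectable `sleep` are assumptions for illustration, and "full jitter" is a swapped-in technique rather than something the class above does.

```python
import asyncio
import random

async def retry_with_backoff(operation, max_retries=3, base_delay=1.0, max_delay=30.0,
                             retry_on=(Exception,), sleep=asyncio.sleep):
    """Retry an async operation with exponential backoff and full jitter."""
    last_error = None
    for attempt in range(max_retries):
        try:
            return await operation()
        except retry_on as e:
            last_error = e
            if attempt == max_retries - 1:
                break  # no point sleeping after the final attempt
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2^attempt)].
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
            await sleep(delay)
    raise RuntimeError(f"failed after {max_retries} attempts") from last_error

# Usage: an operation that fails twice, then succeeds (simulated).
attempts = {"count": 0}

async def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

async def no_sleep(_delay):
    return None  # skip real waiting in this demo

result = asyncio.run(retry_with_backoff(flaky, sleep=no_sleep))
```

Passing `retry_on` lets the caller restrict retries to transient errors; retrying on invalid arguments, for example, rarely helps unless the input changes between attempts.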

10. Quiz

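Q1. List all four core components of an AI Agent.

Answer: LLM (reasoning engine), Tools (external tools), Memory (context retention), Planning (plan formulation)

The LLM acts as the agent's brain. Tools let it interact with the outside world, Memory preserves context, and Planning breaks complex work into steps.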

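Q2. In Tool Calling, does the LLM execute the function itself?

Answer: No. The LLM does not execute functions directly.

The LLM outputs, as JSON, which function to call and with which arguments. The application code (the client side) performs the actual call, then passes the result back to the LLM, which generates the final response.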

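Q3. Explain the three-step loop of the ReAct pattern.

Answer: Thought - Action - Observation

Thought: the LLM analyzes the current situation and reasons about the next action
Action: based on that reasoning, it calls a tool or produces the final answer
Observation: it inspects the tool's result and returns to Thought

The loop repeats until a final answer is produced.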

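Q4. What are the three main components of MCP (Model Context Protocol)?

Answer: Resources, Tools, Prompts

Resources: data sources the agent can access (files, databases, and so on)
Tools: functions or actions the agent can execute
Prompts: predefined prompt templates

MCP unifies all three under one standard protocol built on JSON-RPC 2.0.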

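Q5. What is the most important factor in optimizing Tool Calling performance?

Answer: Tool Description Engineering (optimizing tool descriptions)

As a rule of thumb in this guide, the quality of the tool description accounts for roughly 80% of Tool Calling performance. A description should be clear and specific and should include usage scenarios and examples. Other important factors are the quality of parameter descriptions, keeping the number of tools small, simplifying schemas, and providing few-shot examples.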

11. References

OpenAI Function Calling Documentation - OpenAI API Reference
Anthropic Tool Use Guide - Anthropic Developer Documentation
Google Gemini Function Calling - Google AI Developer Documentation
Model Context Protocol (MCP) Specification - Anthropic, 2024
ReAct: Synergizing Reasoning and Acting in Language Models - Yao et al., 2023
LangGraph Documentation - LangChain
CrewAI Documentation - CrewAI
AutoGen: Enabling Next-Gen LLM Applications - Microsoft Research
Smolagents Documentation - HuggingFace
Gorilla: Large Language Model Connected with Massive APIs - Patil et al., 2023
ToolLLM: Facilitating LLMs to Master 16000+ Real-world APIs - Qin et al., 2024
A Survey on Large Language Model based Autonomous Agents - Wang et al., 2024
Voyager: An Open-Ended Embodied Agent with Large Language Models - Wang et al., 2023
BFCL (Berkeley Function Calling Leaderboard) - UC Berkeley Gorilla Project