AI Agent Development Complete Guide 2025: Tool Calling, ReAct, Multi-Agent, and MCP


Introduction

In 2025, the paradigm of AI development is shifting dramatically. The transition from simple text-generating chatbots to autonomous AI Agents that invoke external tools, formulate plans, and collaborate with other agents is now in full swing.

OpenAI's Function Calling, Anthropic's Tool Use, and Google's Function Calling have all reached maturity. A standardized tool protocol called MCP (Model Context Protocol) has emerged. Multi-agent frameworks like LangGraph, CrewAI, and AutoGen have achieved production-level stability.

This guide covers everything you need to build AI Agents. From Tool Calling fundamentals to ReAct patterns, multi-agent architectures, MCP, and production deployment strategies, you will learn systematically with practical code examples.


1. What is an AI Agent?

1.1 Agent Definition

An AI Agent is an autonomous system composed of four core components: LLM + Tools + Memory + Planning. It goes beyond simply generating text to interact with the external world and achieve goals.

┌─────────────────────────────────────────┐
│                AI Agent                 │
│                                         │
│  ┌──────────┐  ┌──────────┐             │
│  │   LLM    │  │ Planning │             │
│  │ (Brain)  │  │ (Goals)  │             │
│  └────┬─────┘  └──────────┘             │
│       │                                 │
│  ┌────┴─────┐  ┌──────────┐             │
│  │  Tools   │  │  Memory  │             │
│  │(Actions) │  │(Context) │             │
│  └──────────┘  └──────────┘             │
└─────────────────────────────────────────┘

Four Core Components:

| Component | Role | Examples |
|-----------|------|----------|
| LLM | Reasoning, decision-making | GPT-4o, Claude 3.5, Gemini 2.0 |
| Tools | Interact with external world | API calls, DB queries, file reads |
| Memory | Maintain context | Conversation history, vector DB, summaries |
| Planning | Formulate goal-achievement plans | Task decomposition, prioritization |
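To make the four components concrete, here is a minimal (and deliberately simplified) sketch of how they can be represented as plain Python fields; `MiniAgent`, `remember`, and the lambda LLM are illustrative names, not part of any framework:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class MiniAgent:
    # LLM: the reasoning engine (here a stub that maps messages to a reply)
    llm: Callable[[list], str]
    # Tools: named callables that act on the external world
    tools: dict = field(default_factory=dict)
    # Memory: the running conversation context
    memory: list = field(default_factory=list)
    # Planning: pending steps toward the goal
    plan: list = field(default_factory=list)

    def remember(self, role: str, content: str) -> None:
        self.memory.append({"role": role, "content": content})

agent = MiniAgent(llm=lambda messages: "ok")
agent.tools["echo"] = lambda text: text
agent.remember("user", "hello")
print(len(agent.memory))  # 1
```

Real agents wrap exactly these four slots in richer machinery (tool schemas, vector stores, planners), which the rest of this guide builds up.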

1.2 Agent vs Chatbot vs RAG Pipeline

Level 0: Chatbot (simple text generation)
  Input → LLM → Text Output

Level 1: RAG Pipeline (retrieval-augmented generation)
  Input → Retrieve docs → LLM → Text Output

Level 2: Tool-Calling Agent (external tool usage)
  Input → LLM → Tool Call → Result → LLM → Output

Level 3: Planning Agent (plan formulation + iterative execution)
  Input → Plan → [Tool Call → Observe]* → Output

Level 4: Multi-Agent System (multiple specialized agents cooperating)
  Input → Coordinator → Agent 1 + Agent 2 → Output

1.3 Agent Capabilities Pyramid

Agent capabilities build up in layers, each resting on the one below:

  1. Text generation: Basic LLM capability (Q&A, summarization, translation)
  2. Tool usage: External API/function calls (search, calculation, data retrieval)
  3. Planning: Breaking complex tasks into steps
  4. Multi-step execution: Repeating Thought-Action-Observation loops
  5. Multi-agent cooperation: Role division and feedback between specialized agents

2. Tool Calling / Function Calling Deep Dive

2.1 How Tool Calling Works

Tool Calling enables LLMs to invoke external functions. The core flow is:

User: "What's the weather in Seoul?"
LLM: "I need to call the get_weather function"
     (structured JSON output)
System: Execute get_weather(location="Seoul")
     (return result)
LLM: "The current temperature in Seoul is 18°C and sunny"

Key point: The LLM does not execute functions directly. It outputs which function to call with which arguments in JSON format, and the actual execution happens in your application code.
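This split between "the LLM proposes, the application executes" can be sketched with a plain dispatch table; `get_weather` here is a hard-coded stub standing in for a real weather API:

```python
import json

# Local registry: the application, not the LLM, owns execution
def get_weather(location: str, unit: str = "celsius") -> dict:
    # Stub result; a real implementation would call a weather service
    return {"location": location, "temperature": 18, "condition": "sunny"}

TOOLS = {"get_weather": get_weather}

def dispatch(name: str, arguments_json: str) -> str:
    """Execute the function the LLM asked for; return a JSON string result."""
    args = json.loads(arguments_json)  # the LLM's output is just JSON text
    result = TOOLS[name](**args)       # the actual side effect happens here
    return json.dumps(result)

print(dispatch("get_weather", '{"location": "Seoul"}'))
# {"location": "Seoul", "temperature": 18, "condition": "sunny"}
```

The JSON string returned by `dispatch` is what gets sent back to the model as the tool result in the provider-specific formats shown below.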

2.2 OpenAI Function Calling Format

import openai

client = openai.OpenAI()

# Tool definition
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Seoul, Tokyo"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

# Tool Calling request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "What's the weather in Seoul?"}
    ],
    tools=tools,
    tool_choice="auto"  # auto, none, required, or force specific function
)

# Extract tool call from response
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name)       # "get_weather"
print(tool_call.function.arguments)  # '{"location": "Seoul", "unit": "celsius"}'

# Execute function and pass result back to LLM
messages = [
    {"role": "user", "content": "What's the weather in Seoul?"},
    response.choices[0].message,
    {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": '{"temperature": 18, "condition": "sunny", "humidity": 45}'
    }
]

final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)

2.3 Anthropic Tool Use Format

import anthropic

client = anthropic.Anthropic()

# Anthropic tool definition
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. Seoul, Tokyo"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]

# Tool Use request
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather in Seoul?"}
    ]
)

# Process response
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        print(f"Tool ID: {block.id}")

        # Pass tool result back
        result_response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "What's the weather in Seoul?"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": '{"temperature": 18, "condition": "sunny"}'
                        }
                    ]
                }
            ]
        )

2.4 Google Gemini Function Calling

import google.generativeai as genai

# Gemini function declaration
get_weather_func = genai.protos.FunctionDeclaration(
    name="get_weather",
    description="Get the current weather for a location",
    parameters=genai.protos.Schema(
        type=genai.protos.Type.OBJECT,
        properties={
            "location": genai.protos.Schema(
                type=genai.protos.Type.STRING,
                description="City name"
            ),
        },
        required=["location"]
    )
)

tool = genai.protos.Tool(function_declarations=[get_weather_func])

model = genai.GenerativeModel(
    model_name="gemini-2.0-flash",
    tools=[tool]
)

chat = model.start_chat()
response = chat.send_message("What's the weather in Seoul?")

# Process function call
for part in response.parts:
    if fn := part.function_call:
        print(f"Function: {fn.name}")
        print(f"Args: {dict(fn.args)}")

2.5 Tool Schema Definition (JSON Schema)

Effective tool schema design directly impacts Tool Calling performance:

{
  "name": "search_products",
  "description": "Search for products in the e-commerce catalog. Returns matching products with price, rating, and availability. Use when the user wants to find or browse products.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search query string. Can include product names, categories, or features. Example: 'wireless bluetooth headphones'"
      },
      "category": {
        "type": "string",
        "enum": ["electronics", "clothing", "home", "sports", "books"],
        "description": "Product category to filter results"
      },
      "min_price": {
        "type": "number",
        "description": "Minimum price in USD"
      },
      "max_price": {
        "type": "number",
        "description": "Maximum price in USD"
      },
      "sort_by": {
        "type": "string",
        "enum": ["relevance", "price_asc", "price_desc", "rating", "newest"],
        "description": "Sort order for results. Default: relevance"
      },
      "limit": {
        "type": "integer",
        "description": "Maximum number of results (1-50). Default: 10"
      }
    },
    "required": ["query"]
  }
}
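Model-generated arguments are not guaranteed to respect the schema, so validating them before execution is a cheap safeguard. Below is a minimal, stdlib-only check for the required-field and enum constraints of the schema above (a full JSON Schema validator such as the third-party `jsonschema` package would cover the remaining constraints):

```python
import json

# Constraints lifted from the search_products schema above
SCHEMA = {
    "required": ["query"],
    "enums": {
        "category": ["electronics", "clothing", "home", "sports", "books"],
        "sort_by": ["relevance", "price_asc", "price_desc", "rating", "newest"],
    },
}

def validate_args(arguments_json: str) -> list:
    """Return a list of problems; an empty list means the call is safe to run."""
    args = json.loads(arguments_json)
    errors = [f"missing required field: {f}" for f in SCHEMA["required"] if f not in args]
    for name, allowed in SCHEMA["enums"].items():
        if name in args and args[name] not in allowed:
            errors.append(f"{name} must be one of {allowed}")
    return errors

# Both a missing "query" and an invalid enum value are reported
print(validate_args('{"category": "food"}'))
```

Errors found this way can be sent back to the model as a tool result so it can retry with corrected arguments.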

2.6 Parallel Tool Calls

Parallel tool calling lets the model request several independent tool invocations in a single turn:

import json

# Handling parallel tool calls in OpenAI
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Compare the weather in Seoul and Tokyo"}
    ],
    tools=tools,
    parallel_tool_calls=True  # Default: True
)

# Multiple tool calls returned
tool_calls = response.choices[0].message.tool_calls
# tool_calls[0]: get_weather(location="Seoul")
# tool_calls[1]: get_weather(location="Tokyo")

# Pass all results at once
messages = [
    {"role": "user", "content": "Compare the weather in Seoul and Tokyo"},
    response.choices[0].message,
]

for tc in tool_calls:
    result = execute_function(tc.function.name, tc.function.arguments)
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": json.dumps(result)
    })

final = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

2.7 Forced Tool Use vs Auto

# Auto: LLM decides whether to use a tool
tool_choice = "auto"

# Required: Must use at least one tool
tool_choice = "required"

# None: No tool usage (text generation only)
tool_choice = "none"

# Force specific function call
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

2.8 Tool Calling with Streaming

# Handling tool calls in streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
    tools=tools,
    stream=True
)

tool_calls_buffer = {}
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        for tc in delta.tool_calls:
            idx = tc.index
            if idx not in tool_calls_buffer:
                tool_calls_buffer[idx] = {
                    "id": tc.id,
                    "function": {"name": "", "arguments": ""}
                }
            if tc.function.name:
                tool_calls_buffer[idx]["function"]["name"] += tc.function.name
            if tc.function.arguments:
                tool_calls_buffer[idx]["function"]["arguments"] += tc.function.arguments

# Execute tool calls after streaming completes
for idx, tc in tool_calls_buffer.items():
    result = execute_function(tc["function"]["name"], tc["function"]["arguments"])
    print(f"Tool: {tc['function']['name']}, Result: {result}")

3. Tool Calling Performance Optimization (Key Section)

3.1 Tool Description Engineering

Tool descriptions are the single biggest lever on Tool Calling accuracy: the model decides when and how to call a tool based almost entirely on them. Key principles for good descriptions:

# BAD: Too vague
{
    "name": "search",
    "description": "Search for things"
}

# GOOD: Clear and specific
{
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base for technical documentation and troubleshooting guides. Returns relevant articles ranked by relevance score. Use this tool when the user asks about product features, technical specifications, or needs help resolving technical issues. Do NOT use this for general web search or current events."
}

# BEST: Description + usage scenarios + caveats
{
    "name": "create_calendar_event",
    "description": "Create a new calendar event. Required fields: title, start_time. Optional: end_time (defaults to 1 hour after start), attendees (email list), location, description. Use this when the user wants to schedule a meeting or event. Returns the created event ID and a confirmation link. Note: Times must be in ISO 8601 format (YYYY-MM-DDTHH:MM:SS). If the user specifies a relative time like 'tomorrow at 3pm', convert it to the absolute format first."
}

3.2 Parameter Description Quality

# BAD: No parameter description
"properties": {
    "date": {"type": "string"}
}

# GOOD: Include format, examples, defaults
"properties": {
    "date": {
        "type": "string",
        "description": "Date in YYYY-MM-DD format. Example: '2025-03-15'. Defaults to today if not specified."
    }
}

3.3 Reducing Tool Count

# BAD: 10 tools with scattered related functionality
tools = [
    "get_user_name", "get_user_email", "get_user_phone",
    "get_user_address", "get_user_preferences",
    "update_user_name", "update_user_email", "update_user_phone",
    "update_user_address", "update_user_preferences"
]

# GOOD: 2 unified tools
tools = [
    {
        "name": "get_user_info",
        "description": "Get user information. Specify which fields to retrieve.",
        "parameters": {
            "properties": {
                "user_id": {"type": "string"},
                "fields": {
                    "type": "array",
                    "items": {"type": "string", "enum": ["name", "email", "phone", "address", "preferences"]},
                    "description": "List of fields to retrieve. If empty, returns all fields."
                }
            }
        }
    },
    {
        "name": "update_user_info",
        "description": "Update user information.",
        "parameters": {
            "properties": {
                "user_id": {"type": "string"},
                "updates": {
                    "type": "object",
                    "description": "Key-value pairs of fields to update"
                }
            }
        }
    }
]

3.4 Few-shot Examples in System Prompt

system_prompt = """You are a helpful assistant with access to tools.

Here are examples of how to use tools correctly:

User: "What's the weather like in Paris?"
Tool call: get_weather(location="Paris", unit="celsius")

User: "Find cheap flights from Seoul to Tokyo next Friday"
Tool call: search_flights(origin="ICN", destination="NRT", date="2025-03-28", sort_by="price_asc")

User: "How are you today?"
No tool call needed - just respond conversationally.
"""

3.5 Schema Simplification

# BAD: Deeply nested structure
{
    "parameters": {
        "type": "object",
        "properties": {
            "filter": {
                "type": "object",
                "properties": {
                    "conditions": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "field": {"type": "string"},
                                "operator": {"type": "string"},
                                "value": {"type": "object"}
                            }
                        }
                    }
                }
            }
        }
    }
}

# GOOD: Flattened structure
{
    "parameters": {
        "type": "object",
        "properties": {
            "filter_field": {"type": "string", "description": "Field to filter on"},
            "filter_operator": {"type": "string", "enum": ["eq", "gt", "lt", "contains"]},
            "filter_value": {"type": "string", "description": "Value to filter by (as string)"}
        }
    }
}

3.6 Error Handling and Retry Logic

import json

from tenacity import retry, stop_after_attempt, wait_exponential

class ToolExecutor:
    def __init__(self):
        self.tool_registry = {}

    def register(self, name, func):
        self.tool_registry[name] = func

    @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
    def execute(self, tool_name, arguments_str):
        """Execute tool with error handling and retry"""
        try:
            args = json.loads(arguments_str)
        except json.JSONDecodeError as e:
            return {"error": f"Invalid JSON arguments: {str(e)}"}

        func = self.tool_registry.get(tool_name)
        if not func:
            return {"error": f"Unknown tool: {tool_name}"}

        try:
            result = func(**args)
            return {"success": True, "data": result}
        except TypeError as e:
            return {"error": f"Invalid parameters: {str(e)}"}
        except Exception as e:
            return {"error": f"Tool execution failed: {str(e)}"}
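The payoff of this design is that failures come back as data the model can read and correct, instead of exceptions that kill the loop. A compressed, self-contained demonstration of the same pattern (`add` is just a stand-in tool):

```python
import json

def safe_execute(func, arguments_json: str) -> dict:
    """Run a tool and always return a JSON-safe dict; never raise."""
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as e:
        # Malformed JSON from the model becomes a readable error message
        return {"error": f"Invalid JSON arguments: {e}"}
    try:
        return {"success": True, "data": func(**args)}
    except TypeError as e:
        # Wrong parameter names/types become a readable error message
        return {"error": f"Invalid parameters: {e}"}

def add(a: int, b: int) -> int:
    return a + b

print(safe_execute(add, '{"a": 2, "b": 3}'))  # {'success': True, 'data': 5}
print(safe_execute(add, '{"a": 2'))           # error dict the LLM can read
```

Feeding the error dict back as the tool result usually lets the model self-correct on the next turn.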

3.7 Caching Tool Results

import hashlib
import json
from datetime import datetime, timedelta

class ToolCache:
    def __init__(self, ttl_seconds=300):
        self.cache = {}
        self.ttl = timedelta(seconds=ttl_seconds)

    def _make_key(self, tool_name, arguments):
        raw = f"{tool_name}:{json.dumps(arguments, sort_keys=True)}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, tool_name, arguments):
        key = self._make_key(tool_name, arguments)
        if key in self.cache:
            entry = self.cache[key]
            if datetime.now() - entry["timestamp"] < self.ttl:
                return entry["result"]
            del self.cache[key]
        return None

    def set(self, tool_name, arguments, result):
        key = self._make_key(tool_name, arguments)
        self.cache[key] = {
            "result": result,
            "timestamp": datetime.now()
        }
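For deterministic tools whose arguments are hashable, Python's `functools.lru_cache` gives a zero-infrastructure version of the same idea (no TTL, so it only suits data that does not go stale); `get_exchange_rate` is a hypothetical stub standing in for a real API call:

```python
from functools import lru_cache

CALLS = {"count": 0}  # track how often the "remote API" is actually hit

@lru_cache(maxsize=256)
def get_exchange_rate(base: str, quote: str) -> float:
    """Pretend remote lookup; stub values for illustration only."""
    CALLS["count"] += 1
    return 1350.0 if (base, quote) == ("USD", "KRW") else 1.0

get_exchange_rate("USD", "KRW")
get_exchange_rate("USD", "KRW")  # served from cache, no second "API call"
print(CALLS["count"])  # 1
```

The `ToolCache` above is the right choice once results can go stale or arguments arrive as dicts rather than hashable positional values.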

3.8 Latency Optimization

import asyncio
import json

async def parallel_tool_execution(tool_calls):
    """Execute multiple tools asynchronously in parallel"""
    async def execute_one(tc):
        tool_name = tc["function"]["name"]
        args = json.loads(tc["function"]["arguments"])
        return {
            "tool_call_id": tc["id"],
            "result": await async_execute_tool(tool_name, args)
        }

    results = await asyncio.gather(
        *[execute_one(tc) for tc in tool_calls],
        return_exceptions=True
    )
    return results

3.9 Fine-tuning for Tool Calling

# Tool Calling fine-tuning with Unsloth
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.1-8B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Tool Calling dataset format
training_data = [
    {
        "messages": [
            {
                "role": "system",
                "content": "You have access to the following tools: ..."
            },
            {
                "role": "user",
                "content": "What is the stock price of AAPL?"
            },
            {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "type": "function",
                        "function": {
                            "name": "get_stock_price",
                            "arguments": "{\"symbol\": \"AAPL\"}"
                        }
                    }
                ]
            }
        ]
    }
]

# LoRA fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,
)

4. ReAct Pattern Implementation

4.1 Thought-Action-Observation Loop

ReAct (Reasoning + Acting) is a pattern where the LLM repeats Thought - Action - Observation loops to solve problems.

Question: Is Seoul's population larger than Tokyo's in 2025?

Thought 1: I need to search for the current populations of Seoul and Tokyo.
Action 1: search("Seoul population 2025")
Observation 1: Seoul (city proper): ~9.7 million

Thought 2: I also need Tokyo's population.
Action 2: search("Tokyo population 2025")
Observation 2: Tokyo (Tokyo Metropolis): ~13.9 million

Thought 3: Seoul (9.7M) is less than Tokyo (13.9M), so Seoul has a smaller population.
Action 3: finish("No. Seoul (~9.7M) has a smaller population than Tokyo (~13.9M).")

4.2 Python Implementation from Scratch

import re
import json

class ReActAgent:
    def __init__(self, llm_client, tools, max_iterations=10):
        self.llm = llm_client
        self.tools = {t["name"]: t for t in tools}
        self.tool_functions = {}
        self.max_iterations = max_iterations

    def register_function(self, name, func):
        self.tool_functions[name] = func

    def _build_system_prompt(self):
        tool_descriptions = "\n".join([
            f"- {t['name']}: {t['description']}"
            for t in self.tools.values()
        ])

        return f"""You are a helpful assistant that solves problems step by step.

Available tools:
{tool_descriptions}

For each step, you MUST output in this exact format:
Thought: [your reasoning about what to do next]
Action: [tool_name(param1="value1", param2="value2")]

After receiving an observation, continue with the next Thought.
When you have the final answer, use:
Thought: [your final reasoning]
Action: finish(answer="[your final answer]")
"""

    def _parse_action(self, text):
        """Extract function name and args from Action string"""
        match = re.search(r'Action:\s*(\w+)\((.*?)\)', text, re.DOTALL)
        if not match:
            return None, None

        func_name = match.group(1)
        args_str = match.group(2)

        args = {}
        for arg_match in re.finditer(r'(\w+)="([^"]*)"', args_str):
            args[arg_match.group(1)] = arg_match.group(2)

        return func_name, args

    def run(self, user_query):
        """Execute the ReAct loop"""
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": user_query}
        ]

        for i in range(self.max_iterations):
            response = self.llm.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                temperature=0
            )

            assistant_message = response.choices[0].message.content
            messages.append({"role": "assistant", "content": assistant_message})

            print(f"\n--- Step {i+1} ---")
            print(assistant_message)

            func_name, args = self._parse_action(assistant_message)

            if func_name == "finish":
                return args.get("answer", "No answer provided")

            if func_name and func_name in self.tool_functions:
                try:
                    result = self.tool_functions[func_name](**args)
                    observation = f"Observation: {result}"
                except Exception as e:
                    observation = f"Observation: Error - {str(e)}"
            else:
                observation = f"Observation: Unknown tool '{func_name}'"

            print(observation)
            messages.append({"role": "user", "content": observation})

        return "Max iterations reached without a final answer."

# Usage example
agent = ReActAgent(llm_client=openai.OpenAI(), tools=[...])
agent.register_function("search", lambda query: web_search(query))
# Note: eval() is unsafe on untrusted input; use a sandboxed math parser in production
agent.register_function("calculate", lambda expression: eval(expression))

answer = agent.run("When did India's GDP surpass Japan's?")

4.3 ReAct vs Simple Tool Calling

| Scenario | Simple Tool Calling | ReAct Pattern |
|----------|---------------------|---------------|
| Simple lookups (weather, stock) | Suitable | Overkill |
| Single calculation | Suitable | Overkill |
| Multi-step reasoning | Inadequate | Suitable |
| Using intermediate results | Inadequate | Suitable |
| Complex research | Inadequate | Suitable |
| Conditional branching | Difficult | Suitable |

4.4 Limitations and Solutions

Limitations:

  • Loops can continue indefinitely (limit with max iterations)
  • Thoughts can go in wrong directions (need self-correction)
  • Slow responses (LLM call at every step)

Solutions:

  • Set maximum iteration count
  • Include error messages in Observation
  • Improve speed with caching and parallel processing
  • Better approach: LangGraph's graph-based control flow

5. Agent Frameworks Comparison

5.1 LangChain / LangGraph

LangGraph is a framework for building graph-based agent workflows:

from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return f"Search results for: {query}"

@tool
def calculate(expression: str) -> str:
    """Calculate a mathematical expression."""
    # Note: eval() is unsafe on untrusted input; sandbox in production
    return str(eval(expression))

# Model setup
llm = ChatOpenAI(model="gpt-4o").bind_tools([search_web, calculate])

def call_model(state: MessagesState):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# Build graph
graph = StateGraph(MessagesState)
graph.add_node("agent", call_model)
graph.add_node("tools", ToolNode(tools=[search_web, calculate]))

graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", tools_condition)
graph.add_edge("tools", "agent")

app = graph.compile()

# Execute
result = app.invoke({
    "messages": [("user", "What is 15% of the population of France?")]
})

5.2 CrewAI (Role-Based Multi-Agent)

from crewai import Agent, Task, Crew, Process

# Agent definitions
researcher = Agent(
    role="Senior Research Analyst",
    goal="Uncover cutting-edge developments in AI",
    backstory="You are an expert research analyst at a leading tech think tank.",
    verbose=True,
    allow_delegation=False,
    tools=[search_tool, scrape_tool]
)

writer = Agent(
    role="Tech Content Writer",
    goal="Write engaging blog posts about AI developments",
    backstory="You are a renowned content strategist known for insightful articles.",
    verbose=True,
    allow_delegation=False
)

editor = Agent(
    role="Chief Editor",
    goal="Ensure high quality and accuracy of the final content",
    backstory="You are a meticulous editor with decades of publishing experience.",
    verbose=True
)

# Task definitions
research_task = Task(
    description="Research the latest AI agent frameworks released in 2025.",
    expected_output="A comprehensive report on latest AI agent frameworks.",
    agent=researcher
)

writing_task = Task(
    description="Write a blog post based on the research findings.",
    expected_output="A polished blog post of 1000+ words.",
    agent=writer
)

editing_task = Task(
    description="Review and edit the blog post for quality and accuracy.",
    expected_output="A final, publication-ready blog post.",
    agent=editor
)

# Crew execution
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff()

5.3 AutoGen (Microsoft, Conversation-Based)

from autogen import AssistantAgent, UserProxyAgent

# Assistant agent
assistant = AssistantAgent(
    name="assistant",
    llm_config={
        "model": "gpt-4o",
        "temperature": 0,
    },
    system_message="You are a helpful AI assistant. Solve tasks step by step."
)

# User proxy (code execution environment)
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": False,
    }
)

# Start conversation
user_proxy.initiate_chat(
    assistant,
    message="Analyze the top 10 AI papers from arXiv this week and create a summary."
)

5.4 Smolagents (HuggingFace)

from smolagents import CodeAgent, ToolCallingAgent, HfApiModel, tool

@tool
def get_weather(location: str) -> str:
    """Get weather for a location.

    Args:
        location: The city name to get weather for.
    """
    return f"Weather in {location}: 22C, sunny"

# Code Agent: generates and executes Python code
agent = CodeAgent(
    tools=[get_weather],
    model=HfApiModel("Qwen/Qwen2.5-72B-Instruct"),
)

result = agent.run("What's the weather in Seoul and Tokyo?")

5.5 Framework Comparison Table

| Feature | LangGraph | CrewAI | AutoGen | Smolagents | Claude SDK |
|---------|-----------|--------|---------|------------|------------|
| Developer | LangChain | CrewAI | Microsoft | HuggingFace | Anthropic |
| Paradigm | Graph-based | Role-based | Conversation | Code generation | Loop-based |
| Multi-agent | Powerful | Core feature | Core feature | Basic | Manual |
| Learning curve | High | Low | Medium | Low | Low |
| Customization | Very high | Medium | High | Medium | Very high |
| Production stability | High | Medium | Medium | Early stage | High |
| State management | Built-in (checkpoints) | Basic | Basic | None | Manual |
| Streaming | Supported | Not supported | Not supported | Not supported | Supported |
| Human-in-the-loop | Built-in | Not supported | Supported | Not supported | Manual |
| Open source | Yes | Yes | Yes | Yes | SDK only |

6. Multi-Agent Architectures

6.1 Supervisor Pattern

A single coordinator (Supervisor) manages specialist agents:

from langgraph.graph import StateGraph, MessagesState, START, END

def supervisor_node(state: MessagesState):
    """Analyze task and delegate to appropriate agent"""
    system_msg = """You are a supervisor managing a team of specialists:
    - researcher: for finding information
    - coder: for writing and reviewing code
    - writer: for creating content

    Based on the user request, decide which agent should handle it.
    Respond with the agent name: 'researcher', 'coder', or 'writer'.
    If the task is complete, respond with 'FINISH'."""

    response = llm.invoke([
        {"role": "system", "content": system_msg},
        *state["messages"]
    ])
    return {"next": response.content.strip()}

# Build graph
graph = StateGraph(MessagesState)
graph.add_node("supervisor", supervisor_node)
graph.add_node("researcher", researcher_node)
graph.add_node("coder", coder_node)
graph.add_node("writer", writer_node)

graph.add_edge(START, "supervisor")
graph.add_conditional_edges("supervisor", lambda s: s["next"], {
    "researcher": "researcher",
    "coder": "coder",
    "writer": "writer",
    "FINISH": END
})
graph.add_edge("researcher", "supervisor")
graph.add_edge("coder", "supervisor")
graph.add_edge("writer", "supervisor")

6.2 Peer-to-Peer Pattern

Agents communicate directly with each other:

┌──────────────┐    message     ┌──────────────┐
│   Agent A    │───────────────▶│   Agent B    │
│ (Researcher) │◀───────────────│  (Analyst)   │
└──────────────┘    response    └──────────────┘
        │                              │
        │            message           │
        └──────────────┬───────────────┘
                       ▼
                ┌──────────────┐
                │   Agent C    │
                │   (Writer)   │
                └──────────────┘
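Frameworks wrap this pattern in heavier machinery, but the mechanics are just message passing between objects. A toy, synchronous sketch (the roles and reply logic are invented for illustration):

```python
class PeerAgent:
    def __init__(self, name: str, handler):
        self.name = name
        self.handler = handler  # how this agent turns a message into a reply
        self.inbox = []         # record of (sender, message) pairs received

    def send(self, other: "PeerAgent", message: str) -> str:
        """Deliver a message to a peer and return its reply (request/response)."""
        other.inbox.append((self.name, message))
        return other.handler(self.name, message)

# In a real system each handler would be an LLM call with its own system prompt
researcher = PeerAgent("researcher", lambda sender, m: f"findings for: {m}")
analyst = PeerAgent("analyst", lambda sender, m: f"analysis of: {m}")
writer = PeerAgent("writer", lambda sender, m: f"draft based on: {m}")

findings = researcher.send(analyst, "market data")  # A → B
draft = analyst.send(writer, findings)              # B → C
print(draft)  # draft based on: analysis of: market data
```

Swapping each lambda for an LLM call with a role-specific system prompt turns this skeleton into a working peer-to-peer system.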

6.3 Hierarchical Pattern

             ┌────────────┐
             │  Director  │
             │   Agent    │
             └──────┬─────┘
            ┌───────┴───────┐
       ┌────▼────┐     ┌────▼────┐
       │  Team   │     │  Team   │
       │ Lead A  │     │ Lead B  │
       └────┬────┘     └────┬────┘
       ┌────┴────┐     ┌────┴────┐
    ┌──▼──┐  ┌──▼──┐ ┌──▼──┐ ┌──▼──┐
    │Wkr 1│  │Wkr 2│ │Wkr 3│ │Wkr 4│
    └─────┘  └─────┘ └─────┘ └─────┘
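In code, the hierarchy is delegation applied recursively: each level splits its task, fans it out to the level below, and aggregates the results on the way back up. A minimal sketch with all names and the splitting rule invented for illustration:

```python
def worker(task: str) -> str:
    # Leaf node: actually performs (here, fakes) the work
    return f"done: {task}"

def team_lead(name: str, workers, task: str) -> str:
    # Middle node: split the task and fan it out to this lead's workers
    subtasks = [f"{task} / part {i + 1}" for i in range(len(workers))]
    results = [w(t) for w, t in zip(workers, subtasks)]
    return f"{name} report: " + "; ".join(results)

def director(task: str) -> str:
    # Root node: delegate to team leads and merge their reports
    reports = [
        team_lead("Lead A", [worker, worker], task),
        team_lead("Lead B", [worker, worker], task),
    ]
    return "\n".join(reports)

print(director("quarterly analysis"))
```

As with the peer pattern, replacing each function with an LLM-backed agent (and the string splitting with LLM-driven task decomposition) yields the real architecture.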

7. MCP (Model Context Protocol)

7.1 What is MCP?

MCP (Model Context Protocol) is a standardized communication protocol between agents and external tools proposed by Anthropic. Just as USB connects various devices through a standardized interface, MCP enables AI Agents to communicate with diverse external tools in a standardized way.

┌──────────────┐    MCP Protocol    ┌──────────────────┐
│   AI Agent   │◀──────────────────▶│    MCP Server    │
│   (Client)   │    JSON-RPC 2.0    │ (Tool Provider)  │
│              │                    │                  │
│  - Claude    │   ┌───────────┐    │  - Filesystem    │
│  - GPT-4     │   │ Resources │    │  - Database      │
│  - Custom    │   │ Tools     │    │  - GitHub        │
│              │   │ Prompts   │    │  - Slack         │
└──────────────┘   └───────────┘    └──────────────────┘

7.2 Why MCP?

Problems with traditional approaches:

  • Different Tool Calling formats for each LLM Provider
  • Custom integration code needed for every new tool
  • No interoperability between tools
  • Different methods for accessing context (files, DB content, etc.)

MCP solution:

  • Standardized tool interface (any LLM can use the same MCP server)
  • Rich ecosystem (build once, use everywhere)
  • Unified protocol for Resources + Tools + Prompts

7.3 Building MCP Servers (Python)

import json

from mcp.server.fastmcp import FastMCP

# Create MCP server (db, email_service, and app_config used below are
# assumed application objects, not part of the MCP SDK)
mcp = FastMCP("My Tool Server")

@mcp.tool()
def search_database(query: str, table: str = "products") -> str:
    """Search the database for records matching the query.

    Args:
        query: Search query string
        table: Database table to search (default: products)
    """
    results = db.search(table, query)
    return json.dumps(results)

@mcp.tool()
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to the specified recipient.

    Args:
        to: Recipient email address
        subject: Email subject line
        body: Email body content
    """
    email_service.send(to=to, subject=subject, body=body)
    return f"Email sent successfully to {to}"

@mcp.resource("config://app")
def get_app_config() -> str:
    """Get the current application configuration."""
    return json.dumps(app_config)

# Run server
if __name__ == "__main__":
    mcp.run(transport="stdio")
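
To call this server from an MCP client such as Claude Desktop, register it in the client configuration; for Claude Desktop this is `claude_desktop_config.json` (the server name and script path below are placeholders):

```json
{
  "mcpServers": {
    "my-tool-server": {
      "command": "python",
      "args": ["/path/to/my_tool_server.py"]
    }
  }
}
```

With `transport="stdio"`, the client launches the server as a subprocess and exchanges JSON-RPC 2.0 messages over stdin/stdout.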

7.4 Available MCP Servers

| MCP Server | Function | Key Tools |
| --- | --- | --- |
| filesystem | File system access | Read, write, search, directory traversal |
| postgres | PostgreSQL DB | Queries, schema inspection |
| sqlite | SQLite DB | Queries, table management |
| github | GitHub integration | PRs, Issues, code search |
| slack | Slack integration | Message sending, channel management |
| brave-search | Web search | Brave Search API |
| puppeteer | Browser control | Screenshots, navigation |
| memory | Persistent memory | Knowledge graph store/search |
| google-drive | Google Drive | File search, reading |
| notion | Notion integration | Pages, databases |

7.5 MCP vs Custom Tool Implementations

| Aspect | MCP | Custom Implementation |
| --- | --- | --- |
| Standardization | Standard protocol | Self-defined |
| Reusability | High (ecosystem shared) | Low (per-project) |
| Initial setup | Some boilerplate | Flexible |
| LLM compatibility | All MCP clients | May be Provider-specific |
| Security | Built-in sandboxing | Must implement yourself |
| Debugging | Standard tooling | Custom logging needed |

8. Memory Systems

8.1 Short-term Memory (Conversation Context)

class ConversationMemory:
    def __init__(self, max_messages=50):
        self.messages = []
        self.max_messages = max_messages

    def add(self, role, content):
        self.messages.append({"role": role, "content": content})
        if len(self.messages) > self.max_messages:
            # Preserve system messages; trim the oldest non-system messages
            system = [m for m in self.messages if m["role"] == "system"]
            non_system = [m for m in self.messages if m["role"] != "system"]
            keep = max(self.max_messages - len(system), 0)
            self.messages = system + (non_system[-keep:] if keep else [])

    def get_messages(self):
        return self.messages.copy()

8.2 Long-term Memory (Vector DB)

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class LongTermMemory:
    def __init__(self, collection_name="agent_memory"):
        self.embeddings = OpenAIEmbeddings()
        self.vectorstore = Chroma(
            collection_name=collection_name,
            embedding_function=self.embeddings,
            persist_directory="./memory_db"
        )

    def store(self, text, metadata=None):
        """Store important information in long-term memory"""
        self.vectorstore.add_texts(
            texts=[text],
            metadatas=[metadata or {}]
        )

    def recall(self, query, k=5):
        """Retrieve relevant memories"""
        results = self.vectorstore.similarity_search(query, k=k)
        return [doc.page_content for doc in results]

8.3 Episodic Memory (Past Interactions)

class EpisodicMemory:
    def __init__(self, vectorstore):
        # Reuse a vector store (e.g., LongTermMemory's) so episodes are searchable
        self.vectorstore = vectorstore
        self.episodes = []

    def record_episode(self, task, actions, outcome, learnings):
        """Record past task experiences"""
        self.episodes.append({
            "task": task,
            "actions": actions,
            "outcome": outcome,
            "learnings": learnings,
            "timestamp": datetime.now().isoformat()
        })
        # Index the task text so similar tasks can be recalled later
        self.vectorstore.add_texts(
            texts=[task],
            metadatas=[{"episode_index": len(self.episodes) - 1}]
        )

    def find_similar_episodes(self, current_task, k=3):
        """Find past experiences similar to current task"""
        docs = self.vectorstore.similarity_search(current_task, k=k)
        return [self.episodes[d.metadata["episode_index"]] for d in docs]

8.4 Working Memory (Scratchpad)

class WorkingMemory:
    def __init__(self):
        self.scratchpad = {}
        self.current_plan = []
        self.intermediate_results = []

    def set(self, key, value):
        """Store temporary data during task execution"""
        self.scratchpad[key] = value

    def get(self, key):
        return self.scratchpad.get(key)

    def get_context_string(self):
        """Convert working memory to string for LLM prompt"""
        parts = []
        if self.current_plan:
            parts.append("Current Plan:")
            for i, step in enumerate(self.current_plan):
                status = step.get("status", "pending")
                parts.append(f"  {i+1}. [{status}] {step['description']}")
        if self.intermediate_results:
            parts.append("\nIntermediate Results:")
            for result in self.intermediate_results[-5:]:
                parts.append(f"  - {result}")
        return "\n".join(parts)

9. Production Deployment

9.1 Safety

class SafetyGuard:
    def __init__(self):
        self.allowed_tools = set()
        self.rate_limits = {}

    def whitelist_tool(self, tool_name, max_calls_per_minute=10):
        """Register allowed tool"""
        self.allowed_tools.add(tool_name)
        self.rate_limits[tool_name] = {
            "max_per_minute": max_calls_per_minute,
            "calls": []
        }

    def check_tool_call(self, tool_name, arguments):
        """Safety check before tool execution"""
        if tool_name not in self.allowed_tools:
            raise SecurityError(f"Tool '{tool_name}' is not whitelisted")

        # Sliding window: keep only calls from the last 60 seconds
        now = datetime.now()
        calls = self.rate_limits[tool_name]["calls"]
        calls = [c for c in calls if (now - c).total_seconds() < 60]
        if len(calls) >= self.rate_limits[tool_name]["max_per_minute"]:
            raise RateLimitError(f"Rate limit exceeded for '{tool_name}'")
        self.rate_limits[tool_name]["calls"] = calls + [now]
        return True

    def human_in_the_loop(self, tool_name, arguments):
        """Request human approval for dangerous tool calls"""
        dangerous_tools = {"delete_file", "send_email", "execute_code"}
        if tool_name in dangerous_tools:
            print(f"\n[APPROVAL REQUIRED] Tool: {tool_name}")
            print(f"Arguments: {json.dumps(arguments, indent=2)}")
            approval = input("Approve? (yes/no): ")
            return approval.lower() == "yes"
        return True

9.2 Observability

# LangSmith integration
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-agent-project"

# Langfuse integration
from langfuse.decorators import observe

@observe()
def agent_step(messages, tools):
    """Observable agent step"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    return response

# Custom metrics logging
class AgentMetrics:
    def __init__(self):
        self.tool_calls = []
        self.latencies = []
        self.total_tokens = 0

    def log_tool_call(self, tool_name, duration_ms, success, tokens_used):
        self.tool_calls.append({
            "tool": tool_name,
            "duration_ms": duration_ms,
            "success": success,
            "timestamp": datetime.now().isoformat()
        })
        self.latencies.append(duration_ms)
        self.total_tokens += tokens_used

    def get_summary(self):
        return {
            "total_calls": len(self.tool_calls),
            "avg_latency_ms": sum(self.latencies) / max(len(self.latencies), 1),
            "error_rate": sum(1 for t in self.tool_calls if not t["success"]) / max(len(self.tool_calls), 1),
            "total_tokens": self.total_tokens,
        }

9.3 Cost Management

class CostManager:
    # USD per 1M tokens (rates change; verify against provider pricing pages)
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
        "claude-3.5-sonnet": {"input": 3.00, "output": 15.00},
        "claude-3.5-haiku": {"input": 0.80, "output": 4.00},
    }

    def __init__(self, budget_limit_usd=100.0):
        self.total_cost = 0.0
        self.budget_limit = budget_limit_usd

    def track_usage(self, model, input_tokens, output_tokens):
        pricing = self.PRICING.get(model, {"input": 0, "output": 0})
        cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
        self.total_cost += cost

        if self.total_cost > self.budget_limit:
            raise BudgetExceededError(
                f"Budget exceeded: ${self.total_cost:.2f} > ${self.budget_limit:.2f}"
            )

    def optimize_model_selection(self, task_complexity):
        """Auto-select model based on task complexity"""
        if task_complexity == "simple":
            return "gpt-4o-mini"
        elif task_complexity == "moderate":
            return "claude-3.5-haiku"
        else:
            return "gpt-4o"

9.4 Error Recovery Patterns

# The exception types below (ToolTimeoutError, ToolNotFoundError,
# InvalidArgumentsError, RateLimitError, AgentFailedError) are assumed to be
# application-defined; asyncio is used for backoff sleeps.
class AgentErrorRecovery:
    def __init__(self, max_retries=3):
        self.max_retries = max_retries

    async def execute_with_recovery(self, agent_step, context):
        """Execute agent step with error recovery"""
        last_error = None

        for attempt in range(self.max_retries):
            try:
                return await agent_step(context)
            except ToolTimeoutError as e:
                last_error = e
                context["timeout_multiplier"] = 2 ** attempt
            except ToolNotFoundError as e:
                last_error = e
                context["use_fallback"] = True
            except InvalidArgumentsError as e:
                last_error = e
                context["error_feedback"] = str(e)
            except RateLimitError as e:
                last_error = e
                await asyncio.sleep(2 ** attempt)

        raise AgentFailedError(f"Failed after {self.max_retries} attempts: {last_error}")

10. Quiz

Q1. What are the four core components of an AI Agent?

Answer: LLM (reasoning engine), Tools (external tools), Memory (context maintenance), Planning (plan formulation)

The LLM serves as the agent's brain, Tools enable interaction with the external world, Memory maintains context, and Planning breaks down complex tasks into steps.

Q2. Does the LLM directly execute functions in Tool Calling?

Answer: No. The LLM does not execute functions directly.

The LLM outputs which function to call with which arguments in JSON format, and the actual function execution is performed by the application code (client side). The execution result is then passed back to the LLM to generate the final response.
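
This flow can be sketched with a dispatch table; the `tool_call` dict below mimics the shape of a provider's tool-call output, and the tool name is illustrative:

```python
import json

def get_weather(city: str) -> str:
    """Stub tool: a real implementation would call a weather API."""
    return f"Sunny in {city}"

TOOL_REGISTRY = {"get_weather": get_weather}

# What the LLM emits: a request to call a function, not the result itself
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Seoul"})}

# The application (client side) looks up and executes the function...
fn = TOOL_REGISTRY[tool_call["name"]]
result = fn(**json.loads(tool_call["arguments"]))

# ...and feeds the result back to the LLM as a "tool" role message
tool_message = {"role": "tool", "content": result}
print(tool_message)  # {'role': 'tool', 'content': 'Sunny in Seoul'}
```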

Q3. Explain the three stages of the ReAct pattern loop.

Answer: Thought - Action - Observation

  1. Thought: The LLM analyzes the current situation and reasons about the next action
  2. Action: Based on reasoning, invokes a tool or generates a final answer
  3. Observation: Checks the tool execution result and returns to Thought

This loop repeats until a final answer is produced.
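
The loop can be sketched with a stubbed model; `stub_llm` stands in for a real LLM call and `lookup` for a real tool:

```python
def stub_llm(history):
    # Stands in for the LLM: reason, then either act or answer
    if not any(h.startswith("Observation") for h in history):
        return ("Thought: I need to look up the population.", "LOOKUP", "Korea")
    return ("Thought: I have enough information.", "ANSWER", "About 51 million")

def lookup(topic):
    return {"Korea": "population about 51 million"}[topic]  # stub tool

history = ["Question: What is the population of Korea?"]
while True:
    thought, action, arg = stub_llm(history)        # Thought
    history.append(thought)
    if action == "ANSWER":                          # final answer ends the loop
        answer = arg
        break
    observation = lookup(arg)                       # Action
    history.append(f"Observation: {observation}")   # Observation, then loop

print(answer)  # About 51 million
```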

Q4. What are the three main components of MCP (Model Context Protocol)?

Answer: Resources, Tools, Prompts

  • Resources: Data sources the agent can access (files, DBs, etc.)
  • Tools: Functions/actions the agent can execute
  • Prompts: Pre-defined prompt templates

All three are unified through a single standard protocol (JSON-RPC 2.0).
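
On the wire, a tool invocation is an ordinary JSON-RPC 2.0 request; the tool name and arguments below are illustrative (matching the example server from Section 7.3):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_database",
    "arguments": { "query": "laptop", "table": "products" }
  }
}
```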

Q5. What is the most important factor in Tool Calling performance optimization?

Answer: Tool Description Engineering

Tool description quality is often the single biggest factor in Tool Calling performance. Descriptions should include clear, specific explanations, usage scenarios, and examples. Other important factors include parameter description quality, minimizing the tool count, schema simplification, and few-shot examples.
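
As a concrete (hypothetical) illustration, compare a vague and an engineered description for the same tool:

```python
# Vague: the model must guess what the tool covers and when to use it
bad_tool = {"name": "search", "description": "Searches stuff."}

# Engineered: states what it searches, when to use it, and gives an example
good_tool = {
    "name": "search_orders",
    "description": (
        "Search the order database by customer email or order ID. "
        "Use this when the user asks about order status, shipping, or refunds. "
        "Example query: 'status of order 12345'."
    ),
}
```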


11. References

  1. OpenAI Function Calling Documentation - OpenAI API Reference
  2. Anthropic Tool Use Guide - Anthropic Developer Documentation
  3. Google Gemini Function Calling - Google AI Developer Documentation
  4. Model Context Protocol (MCP) Specification - Anthropic, 2024
  5. ReAct: Synergizing Reasoning and Acting in Language Models - Yao et al., 2023
  6. LangGraph Documentation - LangChain
  7. CrewAI Documentation - CrewAI
  8. AutoGen: Enabling Next-Gen LLM Applications - Microsoft Research
  9. Smolagents Documentation - HuggingFace
  10. Gorilla: Large Language Model Connected with Massive APIs - UC Berkeley, 2023
  11. ToolLLM: Facilitating LLMs to Master 16000+ Real-world APIs - Qin et al., 2024
  12. A Survey on Large Language Model based Autonomous Agents - Wang et al., 2024
  13. Voyager: An Open-Ended Embodied Agent with Large Language Models - Wang et al., 2023
  14. BFCL (Berkeley Function Calling Leaderboard) - UC Berkeley Gorilla Project