
Building LLM Agent Systems: Complete Analysis of Tool Use, Planning, and Memory


1. What is an LLM Agent? - ReAct Paper Analysis

Definition of an Agent

An LLM Agent goes beyond a Language Model that simply generates text — it is a system that autonomously makes decisions while interacting with external environments. While traditional LLMs generate a single response to a given prompt, an Agent repeatedly performs Observation, Reasoning, and Action to achieve a goal.

The core components of an Agent system are as follows:

  • LLM (Brain): The core model responsible for reasoning and decision-making
  • Tools: External APIs, databases, code executors, and other instruments the Agent can use
  • Planning: Strategies for decomposing complex tasks into sub-steps
  • Memory: Mechanisms for maintaining past interactions and context

ReAct: Combining Reasoning + Acting

The ReAct (Reasoning and Acting) paper published by Yao et al. in 2022 is a foundational study that established the basis for LLM Agents. Accepted at ICLR 2023, this paper proposed a paradigm in which LLMs alternately generate reasoning traces and task-specific actions.

The limitations of existing approaches were clear. Chain-of-Thought (CoT) prompting excels at reasoning but cannot access external information, leading to hallucination. Conversely, Action-only approaches use external tools but lack reasoning processes, making complex judgments difficult.

ReAct unifies these two approaches. The core loop is as follows:

Thought: Analyze the current situation and plan the next action.
Action: Call an external tool or interact with the environment.
Observation: Observe the result of the Action.
... (repeat)
Thought: Sufficient information has been gathered, so generate the final answer.

For example, for the question "How much is Apple's current market cap in Korean Won?", a ReAct Agent operates as follows:

Thought: First, I need to check Apple's current market cap in USD.
Action: search("Apple current market cap USD")
Observation: Apple's market cap is approximately $3.4 trillion.
Thought: Now I need to check the current USD/KRW exchange rate.
Action: search("current USD KRW exchange rate")
Observation: 1 USD = 1,450 KRW
Thought: 3.4 trillion * 1,450 = 4,930 trillion KRW. Generate the final answer.
Answer: Apple's market cap is approximately 4,930 trillion KRW.
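The trace above can be driven by a short control loop. In the sketch below, `llm` and `run_tool` are toy stand-ins (hard-coded strings, not real model or API calls) that exist only to make the Thought/Action/Observation flow concrete:

```python
import re

def llm(prompt: str) -> str:
    """Toy 'LLM': asks for a search once, then answers."""
    if "Observation:" not in prompt:
        return 'Thought: I need data.\nAction: search("Apple market cap")'
    return "Thought: I have enough information.\nAnswer: about $3.4 trillion"

def run_tool(call: str) -> str:
    """Toy tool executor standing in for a real search API."""
    return "Apple's market cap is approximately $3.4 trillion."

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(prompt)
        if "Answer:" in step:                 # final answer reached
            return step.split("Answer:", 1)[1].strip()
        action = re.search(r"Action: (.+)", step).group(1)
        observation = run_tool(action)        # execute the requested tool
        prompt += f"\n{step}\nObservation: {observation}"
    return "max steps exceeded"

print(react_loop("How much is Apple's market cap?"))  # → about $3.4 trillion
```

A real implementation replaces the two stubs with a model call and a tool dispatcher, but the loop structure is the same.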

The experimental results were also impressive. On HotpotQA (question answering) and Fever (fact verification) tasks, ReAct significantly reduced hallucinations compared to CoT, and on decision-making benchmarks like ALFWorld and WebShop, it showed 34% and 10% success rate improvements over existing imitation learning and reinforcement learning methods, respectively.


2. Tool/Function Calling Mechanism

Analysis Based on Anthropic Tool Use Official Documentation

Anthropic's Claude provides Function Calling capabilities under the name Tool Use. According to the official documentation, Tool Use operates as follows:

Tool Definition

Tools to be used are defined in JSON Schema format when making API requests. Each tool definition includes name, description, and input_schema.

import anthropic

client = anthropic.Anthropic()

# Tool definition
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather in a given location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. San Francisco, CA"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]

# API call
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Seoul?"}
    ]
)

Tool Use Operation Flow

  1. Request Phase: The client sends a message along with tool definitions to the API.
  2. Decision Phase: Claude selects an appropriate tool from available ones and returns a tool_use content block. At this point, the stop_reason becomes "tool_use".
  3. Execution Phase: The client actually executes the tool. (Claude does not execute it directly.)
  4. Result Delivery: The tool execution result is sent back to Claude as a tool_result content block.
  5. Final Response: Claude generates a natural language response based on the tool result.

In code, steps 2 through 4 look like this (continuing from the request above):

# 2. Extract tool_use block from Claude's response
tool_use_block = next(
    block for block in response.content if block.type == "tool_use"
)
tool_name = tool_use_block.name        # "get_weather"
tool_input = tool_use_block.input      # {"location": "Seoul, South Korea"}
tool_use_id = tool_use_block.id        # unique identifier

# 3. Actually execute the tool (implemented by the developer)
weather_result = call_weather_api(tool_input["location"])

# 4. Send tool result to Claude
follow_up = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "What's the weather like in Seoul?"},
        {"role": "assistant", "content": response.content},
        {
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": tool_use_id,
                    "content": weather_result
                }
            ]
        }
    ]
)

Advanced Features Added in 2025

Anthropic added three important features in 2025:

  • Tool Search Tool: Instead of pre-loading all tool definitions, tools are dynamically discovered as needed. This enables efficient use of the context window.
  • Programmatic Tool Calling: Tools are called in a code execution environment, reducing the burden on the context window.
  • Structured Outputs: Adding the strict: true option to tool definitions guarantees that Claude's tool calls always exactly follow the defined schema.

3. OpenAI Function Calling vs Anthropic Tool Use Comparison

Both platforms provide mechanisms for LLMs to generate structured data to call external functions, but they differ in implementation and philosophy.

| Item | OpenAI Function Calling | Anthropic Tool Use |
| --- | --- | --- |
| Name | Function Calling (or Tool Calling) | Tool Use |
| Tool Definition Location | tools parameter | tools parameter |
| Schema Format | JSON Schema (parameters) | JSON Schema (input_schema) |
| Response Format | tool_calls array (inside message) | tool_use content block |
| Parallel Calling | Controlled via parallel_tool_calls | Supported (multiple tool_use blocks) |
| Strict Mode | strict: true (Structured Outputs) | strict: true (added 2025) |
| Server-side Tools | Web search, Code Interpreter, etc. | Web search, Code execution, etc. |
| Result Delivery | tool role message | tool_result content block |

OpenAI Approach Characteristics

OpenAI allows fine-grained control over the model's tool usage through the tool_choice parameter. Options include "auto" (model decides autonomously), "required" (must use a tool), "none" (tool use prohibited), or specifying a particular function. Setting parallel_tool_calls: false restricts the model to calling only one tool at a time.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Compare the weather in Seoul and Tokyo"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                },
                "required": ["location"]
            }
        }
    }],
    parallel_tool_calls=True  # Simultaneous call for Seoul and Tokyo
)

Anthropic Approach Characteristics

Anthropic's Claude tends to naturally expose its thinking process when using tools, making it easier to transparently understand the Agent's decision-making process. It also supports tool_choice with "auto", "any" (must use at least one tool), or specifying a particular tool.
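The three `tool_choice` modes correspond to simple request-body shapes. The sketch below builds them as plain dictionaries rather than making a live API call, so the shapes are visible on their own:

```python
# The documented tool_choice shapes for the Anthropic Messages API.
tool_choice_auto = {"type": "auto"}                            # model decides
tool_choice_any = {"type": "any"}                              # must use some tool
tool_choice_pinned = {"type": "tool", "name": "get_weather"}   # must use this tool

# How one of them sits inside a request body (no API call is made here):
request_body = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tool_choice": tool_choice_pinned,
    "messages": [{"role": "user", "content": "What's the weather like in Seoul?"}],
}
print(request_body["tool_choice"]["name"])  # → get_weather
```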

The key difference lies in architectural philosophy. OpenAI handles tool calls at the message-level, while Anthropic handles them at the content block-level. This means Anthropic can more flexibly intermix text and tool calls within a single response.
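To make the difference concrete, here is roughly what a tool-calling response looks like on each side (illustrative literals, not captured API output). Note that OpenAI's arguments field is a JSON string, while Anthropic's input is an already-parsed object, and that Anthropic can mix text and tool_use blocks in one response:

```python
# OpenAI: tool calls live in a tool_calls array on the assistant message.
openai_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather",
                     "arguments": '{"location": "Seoul"}'},  # JSON string
    }],
}

# Anthropic: tool calls are content blocks, interleaved with text blocks.
anthropic_content = [
    {"type": "text", "text": "I'll check the weather in Seoul."},
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather",
     "input": {"location": "Seoul"}},                        # parsed object
]
```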


4. Planning Strategies

Planning is a critical capability for Agents performing complex tasks. Let's examine the key strategies.

Plan-and-Execute Pattern

Plan-and-Execute is a pattern that first establishes an overall plan (Plan) and then sequentially executes each step (Execute). This approach, introduced in the LangChain blog, is based on the Plan-and-Solve Prompting paper by Wang et al. and Yohei Nakajima's BabyAGI project.

[User Query]
     |
     v
[Planner LLM] --> Step 1, Step 2, Step 3, ...
     |
     v
[Executor] --> Execute Step 1 --> Execute Step 2 --> Execute Step 3
     |
     v
[Re-planner] --> Modify plan if needed
     |
     v
[Final Answer]

The advantages of Plan-and-Execute over ReAct are as follows:

  • Speed: No need to call the large Planner LLM for each sub-task execution. A smaller model can execute individual steps.
  • Reasoning Quality: Since the Planner explicitly decomposes the entire task, fewer steps are missed.
  • Cost Efficiency: Costs can be reduced by using a high-performance model for the Planner and a lightweight model for the Executor.
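The pattern itself fits in a few lines. In the sketch below, `plan` and `execute` are hypothetical stubs standing in for the Planner and Executor LLM calls:

```python
def plan(query: str) -> list[str]:
    """Stub Planner: a large model would decompose the task here."""
    return [f"search background on {query}", f"summarize findings for {query}"]

def execute(step: str) -> str:
    """Stub Executor: a smaller model would carry out one step here."""
    return f"done: {step}"

def plan_and_execute(query: str) -> str:
    steps = plan(query)            # 1. Planner decomposes the task once
    results = []
    for step in steps:             # 2. Executor runs each step in order
        results.append(execute(step))
        # 3. A re-planner could revise the remaining steps here
    return "\n".join(results)

print(plan_and_execute("LLM agents"))
```

The key structural difference from ReAct is that planning happens once up front (with an optional re-planning hook), instead of re-deciding the next action after every observation.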

Tree of Thoughts (ToT)

Tree of Thoughts, published by Yao et al. at NeurIPS 2023, is a framework designed to overcome the limitations of Chain-of-Thought. The core idea is to expand the LLM's reasoning process into a tree structure, exploring and evaluating multiple thought paths.

                    [Problem]
                   /  |  \
              [Thought1] [Thought2] [Thought3]    <-- Generate multiple paths
              /  \      |      \
         [1-a] [1-b]  [2-a]   [3-a]    <-- Expand each path
           |     |      |       |
        [Eval] [Eval]  [Eval]   [Eval]   <-- Self-evaluation
           |            |
        [Select]      [Select]            <-- Select optimal path

The core components of ToT are as follows:

  1. Thought Decomposition: Decompose the problem into intermediate "thought" units.
  2. Thought Generation: Generate multiple candidate thoughts at each step (via sampling or proposal).
  3. State Evaluation: Use the LLM itself to evaluate the promise of each state.
  4. Search Algorithm: Explore the tree using BFS (breadth-first search) or DFS (depth-first search).

On the Game of 24 task, GPT-4 + CoT achieved only a 4% success rate, while ToT achieved 74%. This demonstrates the power of structural reasoning that enables exploration and backtracking.
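The four components can be sketched with a toy problem: thoughts are integers, `generate` proposes candidate next thoughts, `evaluate` scores states, and a beam-limited BFS keeps only the most promising states at each depth. Both functions are stand-ins for LLM calls:

```python
def generate(state: tuple) -> list[tuple]:
    """Thought Generation stub: propose 3 candidate next thoughts."""
    return [state + (d,) for d in (1, 2, 3)]

def evaluate(state: tuple) -> float:
    """State Evaluation stub: toy score that prefers larger sums."""
    return sum(state)

def tree_of_thoughts(depth: int = 3, beam: int = 2) -> tuple:
    frontier = [()]                              # root: empty thought sequence
    for _ in range(depth):
        candidates = [c for s in frontier for c in generate(s)]
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:beam]             # keep the best `beam` states
    return max(frontier, key=evaluate)

print(tree_of_thoughts())  # → (3, 3, 3)
```

In the real framework, generation and evaluation are both prompts to the LLM, and DFS with backtracking can replace the BFS shown here.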


5. Memory Systems

A Memory system is essential for Agents to operate effectively over the long term. Memory is broadly classified into three types.

Short-term Memory

The LLM's Context Window itself serves as short-term memory. This includes current conversation history and recent tool call results.

# Short-term memory: managing conversation history as messages
messages = [
    {"role": "user", "content": "My name is Youngju Kim"},
    {"role": "assistant", "content": "Hello, Youngju Kim!"},
    {"role": "user", "content": "What did I say my name was?"},
    # Can reference previous conversation within the context window
]

The limitations are clear. The context window has size limits (Claude: 200K tokens, GPT-4o: 128K tokens), and memories disappear when the session ends.
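A common mitigation is to trim the history to a token budget before each call, dropping the oldest messages first. The sketch below uses a crude word count in place of a real tokenizer:

```python
def count_tokens(message: dict) -> int:
    """Crude stand-in for a real tokenizer."""
    return len(message["content"].split())

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        cost = count_tokens(msg)
        if used + cost > budget:
            break                        # budget exhausted: drop the rest
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = [
    {"role": "user", "content": "My name is Youngju Kim"},
    {"role": "assistant", "content": "Hello, Youngju Kim!"},
    {"role": "user", "content": "What did I say my name was?"},
]
print(trim_history(history, budget=10))  # oldest message is dropped
```

Note the trade-off: trimming keeps requests within the window, but anything trimmed away (here, the user's name) is lost unless it was also written to long-term memory.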

Long-term Memory

This approach persistently stores information in external storage (Vector DB, Key-Value Store, etc.) and retrieves it when needed. RAG (Retrieval-Augmented Generation) is the most representative implementation pattern.

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Long-term memory: store in Vector DB
vectorstore = Chroma(
    collection_name="agent_memory",
    embedding_function=OpenAIEmbeddings()
)

# Store past interactions
vectorstore.add_texts([
    "The user is interested in Python and data analysis.",
    "The user lives in Seoul and prefers Korean.",
    "In a previous session, the user asked about pandas DataFrames."
])

# Search relevant memories
relevant_memories = vectorstore.similarity_search(
    "user's programming interests", k=3
)

Entity Memory

This is a mechanism that extracts and updates information about specific entities (people, places, concepts, etc.) that appear in conversations. Knowledge about each entity accumulates as conversations progress.

# Entity Memory example structure
entity_store = {
    "Youngju Kim": {
        "occupation": "Data Engineer",
        "interests": ["LLM", "Data Pipeline", "Kubernetes"],
        "preferred_language": "Python",
        "recent_question_topic": "LangGraph Agent Construction"
    },
    "Project_A": {
        "status": "In Progress",
        "tech_stack": ["LangGraph", "Claude API", "PostgreSQL"],
        "goal": "Building an internal data analysis Agent"
    }
}

In practice, these three types of Memory are used in combination. In LangGraph, MemorySaver (short-term) and external Store (long-term) can be injected at .compile() time to build an integrated Memory system.


6. Analysis Based on LangGraph Official Documentation

Core Concepts of LangGraph

LangGraph is an Agent Orchestration Framework developed by the LangChain team that models Agent workflows as directed graphs. According to the official documentation, the core components are as follows:

StateGraph

StateGraph is the central class of LangGraph. It is parameterized with a user-defined State object, and all nodes in the graph read from and write to this state.

from langgraph.graph import StateGraph, START, END
from langgraph.graph import MessagesState

# MessagesState is a built-in State that manages a messages list
graph = StateGraph(MessagesState)

You can also define custom State:

from typing import TypedDict, Annotated
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[list, add_messages]
    plan: list[str]
    current_step: int
    final_answer: str

Using Annotated and add_messages, you can specify reducer functions. These determine how to update existing values when a node returns state. add_messages operates by appending new messages to the existing message list.
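A reducer is simply a two-argument merge function. The sketch below defines a hypothetical custom reducer for the plan key and shows what LangGraph effectively does with it when a node returns a partial state (the reducer name and merge policy are illustrative, not from the LangGraph API):

```python
from typing import Annotated, TypedDict

def merge_plans(existing: list[str], update: list[str]) -> list[str]:
    """Custom reducer: keep old steps, append only unseen new steps."""
    return existing + [step for step in update if step not in existing]

class PlanState(TypedDict):
    plan: Annotated[list[str], merge_plans]

# What the framework effectively does when a node returns {"plan": [...]}:
state = {"plan": ["load data"]}
update = {"plan": ["load data", "run analysis"]}
state["plan"] = merge_plans(state["plan"], update["plan"])
print(state["plan"])  # → ['load data', 'run analysis']
```

Without a reducer, a returned key simply overwrites the old value; with one, you control the merge, which is exactly how add_messages turns "return a message" into "append a message".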

Node

A Node is a function that performs actual logic for the Agent. It takes the current State as input and returns an updated State.

from langchain_anthropic import ChatAnthropic

model = ChatAnthropic(model="claude-sonnet-4-20250514")
model_with_tools = model.bind_tools(tools)

def call_model(state: AgentState):
    """Node that calls the LLM"""
    response = model_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def execute_tool(state: AgentState):
    """Node that executes tools"""
    last_message = state["messages"][-1]
    results = []
    for tool_call in last_message.tool_calls:
        result = tool_map[tool_call["name"]].invoke(tool_call["args"])
        results.append(
            ToolMessage(content=str(result), tool_call_id=tool_call["id"])
        )
    return {"messages": results}

# Add nodes
graph.add_node("agent", call_model)
graph.add_node("tools", execute_tool)

Edge

Edges define the execution flow between nodes. LangGraph provides three types of Edges:

1. Normal Edge: Always moves to a fixed next node.

graph.add_edge(START, "agent")     # Start -> agent node
graph.add_edge("tools", "agent")   # tools -> agent node (deliver results)

2. Conditional Edge: Branches to different nodes based on conditions.

def should_continue(state: AgentState):
    """Determine if tool calls are needed"""
    last_message = state["messages"][-1]
    if last_message.tool_calls:
        return "tools"
    return END

graph.add_conditional_edges("agent", should_continue)

3. Entry Point: Define the graph's starting point using the START constant.

Compile and Execution

After defining the graph, you must compile it before execution.

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Execute
result = app.invoke(
    {"messages": [("user", "Tell me the current weather in Seoul")]},
    config={"configurable": {"thread_id": "session-001"}}
)

Conversation sessions are distinguished through thread_id, and the Checkpointer automatically saves state at each super-step.


7. Human-in-the-Loop Patterns

LangGraph's Interrupt Mechanism

According to LangGraph's official documentation, Human-in-the-Loop is supported as a first-class citizen. The key components are the Checkpointer and the interrupt function.

Basic Principle

All execution in LangGraph has state saved at each super-step through the Checkpointer. Based on this, the graph can be paused before/after specific node execution, wait for user input, and then resumed.

Static Interrupt

Set breakpoints before or after specific node execution.

# Set breakpoints at compile time
app = graph.compile(
    checkpointer=checkpointer,
    interrupt_before=["execute_tool"],   # Interrupt before tool execution
    # interrupt_after=["execute_tool"],  # Interrupt after tool execution
)

# Execution stops just before the execute_tool node
result = app.invoke(
    {"messages": [("user", "Please delete this data")]},
    config={"configurable": {"thread_id": "session-002"}}
)

# Resume after user approval
app.invoke(None, config={"configurable": {"thread_id": "session-002"}})

Dynamic Interrupt (interrupt function)

LangGraph provides an interrupt() function that allows dynamic interruption within a node. This approach is more flexible.

from langgraph.types import interrupt, Command

def sensitive_tool_node(state: AgentState):
    """Node that requests user approval before performing sensitive operations"""
    last_message = state["messages"][-1]
    results = []

    for tool_call in last_message.tool_calls:
        if tool_call["name"] in ["delete_data", "send_email", "execute_query"]:
            # Interrupt execution and request user approval
            user_response = interrupt(
                f"Attempting to execute '{tool_call['name']}' tool. "
                f"Input: {tool_call['args']}. Do you approve?"
            )

            if user_response != "approved":
                results.append(ToolMessage(
                    content="User rejected the execution.",
                    tool_call_id=tool_call["id"]
                ))
                continue  # skip execution, move on to the next tool call

        # Execute if approved (or if the tool is not sensitive)
        result = tool_map[tool_call["name"]].invoke(tool_call["args"])
        results.append(
            ToolMessage(content=str(result), tool_call_id=tool_call["id"])
        )

    return {"messages": results}

To resume an interrupted graph, use Command(resume=...):

# User approves from the interrupted state
app.invoke(
    Command(resume="approved"),
    config={"configurable": {"thread_id": "session-002"}}
)

Use Scenarios

  • Approval for dangerous operations: User confirmation before data deletion, email sending, payment processing
  • Clarifying ambiguous requests: Additional questions when the Agent cannot precisely determine the user's intent
  • Reviewing execution plans: User review/modification of plans after the planning phase in Plan-and-Execute patterns

8. Building Multi-Agent Systems

Supervisor Architecture

The Supervisor pattern presented in LangGraph official documentation and the langgraph-supervisor library is the core architecture of Multi-Agent systems. A central Supervisor Agent coordinates specialized Worker Agents.

                    [User Query]
                         |
                         v
                   [Supervisor Agent]
                   /       |        \
                  v        v         v
          [Research    [Code       [Data
           Agent]      Agent]      Agent]
              |           |           |
              v           v           v
         [Web Search] [Code Exec] [SQL Query]

Implementation

from langgraph.graph import StateGraph, MessagesState, START, END

# Define each specialized Agent
def research_agent(state: MessagesState):
    """Web search specialist Agent"""
    model = ChatAnthropic(model="claude-sonnet-4-20250514")
    model_with_search = model.bind_tools([web_search_tool])
    response = model_with_search.invoke(state["messages"])
    return {"messages": [response]}

def code_agent(state: MessagesState):
    """Code writing and execution specialist Agent"""
    model = ChatAnthropic(model="claude-sonnet-4-20250514")
    model_with_code = model.bind_tools([code_execution_tool])
    response = model_with_code.invoke(state["messages"])
    return {"messages": [response]}

def data_agent(state: MessagesState):
    """Data analysis specialist Agent"""
    model = ChatAnthropic(model="claude-sonnet-4-20250514")
    model_with_data = model.bind_tools([sql_query_tool, chart_tool])
    response = model_with_data.invoke(state["messages"])
    return {"messages": [response]}

# Supervisor routing function
def supervisor(state: MessagesState):
    """Decide which Agent to delegate the task to"""
    model = ChatAnthropic(model="claude-sonnet-4-20250514")
    system_prompt = """You are a Supervisor. Analyze the user's request
    and delegate it to the appropriate specialist Agent.
    - research: when information retrieval is needed
    - code: when code writing/execution is needed
    - data: when data analysis/visualization is needed
    - FINISH: when the task is complete"""

    response = model.invoke([
        {"role": "system", "content": system_prompt},
        *state["messages"]
    ])
    return {"messages": [response]}

def route_supervisor(state: MessagesState):
    """Route based on the Supervisor's decision"""
    # Keyword matching keeps the example simple; production systems typically
    # use structured output (e.g., a routing tool call) instead.
    last_message = state["messages"][-1].content
    if "research" in last_message.lower():
        return "research_agent"
    elif "code" in last_message.lower():
        return "code_agent"
    elif "data" in last_message.lower():
        return "data_agent"
    return END

# Compose graph
workflow = StateGraph(MessagesState)
workflow.add_node("supervisor", supervisor)
workflow.add_node("research_agent", research_agent)
workflow.add_node("code_agent", code_agent)
workflow.add_node("data_agent", data_agent)

workflow.add_edge(START, "supervisor")
workflow.add_conditional_edges("supervisor", route_supervisor)
workflow.add_edge("research_agent", "supervisor")
workflow.add_edge("code_agent", "supervisor")
workflow.add_edge("data_agent", "supervisor")

app = workflow.compile()

Hierarchical Structure

For more complex systems, a hierarchical Multi-Agent structure is used. A top-level Supervisor manages mid-level Supervisors, which in turn manage actual Workers. Each layer can be tested independently, and new domains can be added without affecting existing ones.
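Conceptually, the hierarchy is nested routing: each supervisor only knows its direct children. A plain-Python sketch of that structure (toy worker functions, illustrative names, no LLM calls):

```python
# Leaf workers: toy stand-ins for specialist Agents.
def web_search(q: str) -> str:
    return f"search results for {q}"

def sql_query(q: str) -> str:
    return f"rows for {q}"

# Mid-level "supervisors" each manage only their own workers.
research_team = {"search": web_search}
data_team = {"sql": sql_query}

# The top-level supervisor manages only the mid-level supervisors.
top_level = {"research": research_team, "data": data_team}

def route(query: str, team: str, worker: str) -> str:
    """Top-level picks a team; that team picks a worker."""
    return top_level[team][worker](query)

print(route("monthly sales", "data", "sql"))  # → rows for monthly sales
```

In LangGraph terms, each dictionary would be its own compiled graph, and the routing decisions would be made by LLM nodes rather than explicit arguments.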


9. MCP (Model Context Protocol) Overview

An Open Standard Proposed by Anthropic

Model Context Protocol (MCP) is an open protocol announced by Anthropic in November 2024 that provides standardized connections between LLM applications and external data sources and tools. Just as USB is a standard interface for connecting various peripherals to computers, MCP is a standard interface for connecting AI models to external systems.

Architecture

MCP follows a Client-Server architecture.

[LLM Application (MCP Client)]
         |
    [MCP Protocol]
         |
[MCP Server A]  [MCP Server B]  [MCP Server C]
     |               |               |
[GitHub API]   [Database]      [File System]

  • MCP Host: Applications with embedded LLMs such as Claude Desktop, IDEs
  • MCP Client: Component within the Host that communicates with MCP Servers
  • MCP Server: A server that exposes specific capabilities (tools, data) through a standardized protocol

Core Capabilities

MCP Servers can expose three types of capabilities:

  1. Tools: Functions that Agents can call (e.g., file reading, API calls)
  2. Resources: Data that can be used as context (e.g., documents, configuration files)
  3. Prompts: Reusable prompt templates
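On the wire, MCP uses JSON-RPC 2.0. A tools/call exchange looks roughly like this (field values are illustrative; the message shapes follow the specification):

```json
// Client -> Server: invoke a tool exposed by the MCP server
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_weather",
    "arguments": { "location": "Seoul" }
  }
}

// Server -> Client: tool result as a content array
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "Seoul: 3°C, clear skies" }
    ]
  }
}
```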

2025 Status

The latest version of the MCP specification was published on November 25, 2025, and more than 10,000 public MCP servers are currently active. Major AI products including ChatGPT, Cursor, Gemini, Microsoft Copilot, and Visual Studio Code have adopted MCP.

In June 2025, there was an important security update. MCP servers were classified as OAuth 2.0 Resource Servers, with support for Structured JSON Output (structuredContent) and an Elicitation feature for requesting user input mid-session.

Anthropic donated MCP to the Agentic AI Foundation, transitioning to a community-driven governance structure.


10. Agent Evaluation Methodology

Agent systems require evaluation on different dimensions than traditional LLM evaluation. The key evaluation axes are as follows:

Task Completion Rate

Measures whether the given goal was actually achieved. Evaluation criteria vary by benchmark.

  • WebArena: Success/failure of web browsing tasks
  • SWE-bench: Whether actual GitHub Issues were resolved
  • HumanEval: Code generation accuracy

Tool Selection Accuracy

Evaluates whether the Agent selected appropriate tools.

# Evaluation metrics example
evaluation = {
    "correct_tool_selected": True,      # Was the correct tool selected?
    "correct_parameters": True,         # Are the parameters accurate?
    "unnecessary_tool_calls": 0,        # Number of unnecessary calls
    "total_tool_calls": 3,             # Total call count
    "optimal_tool_calls": 2,           # Optimal call count
    "efficiency": 2/3                  # Efficiency
}
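These metrics can be computed mechanically from a recorded trajectory. In the sketch below, `expected` is a hypothetical optimal call sequence for the task, and the tool names are illustrative:

```python
def score_tool_use(calls: list[str], expected: list[str]) -> dict:
    """Derive tool-selection metrics from a recorded call trajectory."""
    unnecessary = [name for name in calls if name not in expected]
    return {
        "total_tool_calls": len(calls),
        "optimal_tool_calls": len(expected),
        "unnecessary_tool_calls": len(unnecessary),
        "efficiency": len(expected) / len(calls) if calls else 0.0,
    }

metrics = score_tool_use(
    calls=["load_csv", "generate_chart", "run_analysis"],  # actual trajectory
    expected=["load_csv", "run_analysis"],                 # optimal trajectory
)
print(metrics)  # efficiency = 2/3
```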

Trajectory Quality

Evaluates not just the final result but the efficiency of the process.

  • Did it reach the goal without unnecessary steps?
  • Did it recover appropriately when errors occurred?
  • Did it avoid repeating the same operations?

Safety & Guardrails

  • Did it avoid calling unauthorized tools?
  • Did it properly handle sensitive information?
  • Did it correctly interrupt when Human-in-the-Loop was needed?
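The first of these checks can be as simple as an allowlist over the recorded tool calls (a sketch; the tool names are illustrative):

```python
# Tools the Agent is authorized to call for this task.
ALLOWED_TOOLS = {"load_csv", "run_analysis", "generate_summary"}

def violates_guardrails(tool_calls: list[str]) -> list[str]:
    """Return every recorded call that falls outside the allowlist."""
    return [name for name in tool_calls if name not in ALLOWED_TOOLS]

print(violates_guardrails(["load_csv", "delete_data"]))  # → ['delete_data']
```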

Cost & Latency

  • Total API call count and token usage
  • Total elapsed time for task completion
  • Performance efficiency relative to model size

11. Practical Example: Data Analysis Agent

Below is the complete code for building a CSV Data Analysis Agent using LangGraph. It uses Anthropic Claude as the LLM and leverages Tool Use and Conditional Edges.

import pandas as pd
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END, MessagesState
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import ToolNode, tools_condition


# ============================================
# 1. Tool Definitions
# ============================================

@tool
def load_csv(file_path: str) -> str:
    """Loads a CSV file and returns basic information."""
    df = pd.read_csv(file_path)
    info = {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": df.dtypes.to_dict(),
        "head": df.head().to_string(),
        "describe": df.describe().to_string()
    }
    return str(info)


@tool
def run_analysis(file_path: str, query: str) -> str:
    """Runs an analysis query on CSV data using pandas.

    Args:
        file_path: Path to the CSV file
        query: pandas query to execute (e.g., 'df.groupby("category").mean()')
    """
    df = pd.read_csv(file_path)
    try:
        # NOTE: eval() executes arbitrary code; sandbox or whitelist queries in production
        result = eval(query, {"df": df, "pd": pd})
        if isinstance(result, pd.DataFrame):
            return result.to_string()
        elif isinstance(result, pd.Series):
            return result.to_string()
        return str(result)
    except Exception as e:
        return f"Query execution error: {str(e)}"


@tool
def generate_summary(analysis_results: str, user_question: str) -> str:
    """Converts analysis results into a user-friendly summary.

    Args:
        analysis_results: Analysis result text
        user_question: Original user question
    """
    summary = f"""
    ## Analysis Summary

    **Question**: {user_question}

    **Results**:
    {analysis_results}
    """
    return summary


# ============================================
# 2. Agent Configuration
# ============================================

# Tool list
tools = [load_csv, run_analysis, generate_summary]

# LLM setup
model = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    temperature=0,
    max_tokens=4096
)
model_with_tools = model.bind_tools(tools)


# Agent node definition
def agent_node(state: MessagesState):
    """Agent reasons and calls tools as needed."""
    system_message = {
        "role": "system",
        "content": """You are a data analysis specialist Agent.
        To answer the user's question, follow these steps:
        1. First load the data with load_csv to understand its structure.
        2. Execute appropriate pandas queries with run_analysis.
        3. Summarize the results with generate_summary.
        Always think step by step and call the necessary tools."""
    }
    messages = [system_message] + state["messages"]
    response = model_with_tools.invoke(messages)
    return {"messages": [response]}


# ============================================
# 3. Graph Construction
# ============================================

# Create StateGraph
workflow = StateGraph(MessagesState)

# Add nodes
workflow.add_node("agent", agent_node)
workflow.add_node("tools", ToolNode(tools))

# Add edges
workflow.add_edge(START, "agent")
workflow.add_conditional_edges(
    "agent",
    tools_condition,  # "tools" if tool_calls exist, END otherwise
)
workflow.add_edge("tools", "agent")  # Deliver tool results to agent

# Compile
checkpointer = MemorySaver()
app = workflow.compile(checkpointer=checkpointer)


# ============================================
# 4. Execution
# ============================================

def run_data_agent(question: str, file_path: str, thread_id: str = "default"):
    """Run the data analysis Agent."""
    config = {"configurable": {"thread_id": thread_id}}

    initial_message = HumanMessage(
        content=f"File path: {file_path}\n\nQuestion: {question}"
    )

    result = app.invoke(
        {"messages": [initial_message]},
        config=config
    )

    # Extract final response
    final_message = result["messages"][-1]
    return final_message.content


# Usage example
if __name__ == "__main__":
    answer = run_data_agent(
        question="What is the monthly average sales and which month had the highest sales?",
        file_path="./data/sales.csv",
        thread_id="analysis-001"
    )
    print(answer)

This Agent operates according to the ReAct pattern. The LLM performs reasoning (agent_node), selects necessary tools (branching via tools_condition), and delivers tool results back to the LLM (tools -> agent Edge) in a repeating loop. Since MemorySaver saves state at each step, subsequent questions with the same thread_id will maintain previous analysis context.


12. References