LLM Agents & Agentic AI, a Complete Guide: From ReAct and Multi-Agent Systems to MCP

Introduction
1. What Is an Agent?
- Plain LLMs vs. Agents
2. The ReAct Framework: Reasoning + Acting
3. Chain-of-Thought & Tree-of-Thought
- Chain-of-Thought (CoT)
- Tree-of-Thought (ToT)
4. Memory Systems
5. Tool Integration
6. LangGraph: Stateful Agents
- Implementing a Stateful Agent
7. Multi-Agent Systems
- CrewAI: Role-Based Multi-Agent
- AutoGen: Conversation-Based Multi-Agent
8. Implementing Tool Use with the Claude API
9. Evaluating Agents
10. 2026 Trends: Computer-use & Coding Agents
- Computer-use Agents
- Coding Agents: Devin and SWE-agent
11. OpenAI Assistants API
Quiz: Check the Key Concepts
Closing Thoughts

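Introduction

From 2024 through 2025 and into 2026, the AI paradigm has shifted from simple question-and-answer chatbots toward **agents that act autonomously**.

Given a goal, an LLM agent makes its own plan, calls tools, reviews the results, and works toward that goal. Systems such as Devin work through GitHub issues on their own, and Claude can carry out tasks by looking at a computer screen and clicking.

This guide covers LLM agents from the core concepts through to hands-on implementation.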

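1. What Is an Agent?

Plain LLMs vs. Agents

A plain LLM follows a simple input → output structure. An agent, by contrast, runs a loop with four stages:

Perceive: gather information from the environment (tool results, user input, memory)
Plan: decide on a sequence of actions that achieves the goal
Act: call tools, send API requests, execute code
Reflect: evaluate the result and adjust the next action

Core components of an agent:

Component	Description
LLM core	Reasoning and decision-making engine
Tools	Web search, code execution, APIs, and so on
Memory	Short-term and long-term context management
Orchestrator	Controls the agent loop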
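
The four stages and four components above can be sketched in a few lines. This is only a toy under stated assumptions: `MiniAgent`, its rule-based `plan` method (standing in for the LLM core), and the two arithmetic "tools" are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MiniAgent:
    """Toy agent: a rule-based `plan` stands in for the LLM core."""
    tools: Dict[str, Callable[[int], int]]
    memory: List[str] = field(default_factory=list)  # short-term memory

    def plan(self, value: int, goal: int) -> str:
        # Plan: pick the next tool. A real agent would ask the LLM here.
        return "double" if value * 2 <= goal else "increment"

    def run(self, start: int, goal: int, max_steps: int = 20) -> int:
        value = start  # Perceive: the initial observation
        for _ in range(max_steps):  # Orchestrator: a bounded loop
            if value == goal:  # Reflect: stop once the goal is met
                break
            tool_name = self.plan(value, goal)
            value = self.tools[tool_name](value)  # Act: call the tool
            self.memory.append(f"{tool_name} -> {value}")
        return value


agent = MiniAgent(tools={"double": lambda x: x * 2, "increment": lambda x: x + 1})
result = agent.run(start=1, goal=10)
print(result, agent.memory)
```

The `max_steps` bound matters even in a toy: without it, a planner that never reaches the goal would loop forever.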

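2. The ReAct Framework: Reasoning + Acting

What Is ReAct?

ReAct (Reasoning + Acting) is a framework proposed by Yao et al. in 2022 in which the LLM solves a problem by repeating a Thought → Action → Observation cycle.

Thought: analyze the current situation and decide on the next action
Action: call a tool in the form tool_name(arguments)
Observation: receive the result of the tool call
... repeat ...
Final Answer: produce the final answer

Why Does ReAct Reduce Hallucination?

A plain LLM generates the whole answer in one pass, so it tends to "make up" facts along the way. ReAct counters this in three ways:

Real-time grounding: each Observation acts as a fact-based anchor
Step-by-step verification: intermediate results are checked, so errors are corrected early
External knowledge: real tools are used for search and computation during reasoning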
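
Before turning to a framework, the cycle can be sketched in plain Python. The snippet below is a minimal illustration, not a reference implementation: `react_loop`, `scripted_llm`, and `calculator` are hypothetical names, and the "LLM" is a scripted stand-in that answers only from the latest Observation, which is the anchoring behavior described above.

```python
import operator
import re
from typing import Callable, Dict

ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def calculator(expression: str) -> str:
    """Tiny tool: evaluates '<int> <op> <int>' without using eval."""
    left, op, right = expression.split()
    return str(OPS[op](int(left), int(right)))


def scripted_llm(transcript: str) -> str:
    """Stand-in for a model: acts first, then answers from the observation."""
    observations = re.findall(r"Observation: (.*)", transcript)
    if not observations:
        return "Thought: I should compute this with a tool.\nAction: calculator(17 * 23)"
    return f"Thought: The tool result answers the question.\nFinal Answer: {observations[-1]}"


def react_loop(llm: Callable[[str], str], tools: Dict[str, Callable[[str], str]],
               question: str, max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)  # Thought (+ Action or Final Answer)
        transcript += output + "\n"
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, argument = match.group(1), match.group(2).strip()
        tool = tools.get(name)
        observation = tool(argument) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"  # grounding step
    return "stopped: step limit reached"


answer = react_loop(scripted_llm, {"calculator": calculator}, "What is 17 * 23?")
print(answer)
```

Feeding an error message back as an Observation when the tool name is unknown, instead of raising, gives the model a chance to correct itself on the next turn.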

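Python Implementation Example

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_anthropic import ChatAnthropic
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_experimental.tools import PythonREPLTool

llm = ChatAnthropic(model="claude-opus-4-5", temperature=0)
# PythonREPLTool executes arbitrary code, so run it only in a sandboxed environment
tools = [DuckDuckGoSearchRun(), PythonREPLTool()]

# Load the ReAct prompt template
prompt = hub.pull("hwchase17/react")

agent = create_react_agent(llm, tools, prompt)

# handle_parsing_errors lets the executor recover when the model's output
# does not match the Thought/Action format
agent_executor = AgentExecutor(
    agent=agent, tools=tools, verbose=True, handle_parsing_errors=True
)

result = agent_executor.invoke({
    "input": "Search for the three most popular AI agent frameworks as of 2026 and build a comparison table"
})
print(result["output"])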

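3. Chain-of-Thought & Tree-of-Thought

Chain-of-Thought (CoT)

CoT prompting asks the model to write out its intermediate reasoning before answering. Even a single added instruction such as "Let's think step by step" can noticeably improve accuracy on multi-step problems such as arithmetic and logic puzzles.

cot_prompt = """
Solve the problem step by step:

Problem: {problem}

Solution:
1. First, organize the given information.
2. Carry out the necessary calculations and reasoning.
3. Verify the intermediate results.
4. Derive the final answer.
"""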

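Tree-of-Thought (ToT)

ToT extends CoT by exploring several reasoning paths as a tree. A search strategy such as BFS or DFS, combined with an evaluator that scores partial solutions, selects the most promising path.

from langchain_experimental.tot.base import ToTChain

# `llm` is the chat model created earlier. `checker` is assumed to be an instance
# of a ToTChecker subclass defined elsewhere; it judges whether a thought is valid.
tot_chain = ToTChain.from_llm(
    llm=llm,
    checker=checker,
    k=3,           # maximum number of rounds with the LLM (verify against your installed version)
    c=4,           # number of child thoughts explored at each node
    verbose=True
)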
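
To make the search itself concrete, here is a framework-free sketch of breadth-first ToT with a beam. It is an illustration under assumptions, not how `ToTChain` works internally: `tree_of_thought_bfs`, `propose`, `score`, and the toy task (pick numbers that sum to a target) are all hypothetical, and in a real system `propose` and `score` would be LLM calls.

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, ...]  # a partial solution: the numbers chosen so far


def tree_of_thought_bfs(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_solution: Callable[[State], bool],
    beam_width: int = 3,
    max_depth: int = 4,
) -> Optional[State]:
    frontier: List[State] = [root]
    for _ in range(max_depth):
        # Expand every state in the frontier into candidate "thoughts"
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        if not candidates:
            return None
        # Keep only the most promising candidates for the next level
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
    return None


NUMBERS = (2, 3, 7)
TARGET = 12


def propose(state: State) -> List[State]:
    # Extend the partial solution by one number without overshooting
    return [state + (n,) for n in NUMBERS if sum(state) + n <= TARGET]


def score(state: State) -> float:
    # Higher is better: partial sums closer to the target are preferred
    return -float(TARGET - sum(state))


solution = tree_of_thought_bfs((), propose, score, lambda s: sum(s) == TARGET)
print(solution)
```

The `beam_width` parameter is the knob that trades cost for coverage: a wider beam keeps more branches alive per level at the price of more proposals to generate and score.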

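4. Memory Systems

The Four Types of Memory

Agent memory is designed along lines similar to the human memory system:

Memory type	Where it lives	Characteristics
Sensory memory	Input context	Processes the current input
Short-term memory	Context window	Current conversation session
Long-term memory	Vector DB / key-value store	Persistent knowledge storage
Episodic memory	Vector DB	Indexes past experiences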
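
The split between short-term and long-term memory in the table can be sketched with the standard library alone. This is a simplified illustration: `AgentMemory` is a hypothetical class, and word overlap stands in for the embedding similarity a vector DB would provide.

```python
from collections import deque
from typing import Deque, List


class AgentMemory:
    """Short-term memory is a bounded window; evicted items move to long-term."""

    def __init__(self, short_term_size: int = 3):
        self.short_term: Deque[str] = deque(maxlen=short_term_size)
        self.long_term: List[str] = []

    def observe(self, message: str) -> None:
        # When the window is full, archive the oldest item before it is evicted
        if len(self.short_term) == self.short_term.maxlen:
            self.long_term.append(self.short_term[0])
        self.short_term.append(message)

    def recall(self, query: str, k: int = 2) -> List[str]:
        # Word overlap is a stand-in for vector similarity search
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(m.lower().split())), m) for m in self.long_term]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [message for _, message in scored[:k]]


agent_memory = AgentMemory(short_term_size=2)
agent_memory.observe("user prefers Python and FastAPI")
agent_memory.observe("user asked about Docker networking")
agent_memory.observe("user asked about structured logging")

print(list(agent_memory.short_term))
print(agent_memory.recall("python preference of the user"))
```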

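mem0: Adding Long-Term Memory

mem0 is an open-source library that adds personalized long-term memory to an agent.

from mem0 import Memory

# Initialize mem0 (using Qdrant as the vector DB)
# No embedder is configured here, so mem0 falls back to its default embedder,
# which may need its own API key.
config = {
    "vector_store": {
        "provider": "qdrant",
        "config": {
            "collection_name": "agent_memory",
            "host": "localhost",
            "port": 6333,
        }
    },
    "llm": {
        "provider": "anthropic",
        "config": {
            "model": "claude-opus-4-5",
            "temperature": 0,
        }
    }
}

memory = Memory.from_config(config)
user_id = "user_123"

# Store a memory
memory.add(
    messages=[
        {"role": "user", "content": "I mostly use Python and I like FastAPI"},
        {"role": "assistant", "content": "Got it! I'll remember your Python/FastAPI preference."}
    ],
    user_id=user_id
)

# Search memories and use them
relevant_memories = memory.search(
    query="Which language does the user prefer?",
    user_id=user_id
)

# Depending on the mem0 version, search() returns either a list of hits
# or a dict with the hits under "results"; handle both shapes.
hits = relevant_memories["results"] if isinstance(relevant_memories, dict) else relevant_memories
context = "\n".join(m["memory"] for m in hits)
print(f"Relevant memories: {context}")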

Episodic Memory Backed by a Vector Store

from datetime import datetime
from typing import List, Optional

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

class EpisodicMemory:
    def __init__(self):
        self.embeddings = OpenAIEmbeddings()
        self.store = Chroma(
            collection_name="episodes",
            embedding_function=self.embeddings,
            persist_directory="./episodic_memory"
        )

    def store_episode(self, content: str, metadata: Optional[dict] = None) -> None:
        """Store a conversation or task episode in memory."""
        # Copy the dict so the caller's metadata is not mutated
        metadata = dict(metadata or {})
        metadata["timestamp"] = datetime.now().isoformat()
        self.store.add_texts([content], metadatas=[metadata])

    def recall(self, query: str, k: int = 3) -> List[str]:
        """Retrieve the k episodes most similar to the query."""
        docs = self.store.similarity_search(query, k=k)
        return [doc.page_content for doc in docs]

# Usage example
memory = EpisodicMemory()
memory.store_episode(
    "The user asked about FastAPI project structure and the question was answered successfully",
    {"task_type": "coding", "success": True}
)

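5. Tool Integration

Standard Tool Categories

The main kinds of tools that agents use:

Web search: Tavily, SerpAPI, DuckDuckGo
Code execution: Python REPL, Jupyter kernel
File system: reading, writing, and searching files
API calls: REST, GraphQL
Databases: SQL and vector DB queries
Computer control: screen capture, clicks, keyboard input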

Implementing a Custom Web Search Tool

from langchain.tools import BaseTool
from pydantic import BaseModel, Field
import httpx

TAVILY_URL = "https://api.tavily.com/search"

class WebSearchInput(BaseModel):
    query: str = Field(description="The query to search for")
    max_results: int = Field(default=5, description="Number of results to return")

class TavilySearchTool(BaseTool):
    name: str = "web_search"
    description: str = "Searches the web for up-to-date information. Use it when real-time information is needed."
    args_schema: type[BaseModel] = WebSearchInput
    api_key: str = ""

    def _payload(self, query: str, max_results: int) -> dict:
        return {
            "api_key": self.api_key,
            "query": query,
            "max_results": max_results,
            "include_answer": True,
        }

    def _process_response(self, data: dict) -> str:
        """Format the JSON response as a short text summary."""
        results = []
        if data.get("answer"):
            results.append(f"Summary: {data['answer']}\n")
        for r in data.get("results", []):
            results.append(f"- {r['title']}: {r['content'][:200]}...")
        return "\n".join(results)

    def _run(self, query: str, max_results: int = 5) -> str:
        response = httpx.post(TAVILY_URL, json=self._payload(query, max_results), timeout=30.0)
        response.raise_for_status()  # surface HTTP errors instead of parsing an error body
        return self._process_response(response.json())

    async def _arun(self, query: str, max_results: int = 5) -> str:
        # Async version: same request and formatting as _run
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.post(TAVILY_URL, json=self._payload(query, max_results))
        response.raise_for_status()
        return self._process_response(response.json())

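MCP (Model Context Protocol)

MCP is a standardized tool-integration protocol that Anthropic announced in late 2024. Whereas function calling has used a different format for each LLM provider, MCP standardizes tools behind a client-server model.

Key advantages of MCP:

Reusability: an MCP server built once can be connected to any MCP-capable client, regardless of the underlying LLM
Rich context: three abstractions are provided (Resources, Prompts, and Tools)
Dynamic discovery: the list of available tools is queried at runtime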
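
Dynamic discovery comes down to two JSON-RPC 2.0 messages: the client sends `tools/list` to learn what exists, then `tools/call` to invoke a tool by name. The sketch below only builds the messages as plain dicts to show their shape; `jsonrpc_request` is a hypothetical helper, the `list_response` is hand-written rather than captured from a real server, and no transport is involved.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_ids = count(1)  # JSON-RPC request ids


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# 1. Discovery: ask the server which tools it offers
list_request = jsonrpc_request("tools/list")

# Hand-written example of the response shape (assumption: one weather tool)
list_response = {
    "jsonrpc": "2.0",
    "id": list_request["id"],
    "result": {
        "tools": [{
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "inputSchema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }]
    },
}

# 2. Invocation: call a discovered tool by name with schema-conforming arguments
tool_names = [tool["name"] for tool in list_response["result"]["tools"]]
call_request = jsonrpc_request(
    "tools/call", {"name": tool_names[0], "arguments": {"city": "Seoul"}}
)

print(json.dumps(list_request))
print(json.dumps(call_request))
```

Because the client learns tool names and schemas at runtime, nothing about `get_weather` has to be hard-coded on the client side.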

# MCP server example (Python SDK)
import asyncio

from mcp import types
from mcp.server import Server
from mcp.server.stdio import stdio_server

app = Server("my-tool-server")

async def fetch_weather(city: str, units: str = "celsius") -> dict:
    # Hypothetical placeholder: replace the body with a call to a real weather API
    return {"city": city, "units": units, "temperature": None}

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="get_weather",
            description="Look up the current weather for a given city",
            inputSchema={
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["city"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "get_weather":
        weather_data = await fetch_weather(
            arguments["city"], arguments.get("units", "celsius")
        )
        return [types.TextContent(type="text", text=str(weather_data))]
    # Fail loudly on unknown tools instead of silently returning None
    raise ValueError(f"Unknown tool: {name}")

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())

6. LangGraph: Stateful Agents

LangGraph is a graph-based agent orchestration framework built by the LangChain team. Unlike the DAGs of the LangChain Expression Language (LCEL), it supports cycles, so agent loops can be expressed naturally.

Implementing a Stateful Agent

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import BaseMessage, HumanMessage, ToolMessage
from langchain_experimental.tools import PythonREPLTool
from typing import TypedDict, Annotated, Sequence
import operator

# 1. Define the state
class AgentState(TypedDict):
    # operator.add is the reducer: messages returned by a node are appended, not overwritten
    messages: Annotated[Sequence[BaseMessage], operator.add]
    iteration_count: int

# 2. Set up the LLM and tools
llm = ChatAnthropic(model="claude-opus-4-5")
# TavilySearchTool is the custom search tool defined in section 5
tools = [TavilySearchTool(api_key="YOUR_TAVILY_KEY"), PythonREPLTool()]
llm_with_tools = llm.bind_tools(tools)

# 3. Define the nodes (each returns a partial state update)
def call_model(state: AgentState) -> dict:
    """Node that calls the LLM."""
    response = llm_with_tools.invoke(state["messages"])
    return {
        "messages": [response],
        "iteration_count": state["iteration_count"] + 1
    }

def call_tools(state: AgentState) -> dict:
    """Node that executes the requested tools."""
    last_message = state["messages"][-1]
    tool_results = []

    for tool_call in last_message.tool_calls:
        tool = next(t for t in tools if t.name == tool_call["name"])
        result = tool.invoke(tool_call["args"])
        tool_results.append(
            ToolMessage(content=str(result), tool_call_id=tool_call["id"])
        )
    return {"messages": tool_results}

# 4. Routing function (conditional edge)
def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    # Continue if there are tool calls, otherwise finish
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        if state["iteration_count"] < 10:  # guard against infinite loops
            return "tools"
    return "end"

# 5. Build the graph
graph = StateGraph(AgentState)
graph.add_node("agent", call_model)
graph.add_node("tools", call_tools)

graph.set_entry_point("agent")
graph.add_conditional_edges(
    "agent",
    should_continue,
    {"tools": "tools", "end": END}
)
graph.add_edge("tools", "agent")  # after running tools, go back to the agent

# 6. Add a checkpointer (MemorySaver keeps state in process memory only)
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Run (thread_id identifies the conversation session)
config = {"configurable": {"thread_id": "session_001"}}
result = app.invoke(
    {"messages": [HumanMessage(content="Search for 2026 AI agent trends and summarize them")], "iteration_count": 0},
    config=config
)
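
One detail in the state definition is easy to miss: a key annotated with `operator.add` is merged by appending, while a plain key is overwritten by the latest node output. The sketch below mimics that merge rule in plain Python to make it visible. It is an illustration of the idea only; `apply_update` and `REDUCERS` are hypothetical names, not LangGraph internals.

```python
import operator
from typing import Any, Callable, Dict

# Keys with a reducer are combined with the old value; other keys are overwritten
REDUCERS: Dict[str, Callable[[Any, Any], Any]] = {"messages": operator.add}


def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state without mutating the input."""
    new_state = dict(state)
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        if reducer is not None and key in state:
            new_state[key] = reducer(state[key], value)  # append-style merge
        else:
            new_state[key] = value  # last write wins
    return new_state


state = {"messages": ["user: hi"], "iteration_count": 0}
# An "agent" node returns one new message and a new counter value
state = apply_update(state, {"messages": ["ai: calling search"], "iteration_count": 1})
# A "tools" node returns only messages; the counter is left untouched
state = apply_update(state, {"messages": ["tool: 3 results"]})

print(state)
```

This is why the nodes above can return just `{"messages": [response]}`: the history grows instead of being replaced.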

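7. Multi-Agent Systems

CrewAI: Role-Based Multi-Agent

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, FileWriterTool

# Set up the tools
search_tool = SerperDevTool()
file_writer = FileWriterTool()

# Define the agents
# CrewAI resolves model strings through LiteLLM, which typically expects a
# provider prefix such as "anthropic/"; adjust to your installed version.
researcher = Agent(
    role="AI Researcher",
    goal="Investigate the latest AI agent technology trends in depth",
    backstory="""You are a researcher specializing in AI.
    You analyze recent papers, blogs, and GitHub repositories to extract key insights.""",
    tools=[search_tool],
    llm="anthropic/claude-opus-4-5",
    verbose=True
)

writer = Agent(
    role="Technical Writer",
    goal="Turn the research results into a readable technical report",
    backstory="""You are a professional writer who explains complex AI concepts clearly.""",
    tools=[file_writer],
    llm="anthropic/claude-opus-4-5",
    verbose=True
)

# Define the tasks
research_task = Task(
    description="Research the top 5 LLM agent trends of 2026. Include concrete examples and impact for each trend.",
    expected_output="A detailed analysis of 5 trends (at least 500 characters each)",
    agent=researcher
)

writing_task = Task(
    description="Write a technical blog post based on the research results.",
    expected_output="A 2000-character technical blog post in Markdown",
    agent=writer,
    output_file="ai_trends_2026.md"
)

# Run the crew (sequential process: tasks run in the order listed)
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff()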

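AutoGen: Conversation-Based Multi-Agent

AutoGen is a multi-agent framework from Microsoft whose defining feature is collaboration through conversation between agents. The example uses the AutoGen 0.2-style API.

import autogen

# "api_type" tells AutoGen which client to use for a non-OpenAI model
config_list = [{"model": "claude-opus-4-5", "api_key": "YOUR_KEY", "api_type": "anthropic"}]

# Orchestrator agent
orchestrator = autogen.AssistantAgent(
    name="Orchestrator",
    system_message="""You are the orchestrator who coordinates the team.
    Analyze the task and delegate it to the appropriate specialist agent.
    Combine all results into the final answer.""",
    llm_config={"config_list": config_list}
)

# Coding agent
coder = autogen.AssistantAgent(
    name="Coder",
    system_message="You are an expert at writing and running Python code.",
    # The function schemas are elided here; supply real ones or drop the key
    llm_config={"config_list": config_list, "functions": [...]}
)

# User proxy (responsible for executing code)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    # use_docker=False runs generated code directly on the host; prefer Docker outside of demos
    code_execution_config={"work_dir": "coding", "use_docker": False}
)

# Run the group chat
groupchat = autogen.GroupChat(
    agents=[orchestrator, coder, user_proxy],
    messages=[],
    max_round=12
)
# The manager needs its own llm_config to choose the next speaker
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})
user_proxy.initiate_chat(manager, message="Write some data visualization code")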
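
Both frameworks above separate a coordinating role from specialist workers. The benefit of that split, parallel work with failures contained per worker, can be sketched with `asyncio` alone. Everything here is a hypothetical stand-in: `researcher`, `coder`, and `orchestrate` are plain coroutines, not LLM agents, and one worker fails on purpose to show the isolation.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Worker = Callable[[str], Awaitable[str]]


async def researcher(task: str) -> str:
    await asyncio.sleep(0)  # stands in for a slow LLM or search call
    return f"notes on {task}"


async def coder(task: str) -> str:
    # Simulated failure, to show that it does not take down the other worker
    raise RuntimeError("sandbox unavailable")


async def orchestrate(task: str, workers: Dict[str, Worker]) -> Dict[str, str]:
    names = list(workers)
    # Run all workers concurrently; exceptions are returned, not raised
    outcomes = await asyncio.gather(
        *(workers[name](task) for name in names), return_exceptions=True
    )
    report: Dict[str, str] = {}
    for name, outcome in zip(names, outcomes):
        if isinstance(outcome, Exception):
            report[name] = f"failed: {outcome}"  # error isolated to this worker
        else:
            report[name] = outcome
    return report


report = asyncio.run(
    orchestrate("data visualization", {"researcher": researcher, "coder": coder})
)
print(report)
```

With `return_exceptions=True`, the orchestrator sees every outcome and can decide per worker whether to retry, reassign, or continue with partial results.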

8. Implementing Tool Use with the Claude API

import anthropic
import json

client = anthropic.Anthropic()

# Tool definitions
tools = [
    {
        "name": "get_stock_price",
        "description": "Looks up the current price and percentage change for a given stock",
        "input_schema": {
            "type": "object",
            "properties": {
                "symbol": {
                    "type": "string",
                    "description": "Stock ticker symbol (e.g. AAPL, MSFT)"
                },
                "currency": {
                    "type": "string",
                    "enum": ["USD", "KRW"],
                    "description": "Display currency"
                }
            },
            "required": ["symbol"]
        }
    }
]

def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Tool execution logic."""
    if tool_name == "get_stock_price":
        # A real API call would go here; the values below are hard-coded sample data
        return json.dumps({
            "symbol": tool_input["symbol"],
            "price": 185.92,
            "change_percent": "+2.3%",
            "currency": tool_input.get("currency", "USD")
        })
    # Report unknown tools back to the model instead of returning None
    return json.dumps({"error": f"unknown tool: {tool_name}"})

# Agent loop
messages = [{"role": "user", "content": "Tell me Apple's stock price"}]
MAX_TURNS = 10  # upper bound so the loop always terminates

for _ in range(MAX_TURNS):
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        tools=tools,
        messages=messages
    )

    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason == "tool_use":
        # Execute every requested tool and send the results back in one user turn
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = process_tool_call(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result
                })

        messages.append({"role": "user", "content": tool_results})
        continue

    # Any other stop reason (end_turn, max_tokens, ...) ends the loop
    final_text = "".join(
        block.text for block in response.content
        if block.type == "text"
    )
    print(f"Final answer: {final_text}")
    break

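9. Evaluating Agents

Major Benchmarks

Benchmark	What it measures	Notes
AgentBench	8 environments (OS, DB, games, etc.)	Evaluation in realistic environments
GAIA	General AI assistant ability	Compared against human performance
SWE-bench	Software engineering	Resolving real GitHub issues
WebArena	Web navigation	Operating realistic websites
OSWorld	Computer use	GUI interaction

Trajectory Evaluation vs Outcome Evaluation

There are two key perspectives on agent evaluation:

Outcome Evaluation:

Measures only whether the final goal was achieved
Metrics: Pass@k, success rate
Simple, but ignores the process

Trajectory Evaluation:

Evaluates the process of reaching the goal (the action sequence)
Measures efficiency, safety, and absence of side effects together
Essential in production environments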

from dataclasses import dataclass
from typing import List

@dataclass
class AgentTrajectory:
    task: str
    steps: List[dict]  # {"thought": ..., "action": ..., "observation": ...}
    final_answer: str
    success: bool
    total_tokens: int

def evaluate_trajectory(trajectory: AgentTrajectory) -> dict:
    """Trajectory-based agent evaluation."""
    # calculate_efficiency, check_error_recovery, and evaluate_tool_usage are
    # project-specific scoring helpers that are not defined in this guide.
    metrics = {
        "task_success": trajectory.success,
        "efficiency": calculate_efficiency(trajectory.steps),
        "redundant_steps": count_redundant_steps(trajectory.steps),
        "error_recovery": check_error_recovery(trajectory.steps),
        "tool_usage_appropriateness": evaluate_tool_usage(trajectory.steps),
        # Higher is better: inverse of token usage in thousands (guarded against zero tokens)
        "cost_efficiency": 1000 / max(trajectory.total_tokens, 1)
    }
    return metrics

def count_redundant_steps(steps: List[dict]) -> int:
    """Count tool calls that exactly repeat an earlier call."""
    seen_actions = set()
    redundant = 0
    for step in steps:
        # Use the "action" key documented on AgentTrajectory.steps
        action_key = str(step.get("action"))
        if action_key in seen_actions:
            redundant += 1
        seen_actions.add(action_key)
    return redundant
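
The evaluation function above relies on helpers that this guide does not define. As a self-contained sketch, the version below fills them in with deliberately simple heuristics. All of these definitions are assumptions made for illustration: efficiency is the share of non-repeated steps, error recovery means the trajectory does not end on an error, and an observation starting with "ERROR" marks a failed step.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Trajectory:
    steps: List[Dict[str, str]]  # each step: {"action": ..., "observation": ...}
    success: bool
    total_tokens: int


def redundant_steps(steps: List[Dict[str, str]]) -> int:
    """Count actions that exactly repeat an earlier action."""
    seen = set()
    redundant = 0
    for step in steps:
        if step["action"] in seen:
            redundant += 1
        seen.add(step["action"])
    return redundant


def recovered_from_errors(steps: List[Dict[str, str]]) -> bool:
    """Heuristic: the run recovered if it does not end on an error."""
    return not steps or not steps[-1]["observation"].startswith("ERROR")


def evaluate(trajectory: Trajectory) -> Dict[str, Union[bool, int, float]]:
    n = len(trajectory.steps)
    repeated = redundant_steps(trajectory.steps)
    return {
        "task_success": trajectory.success,
        "redundant_steps": repeated,
        "efficiency": (n - repeated) / n if n else 0.0,
        "error_recovery": recovered_from_errors(trajectory.steps),
        "tokens_per_step": trajectory.total_tokens / n if n else 0.0,
    }


trajectory = Trajectory(
    steps=[
        {"action": "search(mcp)", "observation": "ERROR: timeout"},
        {"action": "search(mcp)", "observation": "3 results"},
        {"action": "summarize()", "observation": "done"},
    ],
    success=True,
    total_tokens=900,
)
metrics = evaluate(trajectory)
print(metrics)
```

Note the limitation this example exposes: the retry after a timeout is counted as redundant even though it was the right thing to do. A production metric would probably exclude repeats that follow an error.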

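Common Agent Failure Modes

Infinite loops: the termination condition is misconfigured, so the agent keeps repeating
Tool hallucination: calling tools or parameters that do not exist
Context drift: losing track of the original goal in long sessions
Over-planning: building an unnecessary plan for a simple task
Tool overuse: repeatedly calling tools that are not needed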
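
Two of these failure modes, tool hallucination and infinite loops, can be caught with cheap checks before a tool call is executed. The guards below are a minimal sketch under assumptions: `TOOL_SCHEMAS`, `validate_call`, and `is_looping` are hypothetical names, and the schema format is simplified to sets of parameter names.

```python
import json
from typing import Any, Dict, List, Set, Tuple

# Simplified registry: which parameters each tool allows and requires
TOOL_SCHEMAS: Dict[str, Dict[str, Set[str]]] = {
    "web_search": {"allowed": {"query", "max_results"}, "required": {"query"}},
    "get_weather": {"allowed": {"city", "units"}, "required": {"city"}},
}


def validate_call(name: str, args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name}"]
    problems = []
    unknown = sorted(set(args) - schema["allowed"])
    missing = sorted(schema["required"] - set(args))
    if unknown:
        problems.append(f"unknown parameters: {unknown}")
    if missing:
        problems.append(f"missing parameters: {missing}")
    return problems


def call_key(name: str, args: Dict[str, Any]) -> Tuple[str, str]:
    """Canonical, hashable form of a call for loop detection."""
    return (name, json.dumps(args, sort_keys=True))


def is_looping(history: List[Tuple[str, str]], window: int = 3) -> bool:
    """True if the last `window` calls are all identical."""
    return len(history) >= window and len(set(history[-window:])) == 1


history = [call_key("web_search", {"query": "mcp"}) for _ in range(3)]
print(validate_call("get_weather", {"town": "Seoul"}))
print(is_looping(history))
```

Returning the problem list to the model as a tool result, rather than raising, lets the agent repair its own call on the next turn.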

10. 2026 Trends: Computer-use & Coding Agents

Computer-use Agents

Claude's computer use tool and OpenAI's Computer-Using Agent (CUA), which builds on GPT-4o's vision capabilities, let an agent look at a real computer screen and operate it.

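import anthropic
import base64
import io
from PIL import ImageGrab

def take_screenshot() -> str:
    """Capture the screen and return it as a base64-encoded PNG."""
    buffer = io.BytesIO()
    ImageGrab.grab().save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode()

client = anthropic.Anthropic()

# Computer-use agent. Computer use is a beta feature, so the request goes
# through client.beta and needs a matching beta flag.
response = client.beta.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,
    # Tool versions and the beta flag are tied to the model. The 2024-10-22
    # versions shown here are the first release; check Anthropic's documentation
    # for the versions your model supports.
    betas=["computer-use-2024-10-22"],
    tools=[
        {"type": "computer_20241022", "name": "computer", "display_width_px": 1920, "display_height_px": 1080},
        {"type": "text_editor_20241022", "name": "str_replace_editor"},
        {"type": "bash_20241022", "name": "bash"}
    ],
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Open a browser and check the latest trending repositories on GitHub"},
            {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": take_screenshot()}}
        ]
    }]
)
# The response contains tool_use blocks (click, type, screenshot, ...); a real
# agent executes them and feeds the results back in a loop, as in section 8.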

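Coding Agents: Devin and SWE-agent

Approximate SWE-bench Verified scores for coding agents as of 2026. Reported numbers depend heavily on the underlying model and scaffold and change quickly, so read them as rough indications rather than a fixed ranking:

Agent	SWE-bench Verified	Notes
Claude Code	~72%	Terminal integration, codebase understanding
Devin 2.0	~65%	End-to-end development workflow
SWE-agent	~58%	Open source, research-oriented
Aider	~55%	Specialized for local codebases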

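11. OpenAI Assistants API

from openai import OpenAI

client = OpenAI()

# Note: OpenAI has announced the deprecation of the Assistants API in favor of
# the Responses API, so check the current docs before building on it. In newer
# SDK versions, vector stores live under client.vector_stores.

# Upload the file and create a vector store
vector_store = client.beta.vector_stores.create(name="AI paper store")
with open("ai_papers_2026.pdf", "rb") as f:
    client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id,
        files=[("ai_papers_2026.pdf", f)]
    )

# Create the assistant (tools + knowledge base)
assistant = client.beta.assistants.create(
    name="AI Technology Analyst",
    instructions="You are an AI/ML technology expert. You analyze recent papers and technical documents and provide insights.",
    model="gpt-4o",
    tools=[
        {"type": "file_search"},   # file-based RAG
        {"type": "code_interpreter"}  # code execution
    ],
    # Attach the vector store so file_search can actually see the uploaded paper
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}}
)

# Create a thread and add a message
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze the main limitations of Agentic AI in the uploaded paper"
)

# Start the run and wait until it reaches a terminal state
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Print the result
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    # Messages are listed newest first, so data[0] is the assistant's reply
    print(messages.data[0].content[0].text.value)
else:
    print(f"Run ended with status: {run.status}")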

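Quiz: Check the Key Concepts

Q1. In the ReAct framework, how does the Thought-Action-Observation cycle reduce hallucination?

Answer: Step-by-step verification against real-time external evidence

Explanation: A plain LLM generates the whole answer in one pass, so it can "make up" facts along the way. ReAct calls real tools (search, calculation, and so on) during reasoning and checks the evidence in each Observation. These real results act as "fact anchors" that keep later reasoning from drifting away without support. In addition, because intermediate steps are recorded explicitly, it is easy to locate and correct the point where an error occurred.

Q2. How does a LangGraph graph with cycles differ from DAG-based LangChain?

Answer: It allows state-based iterative execution and dynamic routing

Explanation: LangChain's LCEL builds a directed acyclic graph (DAG): execution flows forward and cannot return to an earlier step. LangGraph supports cycles, so agent loops such as "call a tool → check the result → retry" are expressed naturally. Conditional edges choose the next node dynamically from the current state, and a checkpointer saves the state per thread so that a session can be resumed. The in-memory MemorySaver lasts only for the lifetime of the process; durable storage across restarts needs a database-backed checkpointer. The result puts the human "try, fail, revise" thought process into code.

Q3. Why is MCP (Model Context Protocol) more flexible than conventional function calling?

Answer: Its standardized client-server architecture creates an LLM-independent tool ecosystem

Explanation: With conventional function calling, OpenAI, Anthropic, and Google each use their own format, so a tool definition is tied to a specific LLM. MCP defines a standard protocol over stdio or HTTP, so an MCP server built once can be reused from any MCP-capable client (Claude, Cursor, VS Code, and others). Beyond Tools (executable functions), it also provides Resources (data such as files and databases) and Prompts (reusable prompt templates), which gives the agent richer context.

Q4. What are the benefits of separating the orchestrator from the worker agents in a multi-agent system?

Answer: Separation of concerns, specialization, parallelism, and error isolation

Explanation: The orchestrator focuses only on overall planning and coordination, while each worker agent specializes in one domain (coding, search, writing, and so on). The benefits are: (1) each agent can be optimized independently, (2) several worker agents can run in parallel, which improves speed, (3) a failure in one agent does not stop the whole system (error isolation), (4) new specialist agents are easy to add (extensibility), and (5) each agent's behavior can be audited and logged independently.

Q5. What is the difference between trajectory evaluation and outcome evaluation?

Answer: Outcome evaluation checks final success; trajectory evaluation also assesses the efficiency and safety of the process

Explanation: Outcome evaluation measures only whether the goal was achieved (0 or 1). It is simple, but an agent still passes if it reaches the right result through a bad process or causes side effects along the way. Trajectory evaluation analyzes the whole action sequence: whether there were unnecessary steps (efficiency), whether any action was unsafe (safety), whether errors were recovered from properly, and whether token and API costs were reasonable. A production agent should be judged as failing when it reaches the goal at excessive cost or with side effects, which is why trajectory evaluation is essential.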

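Closing Thoughts

LLM agents have moved beyond the research stage and are now creating value in real production systems. The key trends for 2026:

Computer-use agents: general-purpose agents that look at the screen and operate it directly
Standardized long-term memory: memory layers such as mem0 and Zep becoming commonplace
A growing MCP ecosystem: thousands of MCP servers and tools
Agent safety: auditing agent behavior, restricting permissions, human oversight
Multimodal agents: handling text, images, and audio together

As a next step, try building a production agent with LangGraph or developing your own MCP server!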