A Complete Guide to LLM Agent Design Patterns: From ReAct to Multi-Agent

What Is an Agent? (A Clear Definition)
Pattern 1: ReAct (Reason + Act)
Pattern 2: Chain of Thought (CoT)
Pattern 3: Plan-and-Execute
Pattern 4: Reflection and Self-Critique
Pattern 5: Tree of Thoughts (ToT)
Which Pattern to Choose, and When
Common Mistakes and How to Fix Them
Closing Thoughts

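The most common mistake when building your first AI agent is the "just wire up GPT-4 and see what happens" approach. Once you actually start building, you run into agents that call the same tool forever, agents that confidently give the wrong answer, and context windows that overflow and crash.

This post covers five LLM agent design patterns that have held up in production: how each one works, code for it, and an honest look at its trade-offs.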

What Is an Agent? (A Clear Definition)

There are many definitions, but this is the most practical one:

"An agent is an LLM that can take actions, observe results, and decide what to do next — in a loop."

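Agent = an LLM that acts inside an iterative loop. That is all there is to it.

Perception (receive input)
    → Reasoning (the LLM thinks)
    → Action (use a tool)
    → Observation (check the result)
    → back to Reasoning (think again)

This loop is the core idea. What separates an agent from a plain LLM call is the presence of a feedback loop.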
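As a rough illustration of the loop above, here is a minimal sketch in which the LLM is replaced by a scripted `decide` function. The names `run_agent_loop`, `scripted_decide`, and `fake_tool` are made up for this example and do not come from any library.

```python
from typing import Callable


def run_agent_loop(
    decide: Callable[[list[str]], tuple[str, str]],
    act: Callable[[str], str],
    task: str,
    max_steps: int = 5,
) -> str:
    """Run a perceive -> reason -> act -> observe loop until `decide` finishes."""
    history = [f"Task: {task}"]                 # Perception: the initial input
    for _ in range(max_steps):
        kind, payload = decide(history)         # Reasoning: stand-in for the LLM call
        if kind == "finish":
            return payload
        observation = act(payload)              # Action: run the tool
        history.append(f"Action: {payload}")
        history.append(f"Observation: {observation}")  # fed into the next decision
    return "Stopped: step limit reached"


def scripted_decide(history: list[str]) -> tuple[str, str]:
    """Hypothetical stand-in for an LLM: look something up once, then answer."""
    if not any(line.startswith("Observation:") for line in history):
        return ("act", "lookup capital of France")
    return ("finish", history[-1].removeprefix("Observation: "))


def fake_tool(query: str) -> str:
    """Hypothetical tool that always returns the same canned result."""
    return "Paris"


result = run_agent_loop(scripted_decide, fake_tool, "What is the capital of France?")
print(result)
```

The only structural difference from a single LLM call is that each Observation is appended to `history` and seen by the next `decide` call.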

Pattern 1: ReAct (Reason + Act)

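Introduced by Yao et al. in 2022, ReAct is currently one of the most widely used agent structures. As the name suggests, it alternates between **Reason** and **Act**.

How it works

The LLM is forced to think in a specific format:

Thought: reason about what needs to be done
Action: tool_name(params)
Observation: [tool execution result]
Thought: decide the next action based on the result
... (repeat)
Final Answer: the final answer

Implementation in practice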

system_prompt = """
You have access to these tools: [search, calculator, code_executor]

Always follow this exact format:
Thought: I need to...
Action: tool_name(params)
Observation: [tool result]
... (repeat as needed)
Final Answer: ...

Never skip the Thought step. Never fabricate Observations.
"""

# Example trace:
# Question: "What is South Korea's 2023 GDP divided by its population?"
# Thought: I need to find South Korea's GDP and population
# Action: search("South Korea GDP 2023")
# Observation: "South Korea GDP 2023: $1.71 trillion USD"
# Thought: Now I need the population
# Action: search("South Korea population 2023")
# Observation: "51.7 million"
# Thought: Now compute the ratio
# Action: calculator("1710000000000 / 51700000")
# Observation: "33075"
# Final Answer: about $33,075 (GDP per capita)

A working implementation, including tool dispatch and parsing logic:

import re
from typing import Callable

class ReActAgent:
    def __init__(self, llm, tools: dict[str, Callable], max_iterations=10):
        self.llm = llm
        self.tools = tools
        self.max_iterations = max_iterations

    def run(self, question: str) -> str:
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": question}
        ]

        for _ in range(self.max_iterations):
            # Tip: if your LLM client supports stop sequences, stopping at
            # "Observation:" keeps the model from inventing tool results.
            response = self.llm.invoke(messages)
            content = response.content

            # Stop once the model produces a Final Answer
            if "Final Answer:" in content:
                return content.split("Final Answer:")[-1].strip()

            # Parse the Action. The greedy (.*) runs to the last ")" on the
            # line, so nested parentheses such as calculator("(1+2)*3") survive.
            action_match = re.search(r"Action:\s*(\w+)\((.*)\)", content)
            if not action_match:
                return content  # Format failure: return the raw output

            tool_name = action_match.group(1)
            # Each tool takes one string argument; drop surrounding quotes
            tool_args = action_match.group(2).strip().strip("\"'")

            # Run the tool
            if tool_name not in self.tools:
                observation = f"Error: tool '{tool_name}' not found"
            else:
                try:
                    observation = self.tools[tool_name](tool_args)
                except Exception as e:
                    observation = f"Error executing tool: {str(e)}"

            # Append the Observation to the conversation
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": f"Observation: {observation}"})

        return "Max iterations reached without final answer"

    def _build_system_prompt(self):
        tool_names = list(self.tools.keys())
        return f"""You have access to these tools: {tool_names}
Always use this format:
Thought: [your reasoning]
Action: tool_name(params)
Observation: [will be filled in]
Final Answer: [when done]"""
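The `tools` dictionary maps a tool name to a callable that takes one string and returns something printable. As one possible example of such a tool, here is a sketch of a calculator that avoids `eval` by walking the parsed expression. The name `safe_calculator` and the set of supported operators are assumptions for this illustration.

```python
import ast
import operator

# Operators the calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def safe_calculator(expression: str) -> str:
    """Evaluate +, -, *, / arithmetic without calling eval()."""

    def _eval(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    # Tolerate the quotes the model may wrap around the argument
    cleaned = expression.strip().strip("\"'")
    tree = ast.parse(cleaned, mode="eval")
    return str(_eval(tree.body))


tools = {"calculator": safe_calculator}
print(tools["calculator"]("1710000000000 / 51700000"))
```

Invalid input raises an exception (`ValueError`, `SyntaxError`, or `ZeroDivisionError`), which an agent loop that wraps tool calls in `try/except` can turn into an error Observation.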

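Pros and cons of ReAct

Pros: The reasoning process is visible, which makes debugging easy. It fits most tasks well.

Cons: Token consumption is high. The same Thought pattern can repeat, and there is a risk of hallucination (for example, fabricated Observations).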

Pattern 2: Chain of Thought (CoT)

Use this pattern when the task needs complex reasoning but no tools.

Zero-shot CoT

The simplest form: just appending "Let's think step by step" to the prompt is enough:

response = llm.invoke(
    "Q: A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?\n\nLet's think step by step."
)
# The LLM now reasons step by step before answering
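For reference, this is the arithmetic a correct step-by-step answer should reach. The symbol $b$ (the price of the ball in dollars) is introduced here and is not part of the original prompt.

```latex
\begin{align*}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \\
b &= 0.05
\end{align*}
```

The ball costs $0.05, not the intuitive $0.10, which is why this question is a popular test of whether step-by-step prompting helps.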

Few-shot CoT

Provide worked examples to show the reasoning format:

few_shot_examples = """
Q: If 5 machines take 5 minutes to make 5 widgets, how long does it take 100 machines to make 100 widgets?
A:
Step 1: One machine makes 1 widget in 5 minutes.
Step 2: 100 machines each making 1 widget simultaneously = 5 minutes.
Answer: 5 minutes.

Q: [new question here]
A:
"""
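A template like the one above is usually assembled from a list of examples rather than written by hand. A minimal sketch, assuming a helper named `build_few_shot_prompt` (a made-up name for this example):

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Join (question, worked answer) pairs, then append the new question."""
    blocks = [f"Q: {q}\nA:\n{a}" for q, a in examples]
    blocks.append(f"Q: {question}\nA:")  # left open for the model to complete
    return "\n\n".join(blocks)


examples = [
    (
        "If 5 machines take 5 minutes to make 5 widgets, how long does it "
        "take 100 machines to make 100 widgets?",
        "Step 1: One machine makes 1 widget in 5 minutes.\n"
        "Step 2: 100 machines each making 1 widget simultaneously = 5 minutes.\n"
        "Answer: 5 minutes.",
    ),
]

prompt = build_few_shot_prompt(examples, "What is 17 * 3?")
print(prompt)
```

Keeping the examples as data makes it easy to swap them per task or trim them when the prompt gets too long.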

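ReAct vs CoT: which one, when?

Pure reasoning with no need for external information or computation → CoT
Real-time information, computation, or code execution needed → ReAct
Simple Q&A → neither is needed; just call the LLM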

Pattern 3: Plan-and-Execute

This pattern suits complex, long-running tasks. The agent makes a plan first (Plan) and then carries it out (Execute).

class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm
        self.executor = executor_llm
        self.tools = tools

    def run(self, goal: str) -> str:
        # Step 1: create the plan (use a stronger LLM)
        plan_prompt = f"""
        Goal: {goal}

        Create a step-by-step plan to achieve this goal.
        Return a numbered list of concrete, executable steps.
        Each step should be specific enough to execute independently.
        """
        plan_response = self.planner.invoke(plan_prompt)
        steps = self._parse_plan(plan_response)

        print(f"Plan created: {len(steps)} steps")

        # Step 2: execute each step (a cheaper LLM can be used here)
        results = []
        for i, step in enumerate(steps):
            print(f"Executing step {i+1}: {step}")

            result = self.executor.invoke(
                f"Execute this step: {step}\n\n"
                f"Context from previous steps: {results}\n\n"
                f"Available tools: {list(self.tools.keys())}"
            )
            results.append({"step": step, "result": result})

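        # Step 3: final summary
        summary = self.planner.invoke(
            f"Goal: {goal}\n\nResults: {results}\n\nSummarize the final outcome."
        )
        return summary

    def _parse_plan(self, plan_text: str) -> list[str]:
        # Assumes invoke() returns plain text with one step per line,
        # numbered like "1. Do X"; other lines are ignored
        steps = []
        for line in plan_text.splitlines():
            number, sep, text = line.strip().partition(". ")
            if sep and number.isdigit():
                steps.append(text.strip())
        return steps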

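The key advantage of this pattern: the planning and execution stages can use different models. Plan with GPT-4, execute with GPT-3.5, and the cost goes down.

Downside: if the initial plan is wrong, everything built on it is wrong. You need logic to revise the plan dynamically.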
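One way to add that revision logic is to hand the planner the history of completed and failed steps whenever a step fails. The sketch below uses plain callables in place of LLMs. `execute_with_replanning`, `scripted_plan`, and `scripted_step` are hypothetical names for this example.

```python
from typing import Callable


def execute_with_replanning(
    goal: str,
    make_plan: Callable[[str, list[dict]], list[str]],
    run_step: Callable[[str], tuple[bool, str]],
    max_replans: int = 2,
) -> list[dict]:
    """Execute a plan step by step, asking for a new plan after a failure."""
    done: list[dict] = []
    plan = make_plan(goal, done)
    replans = 0
    while plan:
        step = plan.pop(0)
        ok, result = run_step(step)
        done.append({"step": step, "ok": ok, "result": result})
        if not ok:
            if replans >= max_replans:
                break                      # give up instead of replanning forever
            replans += 1
            plan = make_plan(goal, done)   # the planner sees what failed
    return done


def scripted_plan(goal: str, done: list[dict]) -> list[str]:
    """Stand-in planner: switch data source once a failure has been recorded."""
    if any(not d["ok"] for d in done):
        return ["fetch data from cache", "summarize"]
    return ["fetch data from api", "summarize"]


def scripted_step(step: str) -> tuple[bool, str]:
    """Stand-in executor: any step that touches the API fails."""
    if "api" in step:
        return (False, "connection refused")
    return (True, f"ok: {step}")


history = execute_with_replanning("report on sales", scripted_plan, scripted_step)
for entry in history:
    print(entry)
```

The `max_replans` cap matters: without it, a planner that keeps proposing the same failing step would loop indefinitely.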

Pattern 4: Reflection and Self-Critique

In this pattern the agent critiques its own output and improves it.

class ReflectionAgent:
    def __init__(self, llm):
        self.llm = llm

    def run(self, question: str, num_reflections: int = 2) -> str:
        # Generate the initial answer
        answer = self.llm.invoke(f"Answer this question: {question}")

        for i in range(num_reflections):
            # Self-critique
            critique = self.llm.invoke(f"""
            Question: {question}
            Answer: {answer}

            Critically evaluate this answer:
            1. Is it factually correct?
            2. Is it complete? What's missing?
            3. Is it clearly explained?
            4. Are there any logical errors?

            Be specific and constructive.
            """)

            print(f"Critique {i+1}: {critique[:200]}...")

            # Revise the answer using the critique
            answer = self.llm.invoke(f"""
            Original question: {question}
            Previous answer: {answer}
            Critique: {critique}

            Now provide an improved answer that addresses the critique.
            """)

        return answer

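This pattern looks simple, but it can improve quality noticeably. It is especially effective for code generation, document writing, and complex analysis.

Caution: too many reflection rounds can make the answer worse through excessive self-criticism. Two or three rounds is usually enough.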
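One way to avoid over-reflecting is to let the critic end the loop early. The sketch below assumes the critic is prompted to reply with the word APPROVED when it finds nothing to fix. That convention and the function names are assumptions for this example, and the three callables stand in for LLM calls.

```python
from typing import Callable


def reflect_until_approved(
    question: str,
    generate: Callable[[str], str],
    critique: Callable[[str, str], str],
    revise: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return (answer, number of revisions), stopping when the critic approves."""
    answer = generate(question)
    for round_no in range(max_rounds):
        feedback = critique(question, answer)
        if feedback.strip().upper().startswith("APPROVED"):
            return answer, round_no
        answer = revise(question, answer, feedback)
    return answer, max_rounds


# Scripted stand-ins for the three LLM calls
def generate(question: str) -> str:
    return "The ball costs $0.10."


def critique(question: str, answer: str) -> str:
    if "0.05" in answer:
        return "APPROVED"
    return "Then the bat costs $1.10 and the total is $1.20. Recheck."


def revise(question: str, answer: str, feedback: str) -> str:
    return "The ball costs $0.05."


final_answer, revisions = reflect_until_approved(
    "How much does the ball cost?", generate, critique, revise
)
print(final_answer, revisions)
```

The `max_rounds` cap still applies, so a critic that never approves cannot keep the loop running.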

Pattern 5: Tree of Thoughts (ToT)

Introduced in 2023, this pattern explores multiple reasoning paths at once. It works like a chess engine simulating several candidate moves.

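import asyncio
import re

class TreeOfThoughts:
    # Assumes llm.ainvoke(prompt) is awaitable and returns plain text
    def __init__(self, llm, branching_factor=3, max_depth=4):
        self.llm = llm
        self.branching_factor = branching_factor
        self.max_depth = max_depth

    async def solve(self, problem: str) -> str:
        root = {"thought": problem, "score": 1.0, "children": []}

        # Breadth-first exploration
        queue = [root]
        best_leaf = None
        best_score = float("-inf")

        for depth in range(self.max_depth):
            next_queue = []

            for node in queue:
                # Generate several candidate next thoughts from each node
                children = await self._generate_thoughts(node["thought"])
                # Extend the parent's path so the full reasoning chain is kept
                paths = [f"{node['thought']}\n{child}" for child in children]
                # Score sibling paths concurrently
                scores = await asyncio.gather(
                    *(self._evaluate_thought(path, problem) for path in paths)
                )

                for path, score in zip(paths, scores):
                    child_node = {"thought": path, "score": score, "children": []}
                    node["children"].append(child_node)
                    next_queue.append(child_node)

                    if score > best_score:
                        best_score = score
                        best_leaf = child_node

            # Keep only the top-k paths for the next level (beam search)
            queue = sorted(next_queue, key=lambda x: x["score"], reverse=True)[:self.branching_factor]

        # Generate the final answer from the best path
        if best_leaf is None:
            best_leaf = root  # no thoughts were generated; fall back to the problem
        return await self._generate_final_answer(best_leaf["thought"], problem)

    async def _generate_thoughts(self, current_thought: str) -> list[str]:
        response = await self.llm.ainvoke(f"""
        Current reasoning: {current_thought}
        Generate {self.branching_factor} different ways to continue this reasoning.
        Each should be a distinct approach.
        Return as numbered list.
        """)
        return self._parse_numbered_list(response)

    async def _evaluate_thought(self, thought: str, problem: str) -> float:
        response = await self.llm.ainvoke(f"""
        Problem: {problem}
        Reasoning path: {thought}
        Rate this reasoning path from 0.0 to 1.0.
        Return only the number.
        """)
        try:
            return float(response.strip())
        except (ValueError, AttributeError):
            return 0.5  # neutral score when the reply is not a bare number

    # The two helpers below were referenced but not defined in the original
    # listing; these are minimal assumed versions.
    def _parse_numbered_list(self, text: str) -> list[str]:
        # Expects one item per line, formatted like "1. ..." or "2) ..."
        items = []
        for line in text.splitlines():
            match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
            if match:
                items.append(match.group(1))
        return items

    async def _generate_final_answer(self, reasoning: str, problem: str) -> str:
        return await self.llm.ainvoke(f"""
        Problem: {problem}
        Best reasoning path: {reasoning}
        Using this reasoning, give the final answer.
        """)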

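When to use ToT:

Math problems (exploring several solution methods)
Complex planning (comparing several strategies)
Creative writing (exploring different directions)

When not to use it:

Simple information retrieval
When speed matters (API cost and latency are very high)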
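To put a number on that cost, the sketch below counts the LLM calls made by a breadth-first search like the `TreeOfThoughts.solve` listing. It assumes every generation call returns exactly `branching_factor` thoughts and that the beam keeps `branching_factor` nodes per level. The function name `tot_llm_calls` is made up for this example.

```python
def tot_llm_calls(branching_factor: int, max_depth: int) -> int:
    """Count LLM calls for a beam-limited breadth-first ToT search (max_depth >= 1)."""
    b, d = branching_factor, max_depth
    # Level 0 expands only the root; each later level expands b beam nodes.
    generate_calls = 1 + (d - 1) * b
    # Every expanded node yields b children, each scored with its own call.
    evaluate_calls = b + (d - 1) * b * b
    final_answer_calls = 1
    return generate_calls + evaluate_calls + final_answer_calls


print(tot_llm_calls(3, 4))  # the defaults used in the TreeOfThoughts listing
print(tot_llm_calls(5, 4))
```

With the default `branching_factor=3` and `max_depth=4` this comes to 41 calls for a single question, compared with one call for a plain prompt. Raising the branching factor to 5 more than doubles it, to 97.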

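Which Pattern to Choose, and When

Situation	Recommended pattern
Gathering information or computing with tools	ReAct
Pure reasoning, no tools	Chain of Thought
Long-running projects, multi-step work	Plan-and-Execute
Document or code generation where quality matters	Reflection
Complex math or planning, cost is no object	Tree of Thoughts
Work that needs several specialists	Multi-Agent (see the next post)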

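Common Mistakes and How to Fix Them

These are problems you actually run into in production.

1. Tool call infinite loops

The agent keeps calling the same tool. This usually happens when there is no error handling or when the agent ignores the Observation.

# Always add max_iterations and loop detection
MAX_ITERATIONS = 15

def run_with_loop_guard(agent):
    # agent.get_next_action() is a placeholder interface for this sketch
    seen_actions = set()

    for i in range(MAX_ITERATIONS):
        action = agent.get_next_action()
        action_key = f"{action.tool}:{action.args}"

        if action_key in seen_actions:
            return "Error: Agent stuck in loop. Breaking out."
        seen_actions.add(action_key)
        # ... execute the action and feed the Observation back to the agent ...

    return "Error: Max iterations reached."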
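The check above treats any repeated call as a loop, which can be too strict when a retry is legitimate (for example after a transient error). A slightly more lenient variant counts repeats and stops only past a threshold. `LoopGuard` and `max_repeats` are hypothetical names for this sketch.

```python
from collections import Counter


class LoopGuard:
    """Flag a loop only after the same (tool, args) call exceeds max_repeats."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts: Counter = Counter()

    def should_stop(self, tool: str, args: str) -> bool:
        key = (tool, args)
        self.counts[key] += 1
        return self.counts[key] > self.max_repeats


guard = LoopGuard(max_repeats=2)
calls = [("search", "a"), ("search", "a"), ("search", "b"), ("search", "a")]
flags = [guard.should_stop(tool, args) for tool, args in calls]
print(flags)
```

Here the third identical `search("a")` call is the one that trips the guard, while the single `search("b")` in between is unaffected.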

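2. Context window overflow

In long conversations, earlier Observations pile up and exceed the context window.

def trim_messages_to_fit(messages, max_tokens=100000):
    """Drop the oldest Observations first."""
    # count_tokens is assumed to be defined elsewhere (for example with your tokenizer)
    while count_tokens(messages) > max_tokens:
        # Remove the first Action/Observation pair; keep the system message
        # (index 0) and the user's question (index 1)
        for i, msg in enumerate(messages[2:], 2):
            if msg.get("role") == "user" and msg.get("content", "").startswith("Observation:"):
                messages.pop(i)
                if messages[i - 1].get("role") == "assistant":
                    messages.pop(i - 1)  # also remove the matching Action
                break
        else:
            break  # nothing left to trim; avoid looping forever
    return messages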
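`trim_messages_to_fit` relies on a `count_tokens` helper that the listing does not define. As a placeholder, here is a crude character-based estimate. The ratio of about 4 characters per token is only a rough rule of thumb for English text and is an assumption here. A real tokenizer for your model should replace it, especially for non-English text.

```python
import math

CHARS_PER_TOKEN = 4  # assumed rule of thumb, not a measured value


def count_tokens(messages: list[dict]) -> int:
    """Estimate the token count of a message list from character length."""
    return sum(
        math.ceil(len(msg.get("content", "")) / CHARS_PER_TOKEN)
        for msg in messages
    )


sample = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "a" * 400},
]
print(count_tokens(sample))
```

Because the estimate is approximate, it is safer to set `max_tokens` well below the model's actual limit.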

3. Ignoring tool errors

# Bad
result = tool.run(args)

# Good: report the error to the LLM so it can react
try:
    result = tool.run(args)
except Exception as e:
    result = f"Tool error: {str(e)}. Try a different approach."
    # The LLM sees this error and can try another approach

4. No timeout

If an external API is slow or down, the agent waits forever.

import asyncio

async def run_tool_with_timeout(tool, args, timeout=30):
    try:
        return await asyncio.wait_for(tool.arun(args), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout}s. Try a different approach."

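Closing Thoughts

Knowing when to use which pattern matters more than knowing the patterns themselves. Start with ReAct, which covers most use cases. As complexity grows, add Plan-and-Execute and Reflection, and consider ToT when quality matters most.

The next post covers **MCP (Model Context Protocol)**, which ties these patterns together: a new standard for connecting AI to the outside world.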