LLM Agent Design Patterns: From ReAct to Multi-Agent Orchestration
- What Is an Agent? (A Practical Definition)
- Pattern 1: ReAct (Reason + Act)
- Pattern 2: Chain of Thought (CoT)
- Pattern 3: Plan-and-Execute
- Pattern 4: Reflection and Self-Critique
- Pattern 5: Tree of Thoughts (ToT)
- Pattern Selection Guide
- Common Mistakes and How to Fix Them
- Wrapping Up
The most common mistake when building your first AI agent is "just plug in GPT-4 and see what happens." Then you discover your agent looping on the same tool call forever, confidently hallucinating an answer, or crashing because you blew the context window.
This post covers five battle-tested LLM agent design patterns: how each works, code you can adapt, and honest trade-offs.
What Is an Agent? (A Practical Definition)
There are many definitions floating around, but the most useful one is:
"An agent is an LLM that can take actions, observe results, and decide what to do next — in a loop."
Agent = LLM operating inside a feedback loop. That's it.
Perception (receive input)
→ Reasoning (LLM thinks)
→ Action (use a tool)
→ Observation (check result)
→ back to Reasoning
The feedback loop is what separates an agent from a simple LLM call. Without it, you just have a prompt.
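The loop above fits in a few lines of code. The following is a minimal sketch with no real model. `decide` is a hypothetical stand-in for the LLM: it returns either a final answer or a tool call. The tool name `upper` is purely illustrative.

```python
from typing import Callable

def agent_loop(decide: Callable[[list[str]], tuple[str, str]],
               tools: dict[str, Callable[[str], str]],
               task: str, max_steps: int = 5) -> str:
    """Minimal perceive -> reason -> act -> observe loop.

    `decide` plays the LLM: given the history so far, it returns either
    ("final", answer) or (tool_name, tool_input).
    """
    history = [f"Task: {task}"]                       # Perception
    for _ in range(max_steps):
        kind, payload = decide(history)               # Reasoning
        if kind == "final":
            return payload
        observation = tools[kind](payload)            # Action
        history.append(f"Observation: {observation}")  # Observation, fed back
    return "Stopped: step limit reached"

# A scripted "LLM" so the example runs without an API key.
def scripted_decide(history: list[str]) -> tuple[str, str]:
    if not any(h.startswith("Observation:") for h in history):
        return ("upper", "hello")
    return ("final", history[-1].removeprefix("Observation: "))

print(agent_loop(scripted_decide, {"upper": str.upper}, "shout hello"))
```

The `max_steps` cap matters as much as the loop itself. Without it, a model that never emits a final answer keeps the loop running forever.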
Pattern 1: ReAct (Reason + Act)
Published by Yao et al. in 2022, ReAct is one of the most widely used agent patterns today. The name says it all: the model alternates between Reasoning and Acting.
How It Works
You force the LLM to think in a specific format:
Thought: [reasoning about what to do]
Action: tool_name(params)
Observation: [result of tool call]
Thought: [reasoning about the result]
... (repeat)
Final Answer: [conclusion]
System Prompt and Example Trace
# System prompt that enforces ReAct format
system_prompt = """
You have access to these tools: [search, calculator, code_executor]
Always follow this exact format:
Thought: I need to...
Action: tool_name(params)
Observation: [tool result]
... (repeat as needed)
Final Answer: ...
Never skip the Thought step. Never fabricate Observations.
"""
# Example trace:
# Question: "What is South Korea's GDP in 2023 divided by its population?"
# Thought: I need to find South Korea's GDP and population
# Action: search("South Korea GDP 2023")
# Observation: "South Korea GDP 2023: $1.71 trillion USD"
# Thought: Now I need population
# Action: search("South Korea population 2023")
# Observation: "51.7 million"
# Thought: Now I can calculate
# Action: calculator("1710000000000 / 51700000")
# Observation: "33075"
# Final Answer: ~$33,075 GDP per capita
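The trace relies on a `calculator` tool. The implementation that follows passes each tool the raw text between the parentheses, quotes included. Below is a minimal sketch of such a tool under that string interface. It walks Python's `ast` instead of calling `eval()`, so the model cannot run arbitrary code. The names `calculator` and `_eval` are illustrative, not from a library.

```python
import ast
import operator

# Only these arithmetic operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def _eval(node: ast.AST) -> float:
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def calculator(expression: str) -> str:
    """Evaluate a plain arithmetic expression without eval()."""
    # The model usually writes calculator("..."), so strip surrounding quotes.
    expression = expression.strip().strip("\"'")
    return str(_eval(ast.parse(expression, mode="eval").body))

print(calculator('"1710000000000 / 51700000"'))
```

An unsupported expression raises `ValueError`. Division by zero raises `ZeroDivisionError`. Because the agent wraps each tool call in `try`/`except`, either error reaches the model as an `Observation` rather than crashing the run.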
Implementation with Tool Dispatch and Parsing
import re
from typing import Callable
class ReActAgent:
    def __init__(self, llm, tools: dict[str, Callable[[str], str]], max_iterations: int = 10):
        self.llm = llm  # assumed: llm.invoke(messages) returns an object with .content
        self.tools = tools
        self.max_iterations = max_iterations
    def run(self, question: str) -> str:
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": question},
        ]
        for _ in range(self.max_iterations):
            response = self.llm.invoke(messages)
            # Cut everything after "Observation:" so the model cannot invent tool results
            content = response.content.split("Observation:")[0]
            # Done when we see Final Answer
            if "Final Answer:" in content:
                return content.split("Final Answer:")[-1].strip()
            # Parse the Action; the greedy (.*) runs to the last ")" on the line,
            # so arguments such as calculator("(1 + 2) * 3") are not cut short
            action_match = re.search(r"Action:\s*(\w+)\((.*)\)", content)
            if not action_match:
                return content  # Format failure: return as-is
            tool_name = action_match.group(1)
            tool_args = action_match.group(2)
            # Execute the tool
            if tool_name not in self.tools:
                observation = f"Error: tool '{tool_name}' not found"
            else:
                try:
                    observation = self.tools[tool_name](tool_args)
                except Exception as e:
                    observation = f"Error executing tool: {e}"
            # Append the model's turn and the real Observation to the history
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": f"Observation: {observation}"})
        return "Max iterations reached without final answer"
    def _build_system_prompt(self):
        tool_names = list(self.tools.keys())
        return f"""You have access to these tools: {tool_names}
Always use this format:
Thought: [your reasoning]
Action: tool_name(params)
Observation: [will be filled in]
Final Answer: [when done]"""
ReAct Pros and Cons
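Pros: The reasoning chain is fully transparent, so debugging is straightforward. Works well for the majority of agent tasks.
Cons: High token consumption, because the full Thought/Action/Observation history is resent on every call. The model can fall into repetitive Thought patterns. It may also write its own Observation instead of waiting for the tool. Cutting each response at "Observation:", or setting that string as a stop sequence if your API supports one, keeps those hallucinated results out of the history.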
Pattern 2: Chain of Thought (CoT)
Use this when you need complex reasoning but no external tools.
Zero-Shot CoT
Simplest form — just append "Let's think step by step" to your prompt:
response = llm.invoke(
"Q: A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. "
"How much does the ball cost?\n\nLet's think step by step."
)
# Prompts the LLM to reason through the problem step by step before answering
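The intuitive answer is $0.10, but the correct answer is $0.05. The step-by-step cue is meant to steer the model toward this two-line derivation, where $b$ is the ball's price in dollars and the bat costs $b + 1.00$:

```latex
\begin{aligned}
b + (b + 1.00) &= 1.10 \\
2b &= 0.10 \;\Rightarrow\; b = 0.05
\end{aligned}
```

So the ball costs $0.05 and the bat $1.05.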
Few-Shot CoT
Provide worked examples to demonstrate the reasoning format:
few_shot_examples = """
Q: If 5 machines take 5 minutes to make 5 widgets, how long does it take 100 machines to make 100 widgets?
A:
Step 1: One machine makes 1 widget in 5 minutes.
Step 2: 100 machines each making 1 widget simultaneously = 5 minutes.
Answer: 5 minutes.
Q: [new question here]
A:
"""
ReAct vs CoT: Decision Guide
- Pure reasoning, no external data needed → CoT
- Needs real-time info, calculation, or code execution → ReAct
- Simple Q&A → neither, just use a plain LLM call
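For few-shot CoT with several worked examples, it can help to keep the examples as data and assemble the prompt in code. That way every example follows the same Q / A / Step layout. Below is a minimal sketch; `build_few_shot_cot_prompt` is a hypothetical helper, not a library API.

```python
def build_few_shot_cot_prompt(examples: list[tuple[str, list[str], str]],
                              question: str) -> str:
    """Assemble a few-shot CoT prompt from (question, steps, answer) examples."""
    blocks = []
    for q, steps, answer in examples:
        lines = [f"Q: {q}", "A:"]
        lines += [f"Step {i}: {s}" for i, s in enumerate(steps, start=1)]
        lines.append(f"Answer: {answer}")
        blocks.append("\n".join(lines))
    # End with an open "A:" so the model continues in the same format
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)

widgets = (
    "If 5 machines take 5 minutes to make 5 widgets, how long does it take "
    "100 machines to make 100 widgets?",
    ["One machine makes 1 widget in 5 minutes.",
     "100 machines each making 1 widget simultaneously = 5 minutes."],
    "5 minutes.",
)
prompt = build_few_shot_cot_prompt([widgets], "How many minutes are in 3 hours?")
print(prompt)
```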
Pattern 3: Plan-and-Execute
Best for complex, multi-step tasks where you can decompose the problem upfront.
import re
class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm    # Expensive, powerful model
        self.executor = executor_llm  # Cheaper, faster model
        self.tools = tools
        # Assumed interface: llm.invoke(prompt) returns a plain string
    def run(self, goal: str) -> str:
        # Phase 1: Planning, using your best LLM
        plan_prompt = f"""
Goal: {goal}
Create a step-by-step plan to achieve this goal.
Return a numbered list of concrete, executable steps.
Each step should be specific enough to execute independently.
"""
        plan_response = self.planner.invoke(plan_prompt)
        steps = self._parse_plan(plan_response)
        print(f"Plan created: {len(steps)} steps")
        # Phase 2: Execution, which can use a cheaper model.
        # In this sketch the executor only sees tool names; to actually call tools,
        # run each step through a tool-using loop such as ReActAgent.
        results = []
        for i, step in enumerate(steps):
            print(f"Executing step {i+1}: {step}")
            result = self.executor.invoke(
                f"Execute this step: {step}\n\n"
                f"Context from previous steps: {results}\n\n"
                f"Available tools: {list(self.tools.keys())}"
            )
            results.append({"step": step, "result": result})
        # Phase 3: Summarize
        summary = self.planner.invoke(
            f"Goal: {goal}\n\nResults: {results}\n\nSummarize the final outcome."
        )
        return summary
    @staticmethod
    def _parse_plan(plan_text: str) -> list[str]:
        # Keep only lines like "1. Do X" or "2) Do Y", dropping any preamble
        steps = []
        for line in plan_text.splitlines():
            match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
            if match:
                steps.append(match.group(1))
        return steps
Key advantage: You can use different models for planning vs execution. Plan with GPT-4, execute with GPT-3.5 — significant cost savings on long workflows.
Key weakness: If the initial plan is wrong, everything downstream is wrong. You need logic to replan when execution fails.
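Replanning can be as simple as asking the planner for a fresh list of remaining steps whenever a step fails. The following is a minimal sketch under two assumptions. First, `plan` and `execute` are plain callables standing in for the planner and executor LLMs. Second, the executor signals failure by returning text that starts with `"FAILED"`; that convention is hypothetical. `max_replans` bounds how often the plan can be rewritten.

```python
import re
from typing import Callable

def parse_numbered(text: str) -> list[str]:
    """Extract "1. step" / "2) step" lines from a plan written by the LLM."""
    return [m.group(1).strip()
            for m in re.finditer(r"^[ \t]*\d+[.)][ \t]+(.+)$", text, re.MULTILINE)]

def plan_execute_replan(goal: str,
                        plan: Callable[[str], str],
                        execute: Callable[[str, list], str],
                        max_replans: int = 2) -> list[dict]:
    """Run a plan step by step; on a failed step, ask the planner for new remaining steps."""
    steps = parse_numbered(plan(f"Goal: {goal}\nReturn a numbered plan."))
    results: list[dict] = []
    replans = 0
    while steps:
        step = steps.pop(0)
        result = execute(step, results)
        results.append({"step": step, "result": result})
        if result.startswith("FAILED") and replans < max_replans:
            replans += 1
            steps = parse_numbered(plan(
                f"Goal: {goal}\nCompleted so far: {results}\n"
                f"Step '{step}' failed: {result}\n"
                "Return a revised numbered plan for the remaining work only."
            ))
    return results

# Scripted planner/executor so the example runs offline.
calls = []
def fake_plan(prompt: str) -> str:
    calls.append(prompt)
    return "1. fetch\n2. report" if len(calls) == 1 else "1. fetch from mirror\n2. report"

def fake_execute(step: str, results: list) -> str:
    return "FAILED: timeout" if step == "fetch" else f"done: {step}"

out = plan_execute_replan("make report", fake_plan, fake_execute)
print([r["step"] for r in out])
```

The replanned list replaces whatever was left of the old plan rather than being appended to it. That is the point: the original remaining steps were written before the failure was known.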
Pattern 4: Reflection and Self-Critique
The agent generates an output, then critiques its own work, then improves it.
class ReflectionAgent:
    def __init__(self, llm):
        self.llm = llm
    def run(self, question: str, num_reflections: int = 2) -> str:
        # Generate initial answer
        answer = self.llm.invoke(f"Answer this question thoroughly: {question}")
        for i in range(num_reflections):
            # Self-critique phase
            critique = self.llm.invoke(f"""
Question: {question}
Answer: {answer}
Critically evaluate this answer:
1. Is it factually correct?
2. Is it complete? What's missing?
3. Is the explanation clear?
4. Are there logical errors?
Be specific and constructive.
""")
            print(f"Critique {i+1}: {critique[:200]}...")
            # Refinement phase
            answer = self.llm.invoke(f"""
Original question: {question}
Previous answer: {answer}
Critique: {critique}
Provide an improved answer that addresses all critique points.
""")
        return answer
This sounds simple, but in practice it often produces noticeable quality gains, especially for code generation, technical writing, and complex analysis tasks.
Warning: Too many reflection rounds can cause over-correction — the model second-guesses correct answers. Two to three rounds is usually the sweet spot.
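One way to limit over-correction is to let the critic end the loop early when it finds nothing to fix. In this minimal sketch, `llm` is a plain prompt-to-string callable. The `"NO ISSUES"` reply convention is an assumption enforced only through the prompt, so a real model may not always follow it.

```python
from typing import Callable

def reflect_until_clean(question: str, llm: Callable[[str], str],
                        max_rounds: int = 3) -> tuple[str, int]:
    """Reflection loop that stops as soon as the critic reports no issues.

    Returns the final answer and the number of revisions actually made.
    """
    answer = llm(f"Answer this question thoroughly: {question}")
    for round_no in range(1, max_rounds + 1):
        critique = llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List concrete errors or omissions. "
            'If there are none, reply with exactly "NO ISSUES".'
        )
        if critique.strip().upper() == "NO ISSUES":
            return answer, round_no - 1
        answer = llm(
            f"Original question: {question}\nPrevious answer: {answer}\n"
            f"Critique: {critique}\nProvide an improved answer."
        )
    return answer, max_rounds

# Scripted replies: draft, critique, revision, clean critique.
replies = iter(["draft", "Missing units.", "draft with units", "NO ISSUES"])
final, revisions = reflect_until_clean("How far is it?", lambda prompt: next(replies))
print(final, revisions)
```

`max_rounds` still acts as a hard ceiling, in line with the two-to-three-round guideline above.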
Pattern 5: Tree of Thoughts (ToT)
Published in 2023, ToT explores multiple reasoning paths simultaneously — like a chess engine evaluating several moves ahead before committing.
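import asyncio
import re
from typing import List
class TreeOfThoughts:
    def __init__(self, llm, branching_factor=3, max_depth=4):
        self.llm = llm  # assumed: `await llm.ainvoke(prompt)` returns a string
        self.branching_factor = branching_factor
        self.max_depth = max_depth
    async def solve(self, problem: str) -> str:
        root = {"thought": problem, "score": 1.0, "children": []}
        # Level-by-level exploration with beam pruning
        queue = [root]
        best_leaf = root  # fallback if the model never produces parseable thoughts
        best_score = float("-inf")
        for _ in range(self.max_depth):
            next_queue = []
            for node in queue:
                # Generate multiple next steps from this node
                steps = await self._generate_thoughts(node["thought"])
                # Each child stores the whole path so far, not just its last step
                paths = [f'{node["thought"]}\n{step}' for step in steps]
                # Score all children of this node concurrently
                scores = await asyncio.gather(
                    *(self._evaluate_thought(path, problem) for path in paths)
                )
                for path, score in zip(paths, scores):
                    child_node = {"thought": path, "score": score, "children": []}
                    node["children"].append(child_node)
                    next_queue.append(child_node)
                    if score > best_score:
                        best_score = score
                        best_leaf = child_node
            # Keep only top-k paths (beam search); k = branching_factor here
            queue = sorted(next_queue, key=lambda x: x["score"], reverse=True)[:self.branching_factor]
            if not queue:
                break
        # Generate final answer from best path
        return await self._generate_final_answer(best_leaf["thought"], problem)
    async def _generate_thoughts(self, current_thought: str) -> List[str]:
        response = await self.llm.ainvoke(f"""
Current reasoning: {current_thought}
Generate {self.branching_factor} distinct ways to continue this reasoning.
Return as a numbered list.
""")
        return self._parse_numbered_list(response)[:self.branching_factor]
    async def _evaluate_thought(self, thought: str, problem: str) -> float:
        response = await self.llm.ainvoke(f"""
Problem: {problem}
Reasoning path: {thought}
Rate this reasoning path from 0.0 to 1.0.
Return only the number.
""")
        try:
            return float(response.strip())
        except ValueError:
            return 0.5  # unparseable rating: treat as neutral
    async def _generate_final_answer(self, best_path: str, problem: str) -> str:
        return await self.llm.ainvoke(
            f"Problem: {problem}\n\nBest reasoning path:\n{best_path}\n\n"
            "Using this reasoning, state the final answer."
        )
    @staticmethod
    def _parse_numbered_list(text: str) -> List[str]:
        # Keep the text of lines like "1. ..." or "2) ..."
        return [m.group(1).strip()
                for m in re.finditer(r"^[ \t]*\d+[.)][ \t]+(.+)$", text, re.MULTILINE)]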
Use ToT when:
- Math problems requiring exploration of multiple solution approaches
- Complex planning with many possible strategies
- Creative tasks where you want to explore divergent directions
Don't use ToT when:
- Simple information retrieval
- Speed matters — API costs and latency are very high
- The problem has a clearly correct linear solution
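To put "very high" in numbers: in the `TreeOfThoughts` class, each frontier node costs one generation call plus one evaluation call per child. The beam then keeps at most `branching_factor` nodes for the next level, and one more call produces the final answer. Below is a small counter that mirrors that structure. It assumes every generation call yields exactly `branching_factor` thoughts.

```python
def tot_llm_calls(branching_factor: int, max_depth: int, beam_width: int) -> int:
    """Count LLM calls for a beam-pruned Tree of Thoughts run."""
    calls = 0
    frontier = 1  # the root node
    for _ in range(max_depth):
        children = frontier * branching_factor
        calls += frontier   # one generation call per frontier node
        calls += children   # one evaluation call per child
        frontier = min(beam_width, children)
    return calls + 1        # final-answer call

print(tot_llm_calls(branching_factor=3, max_depth=4, beam_width=3))
```

With the class defaults (branching factor 3, depth 4, beam of 3), a single question costs 41 LLM calls. A ReAct run makes one call per step.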
Pattern Selection Guide
| Situation | Recommended Pattern |
|---|---|
| Tool use needed (search, calculation) | ReAct |
| Pure reasoning, no tools | Chain of Thought |
| Long multi-step projects | Plan-and-Execute |
| Quality-critical docs or code | Reflection |
| Complex math/planning, cost not a concern | Tree of Thoughts |
| Multiple specialists needed | Multi-Agent (see next post) |
Common Mistakes and How to Fix Them
These are real problems you will hit in production.
1. Tool Call Infinite Loops
The agent calls the same tool repeatedly. Usually caused by missing error handling or ignored Observations.
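# Always combine an iteration cap with loop detection
MAX_ITERATIONS = 15
def run_with_loop_guard(agent):
    # agent.get_next_action() / agent.execute() are placeholder methods
    seen_actions = set()
    for i in range(MAX_ITERATIONS):
        action = agent.get_next_action()
        action_key = f"{action.tool}:{action.args}"
        if action_key in seen_actions:
            return "Error: Agent stuck in loop. Breaking out."
        seen_actions.add(action_key)
        agent.execute(action)
    return "Error: max iterations reached."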
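Blocking on the very first repeat can be too strict. Retrying the same call once after a transient tool error is legitimate. Below is a variant that tolerates a small number of identical calls before flagging a loop. `LoopGuard` is a hypothetical helper, and matching on exact `(tool, args)` keys is a simplification.

```python
from collections import Counter

class LoopGuard:
    """Flags a tool call once the same (tool, args) pair exceeds `max_repeats`."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts: Counter = Counter()

    def check(self, tool: str, args: str) -> bool:
        """Record the call and return True if it should be blocked."""
        key = (tool, args.strip())
        self.counts[key] += 1
        return self.counts[key] > self.max_repeats

guard = LoopGuard(max_repeats=2)
decisions = [guard.check("search", "korea gdp") for _ in range(3)]
print(decisions)  # [False, False, True]
```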
2. Context Window Overflow
Accumulated Observations fill up the context window in long sessions.
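def trim_messages_to_fit(messages, max_tokens=100_000):
    """Drop the oldest Action/Observation pairs until the history fits.
    Keeps the system message and the original question.
    count_tokens() must be provided (e.g. built on your model's tokenizer)."""
    while count_tokens(messages) > max_tokens:
        for i in range(2, len(messages)):
            msg, prev = messages[i], messages[i - 1]
            if (msg["role"] == "user" and msg.get("content", "").startswith("Observation:")
                    and prev["role"] == "assistant"):
                del messages[i - 1 : i + 1]  # the Action turn and its Observation
                break
        else:
            break  # nothing left to trim; stop instead of looping forever
    return messages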
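`trim_messages_to_fit` relies on a `count_tokens` helper that the snippet does not define. Below is a rough stand-in based on the common rule of thumb of about four characters per token for English text. It is an approximation only; use your model's real tokenizer when you need accurate budgets.

```python
def count_tokens(messages: list[dict]) -> int:
    """Approximate token count: about 4 characters per token, rounded up."""
    total_chars = sum(len(m.get("content", "")) for m in messages)
    return -(-total_chars // 4)  # ceiling division

history = [
    {"role": "system", "content": "You are an agent."},
    {"role": "user", "content": "Observation: " + "x" * 400},
]
print(count_tokens(history))
```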
3. Unhandled Tool Errors
# Bad: one tool exception crashes the whole agent run
result = tool.run(args)
# Good: tell the LLM what went wrong so it can adapt
try:
    result = tool.run(args)
except Exception as e:
    result = f"Tool error: {e}. Try a different approach."
    # The LLM sees this error as its Observation and can try another approach
4. No Timeouts on Tool Calls
If an external API hangs, your agent waits forever.
import asyncio
async def run_tool_with_timeout(tool, args, timeout=30):
    try:
        return await asyncio.wait_for(tool.arun(args), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout}s. Try a different approach."
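The wrapper above needs an async tool. For a blocking (synchronous) tool function, one option is to run it in a worker thread with `asyncio.to_thread` (Python 3.9+) and apply the same `wait_for` timeout. This is a sketch; `slow_lookup` is a made-up tool for the demo. One caveat: Python threads cannot be cancelled, so on timeout the agent stops waiting, but the thread keeps running until the function returns.

```python
import asyncio
import time
from typing import Callable

async def run_sync_tool_with_timeout(tool_fn: Callable[[str], str], args: str,
                                     timeout: float = 30) -> str:
    """Timeout wrapper for a blocking tool function run in a worker thread."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(tool_fn, args), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout}s. Try a different approach."

def slow_lookup(query: str) -> str:
    time.sleep(0.5)  # simulates a hanging external API
    return f"result for {query}"

print(asyncio.run(run_sync_tool_with_timeout(slow_lookup, "x", timeout=0.1)))
```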
Wrapping Up
Knowing the patterns matters less than knowing when to apply them. Start with ReAct — it handles most cases. Layer in Plan-and-Execute for multi-step complexity, Reflection for output quality, and ToT only when you genuinely need multi-path exploration.
Next up: MCP (Model Context Protocol) — Anthropic's open standard for connecting AI to the external world. If you're building anything with tools, you need to understand this.