Author: Youngju Kim (@fjvbn20031)
The most common mistake when building your first AI agent is "just plug in GPT-4 and see what happens." Then you discover your agent looping on the same tool call forever, confidently hallucinating an answer, or crashing because you blew the context window.
This post covers five battle-tested LLM agent design patterns: how each works, code you can actually run, and honest trade-offs.
## What Is an Agent? (A Practical Definition)
There are many definitions floating around, but the most useful one is:
"An agent is an LLM that can take actions, observe results, and decide what to do next — in a loop."
Agent = LLM operating inside a feedback loop. That's it.
```
Perception (receive input)
    → Reasoning (LLM thinks)
    → Action (use a tool)
    → Observation (check result)
    → back to Reasoning
```
The feedback loop is what separates an agent from a simple LLM call. Without it, you just have a prompt.
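That loop fits in a dozen lines. A minimal sketch, assuming an `llm` callable that returns a decision dict and a `tools` dict of plain functions (both are stand-ins for your real model client and tool layer; the decision format is invented for illustration):

```python
def agent_loop(llm, tools, user_input, max_steps=10):
    """Minimal agent: reason about the latest observation, act, repeat."""
    observation = user_input                    # Perception: the initial input
    for _ in range(max_steps):
        decision = llm(observation)             # Reasoning: the LLM decides
        if decision["type"] == "answer":
            return decision["content"]          # Loop exits with a final answer
        tool = tools[decision["tool"]]          # Action: call the chosen tool
        observation = tool(decision["input"])   # Observation: result feeds back in
    return "Gave up after max_steps iterations"
```

Every pattern in this post is a variation on this loop: ReAct structures the reasoning step, Plan-and-Execute front-loads it, and ToT runs many branches of it in parallel.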
## Pattern 1: ReAct (Reason + Act)
Published by Yao et al. in 2022, ReAct is the most widely deployed agent pattern today. The name says it all: alternate between Reasoning and Acting.
### How It Works

You force the LLM to think in a specific format:

```
Thought: [reasoning about what to do]
Action: tool_name(params)
Observation: [result of tool call]
Thought: [reasoning about the result]
... (repeat)
Final Answer: [conclusion]
```
### A Real Trace

```python
# System prompt that enforces ReAct format
system_prompt = """
You have access to these tools: [search, calculator, code_executor]

Always follow this exact format:
Thought: I need to...
Action: tool_name(params)
Observation: [tool result]
... (repeat as needed)
Final Answer: ...

Never skip the Thought step. Never fabricate Observations.
"""

# Example trace:
# Question: "What is South Korea's GDP in 2023 divided by its population?"
#
# Thought: I need to find South Korea's GDP and population
# Action: search("South Korea GDP 2023")
# Observation: "South Korea GDP 2023: $1.71 trillion USD"
# Thought: Now I need population
# Action: search("South Korea population 2024")
# Observation: "51.7 million"
# Thought: Now I can calculate
# Action: calculator("1710000000000 / 51700000")
# Observation: "33075"
# Final Answer: ~$33,075 GDP per capita
```
### Production-Ready Implementation

```python
import re
from typing import Callable


class ReActAgent:
    def __init__(self, llm, tools: dict[str, Callable], max_iterations=10):
        self.llm = llm
        self.tools = tools
        self.max_iterations = max_iterations

    def run(self, question: str) -> str:
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": question},
        ]
        for i in range(self.max_iterations):
            response = self.llm.invoke(messages)
            content = response.content

            # Done when we see Final Answer
            if "Final Answer:" in content:
                return content.split("Final Answer:")[-1].strip()

            # Parse the Action
            action_match = re.search(r"Action: (\w+)\((.*?)\)", content)
            if not action_match:
                return content  # Format failure — return as-is

            tool_name = action_match.group(1)
            tool_args = action_match.group(2)

            # Execute the tool
            if tool_name not in self.tools:
                observation = f"Error: tool '{tool_name}' not found"
            else:
                try:
                    observation = self.tools[tool_name](tool_args)
                except Exception as e:
                    observation = f"Error executing tool: {str(e)}"

            # Append Observation to message history
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": f"Observation: {observation}"})

        return "Max iterations reached without final answer"

    def _build_system_prompt(self):
        tool_names = list(self.tools.keys())
        return f"""You have access to these tools: {tool_names}

Always use this format:
Thought: [your reasoning]
Action: tool_name(params)
Observation: [will be filled in]
Final Answer: [when done]"""
```
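A note on the weakest link: if the regex fails to match, the agent silently returns the raw text, so the `Action:` parser is worth exercising in isolation. A standalone sketch using the same pattern as the class above:

```python
import re

def parse_action(content: str):
    """Extract (tool_name, args) from a ReAct-formatted reply, or None."""
    match = re.search(r"Action: (\w+)\((.*?)\)", content)
    return (match.group(1), match.group(2)) if match else None

reply = "Thought: I need population data.\nAction: search(South Korea population 2024)"
print(parse_action(reply))        # -> ('search', 'South Korea population 2024')
print(parse_action("no action"))  # -> None
```

The non-greedy `(.*?)` stops at the first closing parenthesis, so nested parentheses in arguments will be truncated; keep tool signatures flat or switch to JSON-formatted tool calls.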
### ReAct Pros and Cons
Pros: The reasoning chain is fully transparent — debugging is straightforward. Works well for the majority of agent tasks.
Cons: High token consumption. Repetitive Thought patterns can emerge, leading to hallucinated Observations.
## Pattern 2: Chain of Thought (CoT)
Use this when you need complex reasoning but no external tools.
### Zero-Shot CoT

Simplest form — just append "Let's think step by step" to your prompt:

```python
response = llm.invoke(
    "Q: A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?\n\nLet's think step by step."
)
# Forces the LLM to reason through the problem sequentially
```
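For the record, the correct answer is $0.05, not the intuitive $0.10; CoT prompting exists precisely to avoid that snap judgment. In practice you also need to pull the final answer back out of the free-form reasoning. A sketch (the regex assumes dollar amounts appear in the reply, which is an assumption about your prompt, not a general rule):

```python
import re

def extract_dollar_answer(cot_text: str):
    """Pull the last dollar amount out of a chain-of-thought reply."""
    amounts = re.findall(r"\$([0-9]+(?:\.[0-9]+)?)", cot_text)
    return float(amounts[-1]) if amounts else None

reply = ("Let x be the ball's price. The bat is x + 1.00. "
         "Together: x + (x + 1.00) = 1.10, so 2x = 0.10 and x = 0.05. "
         "The ball costs $0.05.")
print(extract_dollar_answer(reply))  # -> 0.05
```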
### Few-Shot CoT

Provide worked examples to demonstrate the reasoning format:

```python
few_shot_examples = """
Q: If 5 machines take 5 minutes to make 5 widgets, how long does it take 100 machines to make 100 widgets?
A:
Step 1: One machine makes 1 widget in 5 minutes.
Step 2: 100 machines each making 1 widget simultaneously = 5 minutes.
Answer: 5 minutes.

Q: [new question here]
A:
"""
```
### ReAct vs CoT: Decision Guide
- Pure reasoning, no external data needed → CoT
- Needs real-time info, calculation, or code execution → ReAct
- Simple Q&A → neither, just use a plain LLM call
## Pattern 3: Plan-and-Execute
Best for complex, multi-step tasks where you can decompose the problem upfront.
```python
import re


class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm    # Expensive, powerful model
        self.executor = executor_llm  # Cheaper, faster model
        self.tools = tools

    def run(self, goal: str) -> str:
        # Phase 1: Planning — use your best LLM
        plan_prompt = f"""
Goal: {goal}

Create a step-by-step plan to achieve this goal.
Return a numbered list of concrete, executable steps.
Each step should be specific enough to execute independently.
"""
        plan_response = self.planner.invoke(plan_prompt)
        steps = self._parse_plan(plan_response)
        print(f"Plan created: {len(steps)} steps")

        # Phase 2: Execution — can use a cheaper model
        results = []
        for i, step in enumerate(steps):
            print(f"Executing step {i+1}: {step}")
            result = self.executor.invoke(
                f"Execute this step: {step}\n\n"
                f"Context from previous steps: {results}\n\n"
                f"Available tools: {list(self.tools.keys())}"
            )
            results.append({"step": step, "result": result})

        # Phase 3: Summarize
        summary = self.planner.invoke(
            f"Goal: {goal}\n\nResults: {results}\n\nSummarize the final outcome."
        )
        return summary

    def _parse_plan(self, plan_response: str) -> list[str]:
        # Pull out the lines that start with "1.", "2.", etc.
        return re.findall(r"^\s*\d+\.\s*(.+)$", plan_response, flags=re.MULTILINE)
```
Key advantage: You can use different models for planning vs execution. Plan with GPT-4, execute with GPT-3.5 — significant cost savings on long workflows.
Key weakness: If the initial plan is wrong, everything downstream is wrong. You need logic to replan when execution fails.
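One way to bolt on that replanning logic (a sketch; `execute` and `replan` are stand-ins for your executor and planner calls, not part of the class above):

```python
def run_with_replanning(steps, execute, replan, max_replans=2):
    """Execute a plan step by step; on failure, ask the planner to replan."""
    results, replans, i = [], 0, 0
    while i < len(steps):
        ok, result = execute(steps[i])   # execute returns (success, output)
        if ok:
            results.append(result)
            i += 1
        elif replans < max_replans:
            # Replace the failing tail of the plan, keeping completed results
            steps = steps[:i] + replan(steps[i], results)
            replans += 1
        else:
            raise RuntimeError(f"Step failed after {max_replans} replans: {steps[i]}")
    return results
```

Capping `max_replans` matters: an unlucky planner can otherwise oscillate between two broken plans forever.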
## Pattern 4: Reflection and Self-Critique
The agent generates an output, then critiques its own work, then improves it.
```python
class ReflectionAgent:
    def __init__(self, llm):
        self.llm = llm

    def run(self, question: str, num_reflections: int = 2) -> str:
        # Generate initial answer
        answer = self.llm.invoke(f"Answer this question thoroughly: {question}")

        for i in range(num_reflections):
            # Self-critique phase
            critique = self.llm.invoke(f"""
Question: {question}
Answer: {answer}

Critically evaluate this answer:
1. Is it factually correct?
2. Is it complete? What's missing?
3. Is the explanation clear?
4. Are there logical errors?

Be specific and constructive.
""")
            print(f"Critique {i+1}: {critique[:200]}...")

            # Refinement phase
            answer = self.llm.invoke(f"""
Original question: {question}
Previous answer: {answer}
Critique: {critique}

Provide an improved answer that addresses all critique points.
""")
        return answer
```
This sounds simple but delivers real quality gains. It works especially well for code generation, technical writing, and complex analysis tasks.
Warning: Too many reflection rounds can cause over-correction — the model second-guesses correct answers. Two to three rounds is usually the sweet spot.
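A cheap guard is to let the critique terminate the loop early, so the model stops polishing once the critique comes back clean (a sketch; the `looks_clean` heuristic and the callable arguments are assumptions, not part of the pattern above):

```python
def looks_clean(critique: str) -> bool:
    """Heuristic: the critique reports no real issues."""
    signals = ("no issues", "no errors", "correct and complete")
    return any(s in critique.lower() for s in signals)

def reflect_until_clean(generate, critique_fn, refine, question, max_rounds=3):
    answer = generate(question)
    for _ in range(max_rounds):
        critique = critique_fn(question, answer)
        if looks_clean(critique):
            break  # stop early instead of second-guessing a correct answer
        answer = refine(question, answer, critique)
    return answer
```

A more robust variant asks the critique prompt to emit a structured verdict (e.g. `VERDICT: pass|fail`) rather than keyword-matching free text.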
## Pattern 5: Tree of Thoughts (ToT)
Published in 2023, ToT explores multiple reasoning paths simultaneously — like a chess engine evaluating several moves ahead before committing.
```python
import asyncio
import re
from typing import List


class TreeOfThoughts:
    def __init__(self, llm, branching_factor=3, max_depth=4):
        self.llm = llm
        self.branching_factor = branching_factor
        self.max_depth = max_depth

    async def solve(self, problem: str) -> str:
        root = {"thought": problem, "score": 1.0, "children": []}

        # BFS exploration
        queue = [root]
        best_leaf = root  # fall back to the root if nothing scores higher
        best_score = 0.0

        for depth in range(self.max_depth):
            next_queue = []
            for node in queue:
                # Generate multiple next thoughts from this node
                children = await self._generate_thoughts(node["thought"])
                for child_thought in children:
                    # Score each reasoning path
                    score = await self._evaluate_thought(child_thought, problem)
                    child_node = {
                        "thought": child_thought,
                        "score": score,
                        "children": [],
                    }
                    node["children"].append(child_node)
                    next_queue.append(child_node)
                    if score > best_score:
                        best_score = score
                        best_leaf = child_node

            # Keep only top-k paths (beam search)
            queue = sorted(next_queue, key=lambda x: x["score"], reverse=True)[:self.branching_factor]

        # Generate final answer from best path
        return await self._generate_final_answer(best_leaf["thought"], problem)

    async def _generate_thoughts(self, current_thought: str) -> List[str]:
        response = await self.llm.ainvoke(f"""
Current reasoning: {current_thought}

Generate {self.branching_factor} distinct ways to continue this reasoning.
Return as a numbered list.
""")
        return self._parse_numbered_list(response)

    async def _evaluate_thought(self, thought: str, problem: str) -> float:
        response = await self.llm.ainvoke(f"""
Problem: {problem}
Reasoning path: {thought}

Rate this reasoning path from 0.0 to 1.0.
Return only the number.
""")
        try:
            return float(response.strip())
        except ValueError:
            return 0.5  # unparseable score: treat as neutral

    def _parse_numbered_list(self, text: str) -> List[str]:
        # Pull out the lines that start with "1.", "2.", etc.
        return re.findall(r"^\s*\d+\.\s*(.+)$", text, flags=re.MULTILINE)

    async def _generate_final_answer(self, best_thought: str, problem: str) -> str:
        return await self.llm.ainvoke(
            f"Problem: {problem}\n\nBest reasoning path: {best_thought}\n\n"
            f"Write the final answer based on this reasoning."
        )
```
Use ToT when:
- Math problems requiring exploration of multiple solution approaches
- Complex planning with many possible strategies
- Creative tasks where you want to explore divergent directions
Don't use ToT when:
- Simple information retrieval
- Speed matters — API costs and latency are very high
- The problem has a clearly correct linear solution
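The cost warning is easy to quantify: every tree level pays one generation call per frontier node plus one scoring call per generated thought. A back-of-the-envelope counter, matching the beam-search sketch above (which reuses `branching_factor` as the beam width):

```python
def tot_llm_calls(branching_factor: int, max_depth: int, beam_width: int = None) -> int:
    """Rough LLM-call count: one generation call per expanded node,
    plus one evaluation call per generated thought."""
    beam_width = beam_width or branching_factor
    calls, frontier = 0, 1  # start from the root
    for _ in range(max_depth):
        calls += frontier                      # generation calls for this level
        calls += frontier * branching_factor   # one scoring call per child
        frontier = min(frontier * branching_factor, beam_width)
    return calls

print(tot_llm_calls(3, 4))  # -> 40 calls, versus a handful for a typical ReAct run
```

And each of those calls carries the accumulated reasoning in its prompt, so token cost grows even faster than call count.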
## Pattern Selection Guide
| Situation | Recommended Pattern |
|---|---|
| Tool use needed (search, calculation) | ReAct |
| Pure reasoning, no tools | Chain of Thought |
| Long multi-step projects | Plan-and-Execute |
| Quality-critical docs or code | Reflection |
| Complex math/planning, cost not a concern | Tree of Thoughts |
| Multiple specialists needed | Multi-Agent (see next post) |
## Common Mistakes and How to Fix Them
These are real problems you will hit in production.
### 1. Tool Call Infinite Loops
The agent calls the same tool repeatedly. Usually caused by missing error handling or ignored Observations.
```python
MAX_ITERATIONS = 15
seen_actions = set()

# Inside the agent's run() loop:
for i in range(MAX_ITERATIONS):
    action = agent.get_next_action()
    action_key = f"{action.tool}:{action.args}"
    if action_key in seen_actions:
        return "Error: Agent stuck in loop. Breaking out."
    seen_actions.add(action_key)
```
### 2. Context Window Overflow
Accumulated Observations fill up the context window in long sessions.
```python
def trim_messages_to_fit(messages, max_tokens=100000):
    """Remove oldest Observations first, keep system message"""
    while count_tokens(messages) > max_tokens:
        for i, msg in enumerate(messages[1:], 1):
            if "Observation:" in msg.get("content", ""):
                messages.pop(i)      # remove the Observation
                messages.pop(i - 1)  # remove the paired Action too
                break
        else:
            break  # nothing left to trim; bail out instead of looping forever
    return messages
```
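`count_tokens` is left undefined above; in production you would use your model's real tokenizer (tiktoken for OpenAI models, for example). For a sketch, a rough four-characters-per-token heuristic is enough:

```python
def count_tokens(messages, chars_per_token=4):
    """Crude estimate: ~4 characters per token for English text.
    Swap in your model's actual tokenizer for production use."""
    total_chars = sum(len(m.get("content", "")) for m in messages)
    return total_chars // chars_per_token

print(count_tokens([{"role": "user", "content": "x" * 400}]))  # -> 100
```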
### 3. Swallowing Tool Errors

```python
# Bad — silent failure
result = tool.run(args)

# Good — tell the LLM what went wrong so it can adapt
try:
    result = tool.run(args)
except Exception as e:
    result = f"Tool error: {str(e)}. Try a different approach."
```
### 4. No Timeouts on Tool Calls

If an external API hangs, your agent waits forever.

```python
import asyncio

async def run_tool_with_timeout(tool, args, timeout=30):
    try:
        return await asyncio.wait_for(tool.arun(args), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout}s. Try a different approach."
```
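Timeouts pair naturally with retries. A sketch of exponential backoff around a flaky async tool (the exception list and delay values are assumptions; tune them for your APIs):

```python
import asyncio
import random

async def run_tool_with_retries(tool, args, attempts=3, base_delay=1.0, timeout=30):
    """Retry a tool call with exponential backoff plus jitter; on final
    failure, hand the error back to the LLM as text so it can adapt."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(tool(args), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as e:
            if attempt == attempts - 1:
                return f"Tool failed after {attempts} attempts: {e}"
            # Backoff: base, 2x base, 4x base... plus jitter to spread retries
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```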
## Wrapping Up
Knowing the patterns matters less than knowing when to apply them. Start with ReAct — it handles most cases. Layer in Plan-and-Execute for multi-step complexity, Reflection for output quality, and ToT only when you genuinely need multi-path exploration.
Next up: MCP (Model Context Protocol) — Anthropic's open standard for connecting AI to the external world. If you're building anything with tools, you need to understand this.