Author: Youngju Kim (@fjvbn20031)
The most common mistake when building your first AI agent is "just plug in GPT-4 and see what happens." Then you discover your agent looping on the same tool call forever, confidently hallucinating an answer, or crashing because you blew the context window.
This post covers five battle-tested LLM agent design patterns: how each works, code you can actually run, and honest trade-offs.
## What Is an Agent? (A Practical Definition)
There are many definitions floating around, but the most useful one is:
"An agent is an LLM that can take actions, observe results, and decide what to do next — in a loop."
Agent = LLM operating inside a feedback loop. That's it.
```
Perception (receive input)
    → Reasoning (LLM thinks)
    → Action (use a tool)
    → Observation (check result)
    → back to Reasoning
```
The feedback loop is what separates an agent from a simple LLM call. Without it, you just have a prompt.
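That loop fits in a dozen lines. A minimal sketch, assuming an `llm` callable that returns a decision dict and a `tools` dict of plain functions (both are stand-ins for your real model client and tool layer; the decision format is invented for illustration):

```python
def agent_loop(llm, tools, user_input, max_steps=10):
    """Minimal agent: reason about the latest observation, act, repeat."""
    observation = user_input                    # Perception: the initial input
    for _ in range(max_steps):
        decision = llm(observation)             # Reasoning: the LLM decides
        if decision["type"] == "answer":
            return decision["content"]          # Loop exits with a final answer
        tool = tools[decision["tool"]]          # Action: call the chosen tool
        observation = tool(decision["input"])   # Observation: result feeds back in
    return "Gave up after max_steps iterations"
```

Every pattern in this post is a variation on this loop: ReAct structures the reasoning step, Plan-and-Execute front-loads it, and ToT runs many branches of it in parallel.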
## Pattern 1: ReAct (Reason + Act)
Published by Yao et al. in 2022, ReAct is the most widely deployed agent pattern today. The name says it all: alternate between Reasoning and Acting.
### How It Works

You force the LLM to think in a specific format:

```
Thought: [reasoning about what to do]
Action: tool_name(params)
Observation: [result of tool call]
Thought: [reasoning about the result]
... (repeat)
Final Answer: [conclusion]
```
### A Real Trace

```python
# System prompt that enforces ReAct format
system_prompt = """
You have access to these tools: [search, calculator, code_executor]

Always follow this exact format:
Thought: I need to...
Action: tool_name(params)
Observation: [tool result]
... (repeat as needed)
Final Answer: ...

Never skip the Thought step. Never fabricate Observations.
"""

# Example trace:
# Question: "What is South Korea's GDP in 2023 divided by its population?"
#
# Thought: I need to find South Korea's GDP and population
# Action: search("South Korea GDP 2023")
# Observation: "South Korea GDP 2023: $1.71 trillion USD"
# Thought: Now I need population
# Action: search("South Korea population 2024")
# Observation: "51.7 million"
# Thought: Now I can calculate
# Action: calculator("1710000000000 / 51700000")
# Observation: "33075"
# Final Answer: ~$33,075 GDP per capita
```
### Production-Ready Implementation

```python
import re
from typing import Callable


class ReActAgent:
    def __init__(self, llm, tools: dict[str, Callable], max_iterations=10):
        self.llm = llm
        self.tools = tools
        self.max_iterations = max_iterations

    def run(self, question: str) -> str:
        messages = [
            {"role": "system", "content": self._build_system_prompt()},
            {"role": "user", "content": question},
        ]
        for i in range(self.max_iterations):
            response = self.llm.invoke(messages)
            content = response.content

            # Done when we see Final Answer
            if "Final Answer:" in content:
                return content.split("Final Answer:")[-1].strip()

            # Parse the Action
            action_match = re.search(r"Action: (\w+)\((.*?)\)", content)
            if not action_match:
                return content  # Format failure — return as-is

            tool_name = action_match.group(1)
            tool_args = action_match.group(2)

            # Execute the tool
            if tool_name not in self.tools:
                observation = f"Error: tool '{tool_name}' not found"
            else:
                try:
                    observation = self.tools[tool_name](tool_args)
                except Exception as e:
                    observation = f"Error executing tool: {str(e)}"

            # Append Observation to message history
            messages.append({"role": "assistant", "content": content})
            messages.append({"role": "user", "content": f"Observation: {observation}"})

        return "Max iterations reached without final answer"

    def _build_system_prompt(self):
        tool_names = list(self.tools.keys())
        return f"""You have access to these tools: {tool_names}

Always use this format:
Thought: [your reasoning]
Action: tool_name(params)
Observation: [will be filled in]
Final Answer: [when done]"""
```
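A note on the weakest link: if the regex fails to match, the agent silently returns the raw text, so the `Action:` parser is worth exercising in isolation. A standalone sketch using the same pattern as the class above:

```python
import re

def parse_action(content: str):
    """Extract (tool_name, args) from a ReAct-formatted reply, or None."""
    match = re.search(r"Action: (\w+)\((.*?)\)", content)
    return (match.group(1), match.group(2)) if match else None

reply = "Thought: I need population data.\nAction: search(South Korea population 2024)"
print(parse_action(reply))        # -> ('search', 'South Korea population 2024')
print(parse_action("no action"))  # -> None
```

The non-greedy `(.*?)` stops at the first closing parenthesis, so nested parentheses in arguments will be truncated; keep tool signatures flat or switch to JSON-formatted tool calls.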
### ReAct Pros and Cons
Pros: The reasoning chain is fully transparent — debugging is straightforward. Works well for the majority of agent tasks.
Cons: High token consumption. Repetitive Thought patterns can emerge, leading to hallucinated Observations.
## Pattern 2: Chain of Thought (CoT)
Use this when you need complex reasoning but no external tools.
### Zero-Shot CoT

Simplest form — just append "Let's think step by step" to your prompt:

```python
response = llm.invoke(
    "Q: A bat and a ball cost $1.10 total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?\n\nLet's think step by step."
)
# Forces the LLM to reason through the problem sequentially
```
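For the record, the correct answer is $0.05, not the intuitive $0.10; CoT prompting exists precisely to avoid that snap judgment. In practice you also need to pull the final answer back out of the free-form reasoning. A sketch (the regex assumes dollar amounts appear in the reply, which is an assumption about your prompt, not a general rule):

```python
import re

def extract_dollar_answer(cot_text: str):
    """Pull the last dollar amount out of a chain-of-thought reply."""
    amounts = re.findall(r"\$([0-9]+(?:\.[0-9]+)?)", cot_text)
    return float(amounts[-1]) if amounts else None

reply = ("Let x be the ball's price. The bat is x + 1.00. "
         "Together: x + (x + 1.00) = 1.10, so 2x = 0.10 and x = 0.05. "
         "The ball costs $0.05.")
print(extract_dollar_answer(reply))  # -> 0.05
```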
### Few-Shot CoT

Provide worked examples to demonstrate the reasoning format:

```python
few_shot_examples = """
Q: If 5 machines take 5 minutes to make 5 widgets, how long does it take 100 machines to make 100 widgets?
A:
Step 1: One machine makes 1 widget in 5 minutes.
Step 2: 100 machines each making 1 widget simultaneously = 5 minutes.
Answer: 5 minutes.

Q: [new question here]
A:
"""
```
### ReAct vs CoT: Decision Guide
- Pure reasoning, no external data needed → CoT
- Needs real-time info, calculation, or code execution → ReAct
- Simple Q&A → neither, just use a plain LLM call
## Pattern 3: Plan-and-Execute
Best for complex, multi-step tasks where you can decompose the problem upfront.
```python
import re


class PlanAndExecuteAgent:
    def __init__(self, planner_llm, executor_llm, tools):
        self.planner = planner_llm    # Expensive, powerful model
        self.executor = executor_llm  # Cheaper, faster model
        self.tools = tools

    def run(self, goal: str) -> str:
        # Phase 1: Planning — use your best LLM
        plan_prompt = f"""
Goal: {goal}

Create a step-by-step plan to achieve this goal.
Return a numbered list of concrete, executable steps.
Each step should be specific enough to execute independently.
"""
        plan_response = self.planner.invoke(plan_prompt)
        steps = self._parse_plan(plan_response)
        print(f"Plan created: {len(steps)} steps")

        # Phase 2: Execution — can use a cheaper model
        results = []
        for i, step in enumerate(steps):
            print(f"Executing step {i+1}: {step}")
            result = self.executor.invoke(
                f"Execute this step: {step}\n\n"
                f"Context from previous steps: {results}\n\n"
                f"Available tools: {list(self.tools.keys())}"
            )
            results.append({"step": step, "result": result})

        # Phase 3: Summarize
        summary = self.planner.invoke(
            f"Goal: {goal}\n\nResults: {results}\n\nSummarize the final outcome."
        )
        return summary

    def _parse_plan(self, plan_response: str) -> list[str]:
        # Pull out the lines that start with "1.", "2.", etc.
        return re.findall(r"^\s*\d+\.\s*(.+)$", plan_response, flags=re.MULTILINE)
```
Key advantage: You can use different models for planning vs execution. Plan with GPT-4, execute with GPT-3.5 — significant cost savings on long workflows.
Key weakness: If the initial plan is wrong, everything downstream is wrong. You need logic to replan when execution fails.
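One way to bolt on that replanning logic (a sketch; `execute` and `replan` are stand-ins for your executor and planner calls, not part of the class above):

```python
def run_with_replanning(steps, execute, replan, max_replans=2):
    """Execute a plan step by step; on failure, ask the planner to replan."""
    results, replans, i = [], 0, 0
    while i < len(steps):
        ok, result = execute(steps[i])   # execute returns (success, output)
        if ok:
            results.append(result)
            i += 1
        elif replans < max_replans:
            # Replace the failing tail of the plan, keeping completed results
            steps = steps[:i] + replan(steps[i], results)
            replans += 1
        else:
            raise RuntimeError(f"Step failed after {max_replans} replans: {steps[i]}")
    return results
```

Capping `max_replans` matters: an unlucky planner can otherwise oscillate between two broken plans forever.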
## Pattern 4: Reflection and Self-Critique
The agent generates an output, then critiques its own work, then improves it.
```python
class ReflectionAgent:
    def __init__(self, llm):
        self.llm = llm

    def run(self, question: str, num_reflections: int = 2) -> str:
        # Generate initial answer
        answer = self.llm.invoke(f"Answer this question thoroughly: {question}")

        for i in range(num_reflections):
            # Self-critique phase
            critique = self.llm.invoke(f"""
Question: {question}
Answer: {answer}

Critically evaluate this answer:
1. Is it factually correct?
2. Is it complete? What's missing?
3. Is the explanation clear?
4. Are there logical errors?

Be specific and constructive.
""")
            print(f"Critique {i+1}: {critique[:200]}...")

            # Refinement phase
            answer = self.llm.invoke(f"""
Original question: {question}
Previous answer: {answer}
Critique: {critique}

Provide an improved answer that addresses all critique points.
""")
        return answer
```
This sounds simple but delivers real quality gains. It works especially well for code generation, technical writing, and complex analysis tasks.
Warning: Too many reflection rounds can cause over-correction — the model second-guesses correct answers. Two to three rounds is usually the sweet spot.
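A cheap guard is to let the critique terminate the loop early, so the model stops polishing once the critique comes back clean (a sketch; the `looks_clean` heuristic and the callable arguments are assumptions, not part of the pattern above):

```python
def looks_clean(critique: str) -> bool:
    """Heuristic: the critique reports no real issues."""
    signals = ("no issues", "no errors", "correct and complete")
    return any(s in critique.lower() for s in signals)

def reflect_until_clean(generate, critique_fn, refine, question, max_rounds=3):
    answer = generate(question)
    for _ in range(max_rounds):
        critique = critique_fn(question, answer)
        if looks_clean(critique):
            break  # stop early instead of second-guessing a correct answer
        answer = refine(question, answer, critique)
    return answer
```

A more robust variant asks the critique prompt to emit a structured verdict (e.g. `VERDICT: pass|fail`) rather than keyword-matching free text.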
## Pattern 5: Tree of Thoughts (ToT)
Published in 2023, ToT explores multiple reasoning paths simultaneously — like a chess engine evaluating several moves ahead before committing.
```python
import asyncio
import re
from typing import List


class TreeOfThoughts:
    def __init__(self, llm, branching_factor=3, max_depth=4):
        self.llm = llm
        self.branching_factor = branching_factor
        self.max_depth = max_depth

    async def solve(self, problem: str) -> str:
        root = {"thought": problem, "score": 1.0, "children": []}

        # BFS exploration
        queue = [root]
        best_leaf = root  # fall back to the root if nothing scores higher
        best_score = 0.0

        for depth in range(self.max_depth):
            next_queue = []
            for node in queue:
                # Generate multiple next thoughts from this node
                children = await self._generate_thoughts(node["thought"])
                for child_thought in children:
                    # Score each reasoning path
                    score = await self._evaluate_thought(child_thought, problem)
                    child_node = {
                        "thought": child_thought,
                        "score": score,
                        "children": [],
                    }
                    node["children"].append(child_node)
                    next_queue.append(child_node)
                    if score > best_score:
                        best_score = score
                        best_leaf = child_node

            # Keep only top-k paths (beam search)
            queue = sorted(next_queue, key=lambda x: x["score"], reverse=True)[:self.branching_factor]

        # Generate final answer from best path
        return await self._generate_final_answer(best_leaf["thought"], problem)

    async def _generate_thoughts(self, current_thought: str) -> List[str]:
        response = await self.llm.ainvoke(f"""
Current reasoning: {current_thought}

Generate {self.branching_factor} distinct ways to continue this reasoning.
Return as a numbered list.
""")
        return self._parse_numbered_list(response)

    async def _evaluate_thought(self, thought: str, problem: str) -> float:
        response = await self.llm.ainvoke(f"""
Problem: {problem}
Reasoning path: {thought}

Rate this reasoning path from 0.0 to 1.0.
Return only the number.
""")
        try:
            return float(response.strip())
        except ValueError:
            return 0.5  # unparseable score: treat as neutral

    def _parse_numbered_list(self, text: str) -> List[str]:
        # Pull out the lines that start with "1.", "2.", etc.
        return re.findall(r"^\s*\d+\.\s*(.+)$", text, flags=re.MULTILINE)

    async def _generate_final_answer(self, best_thought: str, problem: str) -> str:
        return await self.llm.ainvoke(
            f"Problem: {problem}\n\nBest reasoning path: {best_thought}\n\n"
            f"Write the final answer based on this reasoning."
        )
```
Use ToT when:
- Math problems requiring exploration of multiple solution approaches
- Complex planning with many possible strategies
- Creative tasks where you want to explore divergent directions
Don't use ToT when:
- Simple information retrieval
- Speed matters — API costs and latency are very high
- The problem has a clearly correct linear solution
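The cost warning is easy to quantify: every tree level pays one generation call per frontier node plus one scoring call per generated thought. A back-of-the-envelope counter, matching the beam-search sketch above (which reuses `branching_factor` as the beam width):

```python
def tot_llm_calls(branching_factor: int, max_depth: int, beam_width: int = None) -> int:
    """Rough LLM-call count: one generation call per expanded node,
    plus one evaluation call per generated thought."""
    beam_width = beam_width or branching_factor
    calls, frontier = 0, 1  # start from the root
    for _ in range(max_depth):
        calls += frontier                      # generation calls for this level
        calls += frontier * branching_factor   # one scoring call per child
        frontier = min(frontier * branching_factor, beam_width)
    return calls

print(tot_llm_calls(3, 4))  # -> 40 calls, versus a handful for a typical ReAct run
```

And each of those calls carries the accumulated reasoning in its prompt, so token cost grows even faster than call count.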
## Pattern Selection Guide
| Situation | Recommended Pattern |
|---|---|
| Tool use needed (search, calculation) | ReAct |
| Pure reasoning, no tools | Chain of Thought |
| Long multi-step projects | Plan-and-Execute |
| Quality-critical docs or code | Reflection |
| Complex math/planning, cost not a concern | Tree of Thoughts |
| Multiple specialists needed | Multi-Agent (see next post) |
## Common Mistakes and How to Fix Them
These are real problems you will hit in production.
### 1. Tool Call Infinite Loops
The agent calls the same tool repeatedly. Usually caused by missing error handling or ignored Observations.
```python
MAX_ITERATIONS = 15
seen_actions = set()

# Inside the agent's run() loop:
for i in range(MAX_ITERATIONS):
    action = agent.get_next_action()
    action_key = f"{action.tool}:{action.args}"
    if action_key in seen_actions:
        return "Error: Agent stuck in loop. Breaking out."
    seen_actions.add(action_key)
```
### 2. Context Window Overflow
Accumulated Observations fill up the context window in long sessions.
```python
def trim_messages_to_fit(messages, max_tokens=100000):
    """Remove oldest Observations first, keep system message"""
    while count_tokens(messages) > max_tokens:
        for i, msg in enumerate(messages[1:], 1):
            if "Observation:" in msg.get("content", ""):
                messages.pop(i)      # remove the Observation
                messages.pop(i - 1)  # remove the paired Action too
                break
        else:
            break  # nothing left to trim; bail out instead of looping forever
    return messages
```
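`count_tokens` is left undefined above; in production you would use your model's real tokenizer (tiktoken for OpenAI models, for example). For a sketch, a rough four-characters-per-token heuristic is enough:

```python
def count_tokens(messages, chars_per_token=4):
    """Crude estimate: ~4 characters per token for English text.
    Swap in your model's actual tokenizer for production use."""
    total_chars = sum(len(m.get("content", "")) for m in messages)
    return total_chars // chars_per_token

print(count_tokens([{"role": "user", "content": "x" * 400}]))  # -> 100
```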
### 3. Swallowing Tool Errors

```python
# Bad — silent failure
result = tool.run(args)

# Good — tell the LLM what went wrong so it can adapt
try:
    result = tool.run(args)
except Exception as e:
    result = f"Tool error: {str(e)}. Try a different approach."
```
### 4. No Timeouts on Tool Calls

If an external API hangs, your agent waits forever.

```python
import asyncio

async def run_tool_with_timeout(tool, args, timeout=30):
    try:
        return await asyncio.wait_for(tool.arun(args), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout}s. Try a different approach."
```
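Timeouts pair naturally with retries. A sketch of exponential backoff around a flaky async tool (the exception list and delay values are assumptions; tune them for your APIs):

```python
import asyncio
import random

async def run_tool_with_retries(tool, args, attempts=3, base_delay=1.0, timeout=30):
    """Retry a tool call with exponential backoff plus jitter; on final
    failure, hand the error back to the LLM as text so it can adapt."""
    for attempt in range(attempts):
        try:
            return await asyncio.wait_for(tool(args), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as e:
            if attempt == attempts - 1:
                return f"Tool failed after {attempts} attempts: {e}"
            # Backoff: base, 2x base, 4x base... plus jitter to spread retries
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```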
## Wrapping Up
Knowing the patterns matters less than knowing when to apply them. Start with ReAct — it handles most cases. Layer in Plan-and-Execute for multi-step complexity, Reflection for output quality, and ToT only when you genuinely need multi-path exploration.
Next up: MCP (Model Context Protocol) — Anthropic's open standard for connecting AI to the external world. If you're building anything with tools, you need to understand this.