AI Agents Complete Guide: Building Autonomous AI Systems with LangChain, LangGraph, and CrewAI

Table of Contents

  1. What Are AI Agents?
  2. ReAct - Reasoning and Acting
  3. Tool Use
  4. LangChain Complete Guide
  5. LangGraph - State Machine Agents
  6. LlamaIndex Agents
  7. CrewAI - Multi-Agent Collaboration
  8. Agent Memory
  9. Code Execution Agents
  10. Agent Evaluation and Monitoring

1. What Are AI Agents?

1.1 Defining an Agent

An AI agent is an autonomous system that perceives its environment, selects actions, and executes them to achieve a goal.

Comparison with simple chatbots:

| Property | Chatbot | AI Agent |
|---|---|---|
| Action capability | Text generation only | Tool use, code execution, search, etc. |
| Planning | None | Multi-step planning |
| Memory | Within conversation | Long-term memory possible |
| Autonomy | Low | High |
| Execution | Single response | Iterates until goal achieved |

1.2 The Four Core Components of an Agent

1. LLM (The Brain)

Handles all reasoning and judgment. Answers questions like "What should I do next?" and "Does this result satisfy the goal?"

2. Tool Use (The Hands)

Interacts with the external world. Web search, calculator, code execution, database queries, API calls — all fall here.

3. Memory (Recollection)

Short-term memory (conversation history), long-term memory (vector DB), and episodic memory (past experiences).

4. Planning (Strategy)

Decomposes complex goals into smaller sub-tasks and determines execution order.
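The four components can be pictured as a minimal structure. This is only a sketch; the class and field names are illustrative, not from any framework:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentSkeleton:
    llm: Callable[[str], str]                    # 1. brain: prompt in, decision out
    tools: dict[str, Callable]                   # 2. hands: tool name -> function
    memory: list = field(default_factory=list)   # 3. recollection: past messages and facts
    plan: list = field(default_factory=list)     # 4. strategy: pending sub-tasks

agent = AgentSkeleton(llm=lambda p: "answer", tools={"search": lambda q: "results"})
agent.plan.append("gather data")
print(agent.plan)  # ['gather data']
```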

1.3 The Agent Execution Loop

User goal input
  ↓
[Plan] Decompose goal into sub-tasks
  ↓
[Select Action] Decide next action (which tool to use?)
  ↓
[Execute Tool] Run the selected tool
  ↓
[Observe Result] Review tool output
  ↓
[Goal achieved?] → Yes → Generate final answer
  ↓ No
Back to [Select Action]
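The loop above can be sketched in plain Python, with the LLM stubbed out as a `decide` function. All names here are illustrative, not a framework API:

```python
def run_agent(goal, decide, tools, max_steps=10):
    """Loop: select action -> execute tool -> observe, until the goal is met."""
    observations = []
    for _ in range(max_steps):
        step = decide(goal, observations)          # [Select Action]
        if step["action"] == "finish":             # [Goal achieved?] -> Yes
            return step["answer"]
        tool = tools[step["action"]]               # [Execute Tool]
        observations.append(tool(step["input"]))   # [Observe Result]
    return "Gave up: max steps reached"

# Stub "LLM": search once, then answer with what it observed
def decide(goal, observations):
    if not observations:
        return {"action": "search", "input": goal}
    return {"action": "finish", "answer": observations[-1]}

tools = {"search": lambda q: f"result for '{q}'"}
print(run_agent("capital of France", decide, tools))  # result for 'capital of France'
```

A real agent swaps the stub for an LLM call; the loop structure stays the same.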

1.4 Applications of AI Agents

  • Research agents: Web search → information gathering → summary report
  • Code agents: Requirements analysis → code writing → test execution
  • Data analysis agents: Load data → analyze → visualize
  • Customer service agents: Identify query → system lookup → respond
  • DevOps agents: Monitor → detect issue → auto-remediate

2. ReAct - Reasoning and Acting

ReAct (Reasoning + Acting) is an agent framework introduced by Yao et al. in 2022. Its core is a loop that interleaves reasoning steps with tool actions.

2.1 The ReAct Framework

Traditional Chain-of-Thought (CoT) only thinks. An agent needs think + act + observe.

Thought: I need up-to-date stock price data to answer this question
Action: Search["Samsung Electronics stock price 2026"]
Observation: Samsung Electronics current price: 78,000 KRW, +2.3% from previous day

Thought: I have the price data. Now I should calculate the change amount
Action: Calculator[78000 * 0.023]
Observation: 1794

Thought: The stock rose by 1,794 KRW from yesterday. I can now answer
Final Answer: Samsung Electronics is currently at 78,000 KRW, up 1,794 KRW (+2.3%) from the prior day.

2.2 Implementing a ReAct Agent

from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate

REACT_TEMPLATE = """You are a helpful AI assistant.
You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought:{agent_scratchpad}"""

react_prompt = PromptTemplate.from_template(REACT_TEMPLATE)

llm = ChatOpenAI(model="gpt-4o", temperature=0)

search = DuckDuckGoSearchRun()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Useful for searching for current events and up-to-date data"
    ),
    Tool(
        name="Calculator",
        func=lambda x: str(eval(x)),  # demo only: eval on untrusted input is unsafe
        description="Useful for math calculations. Input is a Python expression"
    )
]

agent = create_react_agent(llm, tools, react_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True
)

result = agent_executor.invoke({
    "input": "What is Bitcoin's current price in March 2026, and how does it compare to a year ago?"
})
print(result["output"])

2.3 Limitations of ReAct

  • Hallucination: May generate tool actions that do not exist
  • Infinite loops: Can repeat without a termination condition
  • Long context: Accumulated Thought/Action/Observation steps can exceed the context window
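Simple guards address the first two failure modes. This sketch is outside any framework (LangChain's AgentExecutor applies similar protections via `max_iterations` and `handle_parsing_errors`); the function name is illustrative:

```python
def validate_action(action, available_tools):
    """Reject tool names the model invented (hallucinated actions).

    Returns an error string to feed back as the Observation, or None if valid.
    Letting the model see the error and retry usually recovers the loop.
    """
    if action not in available_tools:
        return (f"Error: unknown tool '{action}'. "
                f"Choose one of: {sorted(available_tools)}")
    return None

err = validate_action("StockAPI", {"Search", "Calculator"})
print(err)  # feed this back to the model as the next Observation
```

For the long-context problem, the usual mitigation is summarizing or truncating older Thought/Action/Observation steps before each LLM call.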

3. Tool Use

Tools are how agents interact with the external world.

3.1 OpenAI Function Calling

OpenAI's Function Calling allows the LLM to invoke functions in a structured way.

from openai import OpenAI
import json

client = OpenAI()

functions = [
    {
        "name": "get_weather",
        "description": "Get current weather for a specific city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "City name to get weather for"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "search_database",
        "description": "Search internal database for information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                },
                "table": {
                    "type": "string",
                    "enum": ["users", "products", "orders"],
                    "description": "Table to search"
                }
            },
            "required": ["query"]
        }
    }
]

def get_weather(city: str, unit: str = "celsius") -> dict:
    return {
        "city": city,
        "temperature": 15,
        "unit": unit,
        "condition": "sunny",
        "humidity": 60
    }

def search_database(query: str, table: str = "products") -> list:
    return [{"id": 1, "name": "Sample Product", "price": 100}]

available_tools = {
    "get_weather": get_weather,
    "search_database": search_database
}

def run_agent_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=[{"type": "function", "function": f} for f in functions],
            tool_choice="auto"
        )

        message = response.choices[0].message

        if not message.tool_calls:
            return message.content

        messages.append(message)

        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            print(f"  Tool call: {func_name}({func_args})")

            if func_name in available_tools:
                result = available_tools[func_name](**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

answer = run_agent_with_tools("What's the weather in London? Do I need an umbrella?")
print(answer)

3.2 Diverse Tool Examples

from langchain.tools import tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
import subprocess
import sqlite3

@tool
def calculator(expression: str) -> str:
    """Perform mathematical calculations. Input a Python expression."""
    try:
        # Strip builtins so only plain arithmetic works; still not a full sandbox
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def run_python_code(code: str) -> str:
    """Execute Python code and return the result."""
    try:
        local_vars = {}
        exec(code, {"__builtins__": {}}, local_vars)
        output = local_vars.get('result', 'No result variable found')
        return str(output)
    except Exception as e:
        return f"Code execution error: {e}"

@tool
def query_database(sql: str) -> str:
    """Execute a SQL query against the SQLite database."""
    try:
        conn = sqlite3.connect("agent_db.sqlite")
        cursor = conn.cursor()
        cursor.execute(sql)
        rows = cursor.fetchall()
        conn.close()
        return str(rows)
    except Exception as e:
        return f"DB error: {e}"

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email."""
    print(f"Sending email to: {to}")
    print(f"Subject: {subject}")
    print(f"Body: {body[:100]}...")
    return f"Email successfully sent to {to}."

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

@tool
def search_wikipedia(query: str) -> str:
    """Search for information on Wikipedia."""
    return wikipedia.run(query)

4. LangChain

LangChain is the most widely adopted framework for building LLM applications.

4.1 LangChain Core Components

LangChain Architecture
├── Models (LLM, Chat Models, Embeddings)
├── Prompts (PromptTemplate, ChatPromptTemplate)
├── Chains (LLMChain, SequentialChain, LCEL)
├── Memory (Buffer, Summary, VectorStore)
├── Agents (ReAct, OpenAI Functions)
├── Tools (Built-in + Custom)
└── Retrievers (VectorStore, MultiQuery)

4.2 LCEL (LangChain Expression Language)

Modern LangChain uses LCEL pipelines.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Basic chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert analyst."),
    ("human", "{question}")
])

chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "What are the advantages of AI agents?"})
print(result)

# Structured output
from pydantic import BaseModel, Field

class AnalysisResult(BaseModel):
    summary: str = Field(description="Summary")
    key_points: list[str] = Field(description="List of key points")
    recommendation: str = Field(description="Recommendation")

structured_chain = prompt | llm.with_structured_output(AnalysisResult)
result = structured_chain.invoke({"question": "Compare LangChain vs LlamaIndex"})
print(result.summary)
print(result.key_points)

# Parallel chain
parallel_chain = RunnableParallel({
    "pros": ChatPromptTemplate.from_template("What are the advantages of {topic}?") | llm | StrOutputParser(),
    "cons": ChatPromptTemplate.from_template("What are the disadvantages of {topic}?") | llm | StrOutputParser(),
})

result = parallel_chain.invoke({"topic": "AI agents"})
print("Pros:", result["pros"])
print("Cons:", result["cons"])

4.3 Memory Management

from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

# Buffer memory (keeps all messages)
buffer_memory = ConversationBufferMemory(
    memory_key="history",
    return_messages=True
)

# Summary memory (summarizes old conversations)
summary_memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    memory_key="history",
    return_messages=True
)

llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

conversation = ConversationChain(
    llm=llm,
    memory=buffer_memory,
    verbose=True
)

r1 = conversation.predict(input="My name is Alice")
r2 = conversation.predict(input="What's my name?")  # Remembers!
print(r1, r2)

# Vector store based long-term memory
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["dummy"], embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

vector_memory = VectorStoreRetrieverMemory(retriever=retriever)
vector_memory.save_context(
    {"input": "My favorite food is sushi"},
    {"output": "Got it!"}
)

relevant = vector_memory.load_memory_variables({"prompt": "Recommend a food"})
print(relevant)

4.4 RAG (Retrieval Augmented Generation)

from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_core.prompts import ChatPromptTemplate

loader = WebBaseLoader("https://example.com/document")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

llm = ChatOpenAI(model="gpt-4o", temperature=0)

rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {question}

Answer:""")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    chain_type_kwargs={"prompt": rag_prompt},
    return_source_documents=True
)

result = qa_chain.invoke({"query": "What are the main points?"})
print("Answer:", result["result"])
print("Sources:", [doc.metadata for doc in result["source_documents"]])

4.5 Complete LangChain Agent

from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferWindowMemory
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
import datetime

search = DuckDuckGoSearchRun()

@tool
def get_current_datetime() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

@tool
def calculate(expression: str) -> str:
    """Perform math calculations. Example: 2+2, 10*5, sqrt(16)"""
    import math
    safe_dict = {k: getattr(math, k) for k in dir(math) if not k.startswith('_')}
    safe_dict['abs'] = abs
    try:
        return str(eval(expression, {"__builtins__": {}}, safe_dict))
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    return search.run(query)

tools = [get_current_datetime, calculate, web_search]

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful AI assistant.
Answer the user's questions accurately and helpfully.
Use tools when necessary to gather information."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o", temperature=0)

memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    return_messages=True,
    k=10
)

agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    max_iterations=5,
    handle_parsing_errors=True
)

def chat(message: str) -> str:
    result = agent_executor.invoke({"input": message})
    return result["output"]

print(chat("What's today's date?"))
print(chat("Tell me about recent trends in AI agents"))
print(chat("What's the market cap of the company you just mentioned?"))  # Uses memory

5. LangGraph - State Machine Agents

LangGraph models agents as state machines, making complex loops, branches, and conditional execution straightforward to express.

5.1 Why LangGraph?

Limitations of standard LangChain agents:

  • Only linear execution (loops are difficult)
  • Inconvenient state management
  • Complex branching
  • Human-in-the-loop is hard

LangGraph's solutions:

  • Graph-based execution flow
  • Explicit state management
  • Conditional edges for branching
  • Interrupt points

5.2 LangGraph Basics

from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict, Annotated, Sequence
import operator

# Define state schema
class AgentState(TypedDict):
    messages: Annotated[Sequence, operator.add]
    next: str

llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [web_search, calculate, get_current_datetime]
llm_with_tools = llm.bind_tools(tools)

def agent_node(state: AgentState) -> AgentState:
    """LLM decides the next action"""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

def should_continue(state: AgentState) -> str:
    """Decide whether to continue or end (conditional edge)"""
    messages = state["messages"]
    last_message = messages[-1]

    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "tools"
    return END

tool_node = ToolNode(tools)

workflow = StateGraph(AgentState)

workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)

workflow.set_entry_point("agent")

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "tools": "tools",
        END: END
    }
)
workflow.add_edge("tools", "agent")

app = workflow.compile()

result = app.invoke({
    "messages": [HumanMessage(content="What's the weather in London?")]
})
print(result["messages"][-1].content)

5.3 Human-in-the-Loop

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()

class ApprovalState(TypedDict):
    messages: Annotated[Sequence, operator.add]
    pending_action: str
    approved: bool

def agent_node(state: ApprovalState) -> ApprovalState:
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)

    if hasattr(response, 'tool_calls') and response.tool_calls:
        tool_name = response.tool_calls[0]['name']
        if tool_name in ["send_email", "delete_file", "make_payment"]:
            return {
                "messages": [response],
                "pending_action": tool_name,
                "approved": False
            }

    return {"messages": [response]}

def human_approval_node(state: ApprovalState) -> ApprovalState:
    """Human approval node (interrupt)"""
    print(f"\nAction requiring approval: {state['pending_action']}")
    print("Type 'approve' to continue, 'reject' to cancel")
    return state

def check_approval(state: ApprovalState) -> str:
    if state.get("approved"):
        return "execute"
    elif state.get("pending_action") and not state.get("approved"):
        return "human_approval"
    return END

workflow = StateGraph(ApprovalState)
workflow.add_node("agent", agent_node)
workflow.add_node("human_approval", human_approval_node)
workflow.add_node("tools", ToolNode(tools))

workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", check_approval, {
    "human_approval": "human_approval",
    "execute": "tools",
    END: END
})
workflow.add_edge("tools", "agent")

app = workflow.compile(
    checkpointer=memory,
    interrupt_before=["human_approval"]
)

thread_id = "session_001"
config = {"configurable": {"thread_id": thread_id}}

result = app.invoke(
    {"messages": [HumanMessage(content="Send a meeting invitation to the team")]},
    config=config
)

# After human approval, resume
app.update_state(config, {"approved": True})
final_result = app.invoke(None, config=config)

5.4 Research Agent with LangGraph

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_community.tools import DuckDuckGoSearchRun
from typing import TypedDict, List
import json

search = DuckDuckGoSearchRun()  # used by execute_searches below

class ResearchState(TypedDict):
    topic: str
    search_queries: List[str]
    search_results: List[str]
    draft: str
    final_report: str
    iteration: int

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def plan_queries(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    response = llm.invoke([
        HumanMessage(content=f"""Topic: {topic}
Generate 5 search queries to thoroughly research this topic.
Return as JSON: {{"queries": ["query1", "query2", ...]}}""")
    ])
    queries = json.loads(response.content)["queries"]  # assumes the model returns raw JSON
    return {"search_queries": queries}

def execute_searches(state: ResearchState) -> ResearchState:
    queries = state["search_queries"]
    results = []
    for query in queries:
        result = search.run(query)
        results.append(f"[{query}]\n{result}")
    return {"search_results": results}

def write_draft(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    results = "\n\n".join(state["search_results"])
    response = llm.invoke([
        HumanMessage(content=f"""Topic: {topic}

Collected information:
{results}

Write a detailed research report draft based on the above.""")
    ])
    return {"draft": response.content, "iteration": state.get("iteration", 0) + 1}

def review_and_improve(state: ResearchState) -> ResearchState:
    draft = state["draft"]
    response = llm.invoke([
        HumanMessage(content=f"""Review and improve the following research report draft:

{draft}

Improvements:
1. Verify accuracy
2. Improve logical flow
3. Add important information
4. Strengthen conclusions

Write the final report.""")
    ])
    return {"final_report": response.content}

def should_improve(state: ResearchState) -> str:
    if state.get("iteration", 0) < 2:
        return "improve"
    return "finalize"

research_graph = StateGraph(ResearchState)
research_graph.add_node("plan_queries", plan_queries)
research_graph.add_node("execute_searches", execute_searches)
research_graph.add_node("write_draft", write_draft)
research_graph.add_node("review_and_improve", review_and_improve)

research_graph.set_entry_point("plan_queries")
research_graph.add_edge("plan_queries", "execute_searches")
research_graph.add_edge("execute_searches", "write_draft")
research_graph.add_conditional_edges(
    "write_draft",
    should_improve,
    {
        "improve": "execute_searches",
        "finalize": "review_and_improve"
    }
)
research_graph.add_edge("review_and_improve", END)

research_app = research_graph.compile()

result = research_app.invoke({
    "topic": "AI Agent Technology Trends in 2026",
    "search_queries": [],
    "search_results": [],
    "draft": "",
    "final_report": "",
    "iteration": 0
})
print(result["final_report"])

6. LlamaIndex

LlamaIndex is a data-centric AI agent framework.

6.1 LlamaIndex Agents

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

Settings.llm = OpenAI(model="gpt-4o", temperature=0)

def multiply(a: float, b: float) -> float:
    """Multiplies two numbers."""
    return a * b

def add(a: float, b: float) -> float:
    """Adds two numbers."""
    return a + b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)

query_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description="Search internal company documents for information."
)

agent = ReActAgent.from_tools(
    [multiply_tool, add_tool, query_tool],
    llm=Settings.llm,
    verbose=True,
    max_iterations=10
)

response = agent.chat(
    "Find the AI policy in internal documents, and add the penalty amounts of $500 and $1000"
)
print(response)

6.2 Multi-Document RAG Agent

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

docs_finance = SimpleDirectoryReader("./finance_docs").load_data()
docs_hr = SimpleDirectoryReader("./hr_docs").load_data()
docs_technical = SimpleDirectoryReader("./technical_docs").load_data()

splitter = SentenceSplitter(chunk_size=512)

finance_index = VectorStoreIndex.from_documents(docs_finance, transformations=[splitter])
hr_index = VectorStoreIndex.from_documents(docs_hr, transformations=[splitter])
tech_index = VectorStoreIndex.from_documents(docs_technical, transformations=[splitter])

tools = [
    QueryEngineTool.from_defaults(
        query_engine=finance_index.as_query_engine(),
        name="finance_qa",
        description="Answer questions about finance, accounting, and budgets"
    ),
    QueryEngineTool.from_defaults(
        query_engine=hr_index.as_query_engine(),
        name="hr_qa",
        description="Answer questions about HR, hiring, and benefits"
    ),
    QueryEngineTool.from_defaults(
        query_engine=tech_index.as_query_engine(),
        name="tech_qa",
        description="Answer questions about technical specifications and development guides"
    ),
]

agent = ReActAgent.from_tools(tools, verbose=True)
response = agent.chat("Tell me about the 2026 IT budget and new hiring plans")
print(response)

7. CrewAI - Multi-Agent Collaboration

CrewAI is a role-based framework for multi-agent collaboration.

7.1 CrewAI Core Concepts

Crew
├── Agents - each with role and goal
│   ├── Role: "Senior Researcher", "Content Writer"
│   ├── Goal: what the agent aims to achieve
│   ├── Backstory: personality/expertise
│   └── Tools: available tools
└── Tasks
    ├── Description: what needs to be done
    ├── Expected Output: expected result
    └── Agent: assigned agent

7.2 Research Team Agent

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

search_tool = SerperDevTool()
web_tool = WebsiteSearchTool()

researcher = Agent(
    role="Senior Researcher",
    goal="Collect comprehensive and accurate information on the given topic",
    backstory="""You are a professional researcher with 10 years of experience.
    You are an expert at systematically investigating complex topics
    and extracting key insights from reliable, up-to-date sources.""",
    tools=[search_tool, web_tool],
    llm=llm,
    verbose=True
)

analyst = Agent(
    role="Data Analyst",
    goal="Analyze collected information and identify patterns and trends",
    backstory="""You are a data analysis expert. You discover meaningful patterns
    in raw data and derive actionable insights by combining statistical methods
    with business knowledge.""",
    tools=[search_tool],
    llm=llm,
    verbose=True
)

writer = Agent(
    role="Content Writer",
    goal="Write clear, compelling reports from analysis results",
    backstory="""You are a professional writer who can explain technical content
    to general audiences. You love conveying complex analytical results through
    storytelling.""",
    llm=llm,
    verbose=True
)

research_task = Task(
    description="""Research '{topic}' covering:
    1. Latest trends and developments
    2. Key players and their approaches
    3. Potential opportunities and risks
    4. Relevant statistics and data

    Cite at least 5 reliable sources.""",
    expected_output="Research summary (minimum 500 words)",
    agent=researcher
)

analysis_task = Task(
    description="""Analyze the information gathered by the researcher:
    1. Identify 3 key trends
    2. Perform SWOT analysis
    3. Short-term forecast (6-12 months)
    4. Key risk factors

    Provide objective, data-driven analysis.""",
    expected_output="Analysis report (minimum 400 words)",
    agent=analyst,
    context=[research_task]
)

report_task = Task(
    description="""Synthesize research and analysis into a professional report:

    Report structure:
    1. Executive Summary
    2. Current State Analysis
    3. Key Insights
    4. Recommendations
    5. Conclusion

    Write professionally and persuasively.""",
    expected_output="Completed report (minimum 800 words)",
    agent=writer,
    context=[research_task, analysis_task]
)

research_crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, report_task],
    process=Process.sequential,
    verbose=True
)

result = research_crew.kickoff(inputs={"topic": "AI Agent Market Analysis 2026"})
print(result)

7.3 Software Development Agent Team

from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool

code_interpreter = CodeInterpreterTool()

product_manager = Agent(
    role="Product Manager",
    goal="Clearly define requirements and create development plans",
    backstory="A PM with 10 years of experience connecting technical requirements to business goals.",
    llm=llm,
    verbose=True
)

senior_developer = Agent(
    role="Senior Developer",
    goal="Write high-quality, scalable code",
    backstory="A full-stack developer with expertise in Python, FastAPI, and React.",
    tools=[code_interpreter],
    llm=llm,
    verbose=True
)

qa_engineer = Agent(
    role="QA Engineer",
    goal="Thoroughly test code and ensure quality",
    backstory="A software testing expert who loves finding bugs.",
    tools=[code_interpreter],
    llm=llm,
    verbose=True
)

requirements_task = Task(
    description="""Define technical requirements for '{feature_request}':
    1. User stories
    2. Functional requirements list
    3. Non-functional requirements (performance, security)
    4. API design (endpoint list)""",
    expected_output="Requirements document",
    agent=product_manager
)

development_task = Task(
    description="""Write Python FastAPI code based on the requirements document:
    1. Complete API implementation
    2. Data models (Pydantic)
    3. Error handling
    4. Code comments""",
    expected_output="Complete Python code",
    agent=senior_developer,
    context=[requirements_task]
)

testing_task = Task(
    description="""Review and test the written code:
    1. Code review (bugs, security vulnerabilities)
    2. Write unit tests
    3. Edge case testing
    4. Suggest improvements""",
    expected_output="Test report and improved code",
    agent=qa_engineer,
    context=[development_task]
)

dev_crew = Crew(
    agents=[product_manager, senior_developer, qa_engineer],
    tasks=[requirements_task, development_task, testing_task],
    process=Process.sequential,
    verbose=True
)

result = dev_crew.kickoff(
    inputs={"feature_request": "User authentication API (JWT-based)"}
)
print(result)

7.4 Hierarchical CrewAI

# Hierarchical structure where a manager delegates work
manager = Agent(
    role="Project Manager",
    goal="Coordinate the team and produce the best results",
    backstory="An experienced PM who maximizes the strengths of each team member.",
    llm=llm,
    verbose=True,
    allow_delegation=True  # Can delegate to other agents
)

hierarchical_crew = Crew(
    agents=[researcher, analyst, writer],  # the manager is passed separately, not listed here
    tasks=[report_task],  # only define the final task (the manager delegates the rest)
    process=Process.hierarchical,
    manager_agent=manager,
    verbose=True
)

result = hierarchical_crew.kickoff()
print(result)

8. Agent Memory

8.1 Memory Architecture

from langchain.memory import (
    ConversationBufferWindowMemory,
    ConversationSummaryBufferMemory,
    ConversationEntityMemory,
)
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
import datetime

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

# 1. Short-term memory - keeps only the most recent k exchanges
short_term = ConversationBufferWindowMemory(k=5, return_messages=True)

# 2. Summary memory - summarizes old conversations
summary_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=1000,
    return_messages=True
)

# 3. Entity memory - extract key facts
entity_memory = ConversationEntityMemory(llm=llm, return_messages=True)

# 4. Long-term memory - vector DB
class LongTermMemory:
    def __init__(self):
        self.vectorstore = FAISS.from_texts(["init"], embeddings)
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 5})

    def save(self, text: str, metadata: dict = None):
        self.vectorstore.add_texts([text], metadatas=[metadata or {}])

    def recall(self, query: str) -> list:
        # get_relevant_documents() is deprecated; invoke() is the current API
        docs = self.retriever.invoke(query)
        return [doc.page_content for doc in docs]


# 5. Episodic memory - past agent experiences
class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def save_episode(self, task: str, actions: list, result: str, success: bool):
        episode = {
            "task": task,
            "actions": actions,
            "result": result,
            "success": success,
            "timestamp": datetime.datetime.now().isoformat()
        }
        self.episodes.append(episode)

    def find_similar_episodes(self, current_task: str) -> list:
        # Naive baseline: returns every successful episode regardless of
        # current_task; real retrieval would rank by task similarity
        return [e for e in self.episodes if e["success"]]
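
The success filter above ignores the task text entirely. As a lightweight stand-in for embedding-based retrieval (a sketch using only the standard library's `difflib`; `rank_episodes_by_similarity` is an illustrative helper, not part of any framework):

```python
import difflib

def rank_episodes_by_similarity(episodes: list, current_task: str, top_k: int = 3) -> list:
    """Rank successful episodes by fuzzy string similarity to the current task.

    SequenceMatcher.ratio() compares raw task strings - crude compared to
    embeddings, but dependency-free and often good enough for a prototype.
    """
    successful = [e for e in episodes if e["success"]]
    scored = [
        (difflib.SequenceMatcher(None, current_task.lower(), e["task"].lower()).ratio(), e)
        for e in successful
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [episode for _, episode in scored[:top_k]]

episodes = [
    {"task": "summarize sales report", "success": True},
    {"task": "write unit tests", "success": True},
    {"task": "summarize quarterly sales", "success": False},
]
ranked = rank_episodes_by_similarity(episodes, "summarize the sales data")
print([e["task"] for e in ranked])
```

Swapping this in for `find_similar_episodes` keeps the same interface while actually using `current_task`.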


# Integrated memory system
class AgentMemorySystem:
    def __init__(self):
        self.short_term = ConversationBufferWindowMemory(k=10)
        self.long_term = LongTermMemory()
        self.episodic = EpisodicMemory()
        self.entities = {}

    def save_message(self, role: str, content: str):
        # Write to the underlying chat history directly, so each turn does
        # not also store an empty counterpart message
        if role == "human":
            self.short_term.chat_memory.add_user_message(content)
        else:
            self.short_term.chat_memory.add_ai_message(content)

    def get_relevant_context(self, query: str) -> str:
        recent = self.short_term.load_memory_variables({})
        long_term = self.long_term.recall(query)
        past_episodes = self.episodic.find_similar_episodes(query)

        context = f"""Recent conversation: {recent.get('history', '')}

Relevant memories: {'; '.join(long_term[:3])}

Similar past experiences: {past_episodes[:2] if past_episodes else 'None'}"""

        return context

memory_system = AgentMemorySystem()

9. Code Execution Agents

9.1 Python REPL Agent

from langchain_experimental.tools import PythonREPLTool
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

python_repl = PythonREPLTool()
llm = ChatOpenAI(model="gpt-4o", temperature=0)

data_analysis_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a professional data analyst.
You are proficient in Python, pandas, matplotlib, and seaborn.
When given a data analysis request, write and execute code to derive results.
Always explain the analysis results alongside your code."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(llm, [python_repl], data_analysis_prompt)
data_agent = AgentExecutor(agent=agent, tools=[python_repl], verbose=True)

result = data_agent.invoke({
    "input": """Analyze the following data:
    sales = [100, 150, 120, 200, 180, 250, 220, 300, 280, 350, 320, 400]
    months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

    Analyze the monthly sales trend and calculate the growth rate.""",
    "chat_history": []
})
print(result["output"])

9.2 Docker Sandbox Code Execution

import docker
import tempfile
import os

class DockerCodeExecutor:
    """Safely execute code inside a Docker container"""

    def __init__(self, image="python:3.11-slim", timeout=30):
        self.client = docker.from_env()
        self.image = image
        self.timeout = timeout

    def execute(self, code: str, packages: list = None) -> dict:
        """
        Execute Python code inside a Docker container
        Returns: {success: bool, output: str, error: str}
        """
        with tempfile.TemporaryDirectory() as tmpdir:
            code_file = os.path.join(tmpdir, "script.py")
            with open(code_file, "w") as f:
                f.write(code)

            install_cmd = ""
            if packages:
                pkgs = " ".join(packages)
                install_cmd = f"pip install {pkgs} -q && "

            try:
                # containers.run() has no timeout kwarg; detach the container,
                # then wait with a timeout and collect logs manually
                container = self.client.containers.run(
                    self.image,
                    command=f'sh -c "{install_cmd}python /code/script.py"',
                    volumes={tmpdir: {"bind": "/code", "mode": "ro"}},
                    detach=True,
                    network_mode="none",    # Block network access (pip installs will fail)
                    mem_limit="256m",       # Memory limit
                    cpu_period=100000,
                    cpu_quota=50000,        # 50% CPU limit
                )
                exit_status = container.wait(timeout=self.timeout)
                output = container.logs(stdout=True, stderr=True).decode("utf-8")
                container.remove(force=True)
                if exit_status.get("StatusCode") == 0:
                    return {"success": True, "output": output, "error": ""}
                return {"success": False, "output": "", "error": output}
            except Exception as e:
                return {"success": False, "output": "", "error": str(e)}


executor = DockerCodeExecutor()

# network_mode="none" blocks pip installs, so use only the standard library
# here (or bake extra packages into a custom image)
code = """
import json
import statistics

scores = [85, 92, 78]
summary = {'mean': statistics.mean(scores), 'max': max(scores)}
print(json.dumps(summary, indent=2))
"""

result = executor.execute(code)
print(result["output"])
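
When Docker is not available (e.g. in a restricted CI job), a minimal fallback is to run the script in a subprocess with a wall-clock timeout. To be clear, this gives no filesystem, network, or memory isolation; `run_locally` is an illustrative helper, not a library function:

```python
import os
import subprocess
import sys
import tempfile

def run_locally(code: str, timeout: int = 10) -> dict:
    """Fallback executor: subprocess + timeout only, no real sandboxing."""
    # Write the code to a temp file so it runs as a normal script
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return {
            "success": proc.returncode == 0,
            "output": proc.stdout,
            "error": proc.stderr,
        }
    except subprocess.TimeoutExpired:
        return {"success": False, "output": "", "error": "timeout"}
    finally:
        os.unlink(path)

print(run_locally("print(2 + 2)")["output"])
```

Treat this strictly as a development convenience; untrusted agent-generated code should always go through the container path.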

10. Agent Evaluation and Monitoring

10.1 LangSmith Tracing

import os
from langsmith import Client

# LangSmith setup
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "ai-agent-evaluation"

from langsmith import traceable

@traceable(run_type="chain")
def run_agent_with_tracking(user_input: str):
    """Agent execution tracked by LangSmith"""
    result = agent_executor.invoke({"input": user_input})
    return result

# Query execution data via LangSmith client
client = Client()

runs = client.list_runs(
    project_name="ai-agent-evaluation",
    run_type="chain"
)

for run in list(runs)[:5]:
    print(f"Run ID: {run.id}")
    print(f"Status: {run.status}")
    print(f"Execution time: {run.end_time - run.start_time if run.end_time else 'N/A'}")
    print(f"Token usage: {run.total_tokens}")
    print("---")

10.2 Agent Performance Metrics

import time
import datetime
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentMetrics:
    task: str
    success: bool
    total_time: float
    num_iterations: int
    tools_used: List[str]
    tokens_used: int
    error_message: Optional[str] = None
    final_answer: Optional[str] = None

class AgentEvaluator:
    """Agent performance evaluation system"""

    def __init__(self, agent_executor):
        self.agent = agent_executor
        self.metrics_history: List[AgentMetrics] = []

    def evaluate(self, task: str, expected_keywords: list = None) -> AgentMetrics:
        start_time = time.time()

        try:
            result = self.agent.invoke({"input": task})
            total_time = time.time() - start_time
            answer = result.get("output", "")

            success = True
            if expected_keywords:
                success = any(kw.lower() in answer.lower() for kw in expected_keywords)

            metrics = AgentMetrics(
                task=task,
                success=success,
                total_time=total_time,
                # Iteration/tool/token counts need callbacks (or
                # return_intermediate_steps=True) to populate; stubbed at 0 here
                num_iterations=0,
                tools_used=[],
                tokens_used=0,
                final_answer=answer
            )

        except Exception as e:
            metrics = AgentMetrics(
                task=task,
                success=False,
                total_time=time.time() - start_time,
                num_iterations=0,
                tools_used=[],
                tokens_used=0,
                error_message=str(e)
            )

        self.metrics_history.append(metrics)
        return metrics

    def batch_evaluate(self, test_cases: list) -> dict:
        results = []
        for case in test_cases:
            task = case["task"]
            keywords = case.get("expected_keywords", [])
            metrics = self.evaluate(task, keywords)
            results.append(metrics)

        successes = [r for r in results if r.success]
        success_rate = len(successes) / len(results) if results else 0
        avg_time = sum(r.total_time for r in results) / len(results) if results else 0

        return {
            "total_tasks": len(results),
            "success_rate": success_rate,
            "avg_response_time": avg_time,
            "failed_tasks": [r.task for r in results if not r.success],
            "detailed_results": results
        }

    def generate_report(self) -> str:
        if not self.metrics_history:
            return "No evaluation data"

        total = len(self.metrics_history)
        successes = sum(1 for m in self.metrics_history if m.success)
        avg_time = sum(m.total_time for m in self.metrics_history) / total

        report = f"""
=== Agent Performance Report ===
Total tasks: {total}
Success rate: {successes/total*100:.1f}%
Average response time: {avg_time:.2f}s

Failed tasks:
"""
        for m in self.metrics_history:
            if not m.success:
                report += f"  - {m.task}: {m.error_message or 'Quality check failed'}\n"

        return report


test_cases = [
    {
        "task": "What time is it right now?",
        "expected_keywords": ["2026", ":", "AM", "PM"]
    },
    {
        "task": "What is the square root of 100?",
        "expected_keywords": ["10"]
    },
    {
        "task": "Explain the main components of an AI agent",
        "expected_keywords": ["LLM", "tool", "memory"]
    }
]

evaluator = AgentEvaluator(agent_executor)
report = evaluator.batch_evaluate(test_cases)
print(f"Success rate: {report['success_rate']*100:.1f}%")
print(f"Average response time: {report['avg_response_time']:.2f}s")

Conclusion

AI agents are evolving beyond simple chatbots into genuinely autonomous AI systems.

Framework selection guide:

| Use Case                  | Recommended Framework   |
|---------------------------|-------------------------|
| Rapid prototyping         | LangChain               |
| Complex workflows         | LangGraph               |
| Document Q&A agents       | LlamaIndex              |
| Multi-agent collaboration | CrewAI                  |
| Custom framework          | OpenAI Function Calling |

Agent Development Best Practices:

  1. Start small - begin with a simple ReAct agent
  2. Clear tool descriptions - make each tool's description explicit
  3. Design memory upfront - plan what information needs to be remembered
  4. Error handling - tool failures and loop prevention are essential
  5. Monitor everything - trace all executions with LangSmith
  6. Manage costs - monitor token usage closely
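
Loop prevention (point 4) can be as simple as a hard iteration cap around the agent loop. A framework-agnostic sketch (the helper and its callbacks are illustrative, not from any library):

```python
def run_with_iteration_cap(step, is_done, state=None, max_iterations=8):
    """Drive an agent loop but refuse to spin forever.

    step: advances the state by one action; is_done: checks the goal.
    Returns (final_state, reached_goal).
    """
    for _ in range(max_iterations):
        state = step(state)
        if is_done(state):
            return state, True
    # Budget exhausted: surface a partial answer instead of looping on
    return state, False

# Toy usage: count up until the "goal" value 5 is reached
final, ok = run_with_iteration_cap(
    step=lambda s: (s or 0) + 1,
    is_done=lambda s: s >= 5,
)
print(final, ok)
```

LangChain's `AgentExecutor(max_iterations=...)` applies the same idea out of the box; the point is that some hard cap must exist somewhere.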

References