The Complete Guide to AI Agents: Building Autonomous AI Systems with LangChain, LangGraph, and CrewAI
Table of Contents
- What Is an AI Agent?
- ReAct - Reasoning and Acting
- Tool Use
- The Complete LangChain Guide
- LangGraph - State-Machine Agents
- LlamaIndex Agents
- CrewAI - Multi-Agent Collaboration
- Agent Memory
- Code-Execution Agents
- Agent Evaluation and Monitoring
1. What Is an AI Agent?
1.1 Defining an Agent
An AI agent is an autonomous system that perceives its environment, then selects and executes actions to achieve a goal.
How agents differ from simple chatbots:
| Trait | Chatbot | AI Agent |
|---|---|---|
| Ability to act | Text generation only | Tool use, code execution, search, etc. |
| Planning | None | Multi-step plans |
| Memory | Within a conversation | Long-term memory possible |
| Autonomy | Low | High |
| Execution loop | Single response | Iterates until the goal is met |
1.2 The Four Core Components of an Agent
1. LLM (the brain)
Handles all judgment and reasoning. It answers questions like "What should I do next?" and "Does this result satisfy the goal?"
2. Tool use (the hands)
Interacts with the outside world: web search, calculators, code execution, database queries, API calls, and so on.
3. Memory
Consists of short-term memory (conversation history), long-term memory (a vector DB), and episodic memory (past experiences).
4. Planning
Decomposes a complex goal into smaller subtasks and decides their order.
1.3 The Agent Loop
User goal
↓
[Plan]
Decompose the goal into subtasks
↓
[Select action]
Decide the next action (which tool to use?)
↓
[Execute tool]
Run the selected tool
↓
[Observe]
Inspect the tool's result
↓
[Goal achieved?] → Yes → Generate the final answer
↓ No
Back to action selection
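The loop above can be sketched in a few lines of framework-free Python. This is a toy illustration, not a real agent: `fake_llm` is a hard-coded stand-in for the model's decision step, and the two tools are plain functions.

```python
# Minimal sketch of the agent loop: plan/act/observe until done or capped.
def search_tool(query: str) -> str:
    return f"results for '{query}'"

def calculator_tool(expression: str) -> str:
    return str(eval(expression))  # toy only; never eval untrusted input

TOOLS = {"search": search_tool, "calculator": calculator_tool}

def fake_llm(goal: str, observations: list) -> dict:
    """Stand-in for the LLM: picks the next action from what it has seen so far."""
    if not observations:
        return {"action": "search", "input": goal}
    if len(observations) == 1:
        return {"action": "calculator", "input": "78000 * 0.023"}
    return {"action": "finish", "input": f"Answer based on {len(observations)} observations"}

def run_agent(goal: str, max_iterations: int = 5) -> str:
    observations = []
    for _ in range(max_iterations):                    # loop until goal or cap
        decision = fake_llm(goal, observations)
        if decision["action"] == "finish":             # goal achieved -> final answer
            return decision["input"]
        tool = TOOLS[decision["action"]]               # select a tool
        observations.append(tool(decision["input"]))   # execute and observe
    return "Stopped: iteration limit reached"

print(run_agent("Samsung Electronics stock price"))
```

Real frameworks replace `fake_llm` with an actual model call and add structure around state, memory, and error handling, but the control flow is the same.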
1.4 Where Agents Are Applied
- Research agents: web search → gather information → summary report
- Code agents: analyze requirements → write code → run tests
- Data-analysis agents: load data → analyze → visualize
- Customer-service agents: understand the inquiry → query internal systems → respond
- DevOps agents: monitor → detect issues → auto-remediate
2. ReAct
ReAct (Reasoning + Acting) is an agent framework published in 2022. Its core is a loop of "think, then act."
2.1 The ReAct Framework
Traditional chain-of-thought (CoT) only thinks. An agent needs thought + action + observation.
Thought: To answer this question I need current stock data
Action: Search["Samsung Electronics stock price 2026"]
Observation: Samsung Electronics is trading at 78,000 KRW, up 2.3% from the previous close
Thought: I have the price. Now I need to compute the change in won from the previous close
Action: Calculator[78000 - 78000 / 1.023]
Observation: 1753.67
Thought: The stock rose about 1,754 KRW from the previous close. I can answer now
Final Answer: Samsung Electronics is trading at 78,000 KRW, up about 1,754 KRW (+2.3%) from the previous close.
2.2 Implementing a ReAct Prompt
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate

# ReAct prompt template
REACT_TEMPLATE = """You are a helpful AI agent.
You have access to the following tools:
{tools}
Respond in the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the result of the tool
... (this Thought/Action/Observation cycle can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original question
Begin!
Question: {input}
Thought:{agent_scratchpad}"""
react_prompt = PromptTemplate.from_template(REACT_TEMPLATE)

# Configure the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Define tools
search = DuckDuckGoSearchRun()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Search for up-to-date information. Useful for current events and recent data"
    ),
    Tool(
        name="Calculator",
        func=lambda x: eval(x),  # use a safe evaluator in production
        description="Math calculations. Input is a Python expression"
    )
]

# Create the ReAct agent
agent = create_react_agent(llm, tools, react_prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True
)

# Run it
result = agent_executor.invoke({
    "input": "What is the price of Bitcoin as of March 2026, and how has it changed since March of last year?"
})
print(result["output"])
2.3 Limitations of ReAct
- Hallucination: the model may invent tool actions that do not exist
- Infinite loops: without a termination condition, the cycle can repeat forever
- Long context: accumulated Thought/Action/Observation steps can overflow the context window
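Each of these failure modes has a mechanical guard. The sketch below (plain Python, no framework; all names are illustrative) shows three: rejecting hallucinated tool names against a whitelist, capping the number of iterations, and trimming old scratchpad entries to bound the context.

```python
# Illustrative guards against the three ReAct failure modes above.
VALID_ACTIONS = {"Search", "Calculator"}
MAX_STEPS = 10             # guards against infinite loops (cap the cycle count)
MAX_HISTORY_CHARS = 2000   # guards against context overflow

def validate_action(action: str) -> str:
    """Return an error observation for hallucinated tool names instead of executing them."""
    if action not in VALID_ACTIONS:
        return f"Observation: unknown tool '{action}'. Valid tools: {sorted(VALID_ACTIONS)}"
    return ""

def trim_history(history: list[str]) -> list[str]:
    """Drop the oldest Thought/Action/Observation steps once the scratchpad grows too large."""
    while sum(len(h) for h in history) > MAX_HISTORY_CHARS and len(history) > 1:
        history = history[1:]
    return history

trimmed = trim_history(["x" * 1500, "y" * 1500, "z" * 100])
print(len(trimmed))  # oldest step dropped, 2 remain
```

Frameworks expose the same ideas as `max_iterations`, `handle_parsing_errors`, and summary/window memory; the sketch just makes the mechanics explicit.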
3. Tool Use
Tools are the agent's means of interacting with the outside world.
3.1 OpenAI Function Calling
OpenAI's Function Calling lets the LLM invoke functions in a structured way.
from openai import OpenAI
import json
client = OpenAI()
# Function definitions
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "Name of the city to look up"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    },
    {
        "name": "search_database",
        "description": "Search the internal database",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query"
                },
                "table": {
                    "type": "string",
                    "enum": ["users", "products", "orders"],
                    "description": "Table to search"
                }
            },
            "required": ["query"]
        }
    }
]
# Concrete implementations
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Call a weather API (stubbed here)"""
    # In production this would call a real weather API
    return {
        "city": city,
        "temperature": 15,
        "unit": unit,
        "condition": "sunny",
        "humidity": 60
    }

def search_database(query: str, table: str = "products") -> list:
    """Search the DB (stubbed here)"""
    # In production this would run a real DB query
    return [{"id": 1, "name": "Sample product", "price": 10000}]

# Map tool names to implementations
available_tools = {
    "get_weather": get_weather,
    "search_database": search_database
}
def run_agent_with_tools(user_message: str) -> str:
    """Run a Function Calling agent loop"""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=[{"type": "function", "function": f} for f in functions],
            tool_choice="auto"
        )
        message = response.choices[0].message
        # No tool calls means we have the final answer
        if not message.tool_calls:
            return message.content
        # Handle the tool calls
        messages.append(message)
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            print(f"Tool call: {func_name}({func_args})")
            # Execute the function
            if func_name in available_tools:
                result = available_tools[func_name](**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}
            # Append the tool result to the conversation
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result, ensure_ascii=False)
            })

# Example run
answer = run_agent_with_tools("What's the current weather in Seoul, and do I need an umbrella?")
print(answer)
3.2 More Tool Examples
from langchain.tools import tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
import sqlite3

@tool
def calculator(expression: str) -> str:
    """Perform a math calculation. Input is a Python expression."""
    try:
        result = eval(expression)  # use a safe evaluator in production
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def run_python_code(code: str) -> str:
    """Execute Python code and return the result."""
    try:
        # Restricted environment (a real sandbox is strongly recommended)
        local_vars = {}
        exec(code, {"__builtins__": {}}, local_vars)
        output = local_vars.get('result', 'No result variable found')
        return str(output)
    except Exception as e:
        return f"Code execution error: {e}"

@tool
def query_database(sql: str) -> str:
    """Run a SQL query against a SQLite database."""
    try:
        conn = sqlite3.connect("agent_db.sqlite")
        cursor = conn.cursor()
        cursor.execute(sql)
        rows = cursor.fetchall()
        conn.close()
        return str(rows)
    except Exception as e:
        return f"DB error: {e}"

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email."""
    # In production, use SMTP or an email API
    print(f"Sending email to: {to}")
    print(f"Subject: {subject}")
    print(f"Body: {body[:100]}...")
    return f"Email successfully sent to {to}."

# Wikipedia tool
wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

@tool
def search_wikipedia(query: str) -> str:
    """Search Wikipedia for information."""
    return wikipedia.run(query)
4. LangChain
LangChain is the standard framework for building LLM applications.
4.1 Core LangChain Components
LangChain architecture
├── Models (LLM, Chat Models, Embeddings)
├── Prompts (PromptTemplate, ChatPromptTemplate)
├── Chains (LLMChain, SequentialChain, LCEL)
├── Memory (Buffer, Summary, VectorStore)
├── Agents (ReAct, OpenAI Functions)
├── Tools (Built-in + Custom)
└── Retrievers (VectorStore, MultiQuery)
4.2 LCEL (LangChain Expression Language)
Modern LangChain composes components into LCEL pipelines.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Basic chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert analyst."),
    ("human", "{question}")
])
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "What are the advantages of AI agents?"})
print(result)

# Structured output
from pydantic import BaseModel, Field

class AnalysisResult(BaseModel):
    summary: str = Field(description="Summary")
    key_points: list[str] = Field(description="List of key points")
    recommendation: str = Field(description="Recommendation")

structured_chain = prompt | llm.with_structured_output(AnalysisResult)
result = structured_chain.invoke({"question": "Compare LangChain and LlamaIndex"})
print(result.summary)
print(result.key_points)

# Parallel chains
parallel_chain = RunnableParallel({
    "pros": ChatPromptTemplate.from_template("What are the pros of {topic}?") | llm | StrOutputParser(),
    "cons": ChatPromptTemplate.from_template("What are the cons of {topic}?") | llm | StrOutputParser(),
})
result = parallel_chain.invoke({"topic": "AI agents"})
print("Pros:", result["pros"])
print("Cons:", result["cons"])
4.3 Memory Management
from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI

# Buffer memory (keeps the full conversation)
buffer_memory = ConversationBufferMemory(
    memory_key="history",
    return_messages=True
)

# Summary memory (summarizes long conversations)
summary_memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    memory_key="history",
    return_messages=True
)

# Conversation chain
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
conversation = ConversationChain(
    llm=llm,
    memory=buffer_memory,
    verbose=True
)

# Chat
r1 = conversation.predict(input="My name is Youngju Kim")
r2 = conversation.predict(input="What did I say my name was?")  # remembered
print(r1, r2)

# Vector-store-backed long-term memory
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Initialize the vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["dummy"], embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
vector_memory = VectorStoreRetrieverMemory(retriever=retriever)
vector_memory.save_context(
    {"input": "My favorite food is kimchi stew"},
    {"output": "Got it!"}
)

# Recall related memories
relevant = vector_memory.load_memory_variables({"prompt": "Recommend me a dish"})
print(relevant)
4.4 RAG (Retrieval-Augmented Generation)
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_core.prompts import ChatPromptTemplate

# Load documents
loader = WebBaseLoader("https://example.com/document")
documents = loader.load()

# Split the text
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = splitter.split_documents(documents)

# Build the vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# RAG chain
llm = ChatOpenAI(model="gpt-4o", temperature=0)
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context below.
If the answer is not in the context, say you don't know.
Context:
{context}
Question: {question}
Answer:""")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    chain_type_kwargs={"prompt": rag_prompt},
    return_source_documents=True
)
result = qa_chain.invoke({"query": "What are the main points?"})
print("Answer:", result["result"])
print("Sources:", [doc.metadata for doc in result["source_documents"]])
4.5 A Complete LangChain Agent
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferWindowMemory
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
import datetime

# Define tools
search = DuckDuckGoSearchRun()

@tool
def get_current_datetime() -> str:
    """Return the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

@tool
def calculate(expression: str) -> str:
    """Perform a math calculation. Examples: 2+2, 10*5, sqrt(16)"""
    import math
    safe_dict = {k: getattr(math, k) for k in dir(math) if not k.startswith('_')}
    safe_dict['abs'] = abs
    try:
        return str(eval(expression, {"__builtins__": {}}, safe_dict))
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def web_search(query: str) -> str:
    """Search the web for up-to-date information."""
    return search.run(query)

tools = [get_current_datetime, calculate, web_search]

# Agent prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful AI assistant.
Answer the user's questions accurately and politely.
Use tools to gather information when needed.
Always answer in Korean."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

# LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Memory
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    return_messages=True,
    k=10  # keep the 10 most recent exchanges
)

# Build the agent
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    max_iterations=5,
    handle_parsing_errors=True
)

# Chat helper
def chat(message: str) -> str:
    result = agent_executor.invoke({"input": message})
    return result["output"]

# Try it
print(chat("What's today's date?"))
print(chat("What is the recent trend in Samsung Electronics' stock price?"))
print(chat("What is the market cap of the company I just mentioned?"))  # uses memory
5. LangGraph
LangGraph implements agents as **state machines**, enabling complex loops, branching, and conditional execution.
5.1 Why LangGraph?
Limitations of classic LangChain agents:
- Mostly linear execution (loops are awkward)
- Inconvenient state management
- Complicated branching logic
- Human-in-the-loop is hard to wire in
What LangGraph offers instead:
- Graph-based execution flow
- Explicit state management
- Branching via conditional edges
- Interrupt points
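Before looking at the LangGraph API, it helps to see the model it is built on. The toy executor below (plain Python, not LangGraph; every name is illustrative) has the same three ingredients: named nodes that transform a state dict, plain edges, and conditional edges whose router function picks the next node from the state. Loops are just cycles in the graph.

```python
# A toy state-machine executor mirroring LangGraph's model.
END = "__end__"

class MiniGraph:
    def __init__(self):
        self.nodes, self.edges = {}, {}

    def add_node(self, name, fn):
        self.nodes[name] = fn                 # fn(state) -> new state

    def add_edge(self, src, dst):
        self.edges[src] = lambda state: dst   # unconditional transition

    def add_conditional_edge(self, src, router):
        self.edges[src] = router              # router(state) -> name of next node

    def run(self, state, entry, max_steps=20):
        current = entry
        for _ in range(max_steps):
            state = self.nodes[current](state)     # execute the node
            nxt = self.edges[current](state)       # decide where to go next
            if nxt == END:
                return state
            current = nxt
        raise RuntimeError("step limit exceeded")

# Example: a loop with a conditional exit -- count up to 3, then stop.
g = MiniGraph()
g.add_node("increment", lambda s: {**s, "n": s["n"] + 1})
g.add_conditional_edge("increment", lambda s: END if s["n"] >= 3 else "increment")
print(g.run({"n": 0}, entry="increment"))  # {'n': 3}
```

LangGraph adds typed state schemas, checkpointing, and interrupts on top of this idea, but `add_node`/`add_edge`/`add_conditional_edges` in the next section map onto the same structure.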
5.2 Basic LangGraph Structure
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict, Annotated, Sequence
import operator

# Define the state schema
class AgentState(TypedDict):
    messages: Annotated[Sequence, operator.add]  # messages accumulate
    next: str  # next node

# Model and tools
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [web_search, calculate, get_current_datetime]
llm_with_tools = llm.bind_tools(tools)

# Node functions
def agent_node(state: AgentState) -> AgentState:
    """Node where the LLM decides the next action"""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

def should_continue(state: AgentState) -> str:
    """Decide whether to continue or stop (conditional edge)"""
    messages = state["messages"]
    last_message = messages[-1]
    # Continue if there are tool calls, otherwise stop
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "tools"
    return END

# Build the graph
tool_node = ToolNode(tools)
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)

# Entry point
workflow.set_entry_point("agent")

# Add edges
workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "tools": "tools",
        END: END
    }
)
workflow.add_edge("tools", "agent")  # return to the agent after tool execution

# Compile the graph
app = workflow.compile()

# Run it
result = app.invoke({
    "messages": [HumanMessage(content="What's the weather in Seoul?")]
})
print(result["messages"][-1].content)
5.3 Human-in-the-Loop
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

# Checkpoint store (persists conversation sessions)
memory = MemorySaver()

class ApprovalState(TypedDict):
    messages: Annotated[Sequence, operator.add]
    pending_action: str
    approved: bool

def agent_node(state: ApprovalState) -> ApprovalState:
    """The agent proposes an action"""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    # Sensitive actions wait for approval
    if hasattr(response, 'tool_calls') and response.tool_calls:
        tool_name = response.tool_calls[0]['name']
        if tool_name in ["send_email", "delete_file", "make_payment"]:
            return {
                "messages": [response],
                "pending_action": tool_name,
                "approved": False
            }
    return {"messages": [response]}

def human_approval_node(state: ApprovalState) -> ApprovalState:
    """Human approval node (interrupt)"""
    # Execution pauses at this node and waits for human input
    print(f"\nAction awaiting approval: {state['pending_action']}")
    print("Type 'approve' to continue or 'reject' to cancel")
    # In production, notify via a web UI or Slack
    return state

def check_approval(state: ApprovalState) -> str:
    if state.get("approved"):
        return "execute"
    elif state.get("pending_action") and not state.get("approved"):
        return "human_approval"
    return END

workflow = StateGraph(ApprovalState)
workflow.add_node("agent", agent_node)
workflow.add_node("human_approval", human_approval_node)
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", check_approval, {
    "human_approval": "human_approval",
    "execute": "tools",
    END: END
})
workflow.add_edge("tools", "agent")

# Configure the interrupt point
app = workflow.compile(
    checkpointer=memory,
    interrupt_before=["human_approval"]  # pause before this node
)

# Run (execution pauses)
thread_id = "session_001"
config = {"configurable": {"thread_id": thread_id}}
result = app.invoke(
    {"messages": [HumanMessage(content="Send the team a meeting invite email")]},
    config=config
)

# Resume after human approval
app.update_state(config, {"approved": True})
final_result = app.invoke(None, config=config)  # None = resume from the saved state
5.4 Building a Research Agent
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict, List
import json

class ResearchState(TypedDict):
    topic: str
    search_queries: List[str]
    search_results: List[str]
    draft: str
    final_report: str
    iteration: int

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def plan_queries(state: ResearchState) -> ResearchState:
    """Plan the research queries"""
    topic = state["topic"]
    response = llm.invoke([
        HumanMessage(content=f"""Topic: {topic}
Generate 5 search queries that would fully cover this topic.
Return JSON: {{"queries": ["query1", "query2", ...]}}""")
    ])
    queries = json.loads(response.content)["queries"]
    return {"search_queries": queries}

def execute_searches(state: ResearchState) -> ResearchState:
    """Run the searches"""
    queries = state["search_queries"]
    results = []
    for query in queries:
        result = search.run(query)
        results.append(f"[{query}]\n{result}")
    return {"search_results": results}

def write_draft(state: ResearchState) -> ResearchState:
    """Write a first draft"""
    topic = state["topic"]
    results = "\n\n".join(state["search_results"])
    response = llm.invoke([
        HumanMessage(content=f"""Topic: {topic}
Collected information:
{results}
Write a detailed research report draft based on the information above.""")
    ])
    return {"draft": response.content, "iteration": state.get("iteration", 0) + 1}

def review_and_improve(state: ResearchState) -> ResearchState:
    """Review and improve the draft"""
    draft = state["draft"]
    response = llm.invoke([
        HumanMessage(content=f"""Review and improve the following research report draft:
{draft}
Improvements to make:
1. Check accuracy
2. Improve the logical flow
3. Add any important missing information
4. Strengthen the conclusion
Write the final report.""")
    ])
    return {"final_report": response.content}

def should_improve(state: ResearchState) -> str:
    """Decide whether another pass is needed"""
    if state.get("iteration", 0) < 2:
        return "improve"
    return "finalize"

# Research workflow graph
research_graph = StateGraph(ResearchState)
research_graph.add_node("plan_queries", plan_queries)
research_graph.add_node("execute_searches", execute_searches)
research_graph.add_node("write_draft", write_draft)
research_graph.add_node("review_and_improve", review_and_improve)
research_graph.set_entry_point("plan_queries")
research_graph.add_edge("plan_queries", "execute_searches")
research_graph.add_edge("execute_searches", "write_draft")
research_graph.add_conditional_edges(
    "write_draft",
    should_improve,
    {
        "improve": "execute_searches",  # more searching needed
        "finalize": "review_and_improve"
    }
)
research_graph.add_edge("review_and_improve", END)
research_app = research_graph.compile()

# Run it
result = research_app.invoke({
    "topic": "AI agent technology trends in 2026",
    "search_queries": [],
    "search_results": [],
    "draft": "",
    "final_report": "",
    "iteration": 0
})
print(result["final_report"])
6. LlamaIndex
LlamaIndex is a data-centric framework for AI agents.
6.1 A LlamaIndex Agent
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings

# Configure the LLM
Settings.llm = OpenAI(model="gpt-4o", temperature=0)

# FunctionTool definitions
def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

def add(a: float, b: float) -> float:
    """Add two numbers."""
    return a + b

multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)

# QueryEngineTool (document search tool)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)
query_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description="Searches internal company documents."
)

# ReAct agent
agent = ReActAgent.from_tools(
    [multiply_tool, add_tool, query_tool],
    llm=Settings.llm,
    verbose=True,
    max_iterations=10
)

# Run it
response = agent.chat("Find the AI policy in the internal documents, then add the 50,000 and 100,000 won violation fines")
print(response)
6.2 A Multi-Document RAG Agent
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter

# Load several document sets
docs_finance = SimpleDirectoryReader("./finance_docs").load_data()
docs_hr = SimpleDirectoryReader("./hr_docs").load_data()
docs_technical = SimpleDirectoryReader("./technical_docs").load_data()

# Build an index per document set
splitter = SentenceSplitter(chunk_size=512)
finance_index = VectorStoreIndex.from_documents(
    docs_finance, transformations=[splitter]
)
hr_index = VectorStoreIndex.from_documents(
    docs_hr, transformations=[splitter]
)
tech_index = VectorStoreIndex.from_documents(
    docs_technical, transformations=[splitter]
)

# Turn each index into a tool
tools = [
    QueryEngineTool.from_defaults(
        query_engine=finance_index.as_query_engine(),
        name="finance_qa",
        description="Answers questions about finance, accounting, and budgets"
    ),
    QueryEngineTool.from_defaults(
        query_engine=hr_index.as_query_engine(),
        name="hr_qa",
        description="Answers questions about HR, hiring, and benefits"
    ),
    QueryEngineTool.from_defaults(
        query_engine=tech_index.as_query_engine(),
        name="tech_qa",
        description="Answers questions about technical specs and development guides"
    ),
]

# Multi-document agent
agent = ReActAgent.from_tools(tools, verbose=True)
response = agent.chat("Tell me about the 2026 IT budget and the new hiring plan")
print(response)
7. CrewAI
CrewAI is a role-based framework for multi-agent collaboration.
7.1 Core CrewAI Concepts
Crew (the team)
├── Agents - each with its own role and goal
│   ├── Role: "Senior Researcher", "Content Writer"
│   ├── Goal: what the agent is trying to achieve
│   ├── Backstory: the agent's persona/expertise
│   └── Tools: the tools it may use
└── Tasks
    ├── Description: what needs to be done
    ├── Expected Output: the expected deliverable
    └── Agent: the agent responsible
7.2 A Research Crew
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0.7)

# Tools
search_tool = SerperDevTool()
web_tool = WebsiteSearchTool()

# Agent definitions
researcher = Agent(
    role="Senior Researcher",
    goal="Gather comprehensive, accurate information on the given topic",
    backstory="""You are a professional researcher with 10 years of experience.
You investigate complex topics systematically and extract the key insights.
You always collect up-to-date information from trustworthy sources.""",
    tools=[search_tool, web_tool],
    llm=llm,
    verbose=True
)

analyst = Agent(
    role="Data Analyst",
    goal="Analyze the collected information and identify patterns and trends",
    backstory="""You are a data analysis expert. You find meaningful patterns
in raw data and derive actionable insights, combining statistical methods
with business knowledge.""",
    tools=[search_tool],
    llm=llm,
    verbose=True
)

writer = Agent(
    role="Content Writer",
    goal="Turn the analysis into a clear, persuasive report",
    backstory="""You are a professional writer who explains technical content
so that general readers can understand it. You love turning complex
analysis into storytelling.""",
    llm=llm,
    verbose=True
)

# Task definitions
research_task = Task(
    description="""Research '{topic}' and cover:
1. The latest trends and developments
2. Major players and their approaches
3. Potential opportunities and risks
4. Relevant statistics and data
Cite at least 5 trustworthy sources.""",
    expected_output="Research summary (at least 500 characters)",
    agent=researcher
)

analysis_task = Task(
    description="""Analyze the researcher's findings:
1. Identify the 3 key trends
2. Perform a SWOT analysis
3. Give a 6-to-12-month outlook
4. List the main risk factors
Provide an objective, data-driven analysis.""",
    expected_output="Analysis report (at least 400 characters)",
    agent=analyst,
    context=[research_task]  # uses the research task's output
)

report_task = Task(
    description="""Combine the research and analysis into a professional report:
Report structure:
1. Executive Summary
2. Current state analysis
3. Key insights
4. Recommendations
5. Conclusion
Write professionally and persuasively.""",
    expected_output="Finished report (at least 800 characters)",
    agent=writer,
    context=[research_task, analysis_task]
)

# Assemble the crew (sequential execution)
research_crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, report_task],
    process=Process.sequential,  # run tasks in order
    verbose=True
)

# Run it
result = research_crew.kickoff(inputs={"topic": "AI agent market analysis for 2026"})
print(result)
7.3 A Software-Development Crew
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool

code_interpreter = CodeInterpreterTool()

# The development team
product_manager = Agent(
    role="Product Manager",
    goal="Define clear requirements and a development plan",
    backstory="A PM with 10 years of experience who connects technical requirements to business goals.",
    llm=llm,
    verbose=True
)

senior_developer = Agent(
    role="Senior Developer",
    goal="Write high-quality, scalable code",
    backstory="A full-stack developer with expertise in Python, FastAPI, and React.",
    tools=[code_interpreter],
    llm=llm,
    verbose=True
)

qa_engineer = Agent(
    role="QA Engineer",
    goal="Test the code thoroughly and guarantee quality",
    backstory="A software-testing expert who loves finding bugs.",
    tools=[code_interpreter],
    llm=llm,
    verbose=True
)

# Development tasks
requirements_task = Task(
    description="""Define the technical requirements for '{feature_request}':
1. User stories
2. A list of functional requirements
3. Non-functional requirements (performance, security)
4. API design (list of endpoints)""",
    expected_output="Requirements document",
    agent=product_manager
)

development_task = Task(
    description="""Write Python FastAPI code based on the requirements document:
1. A complete API implementation
2. Data models (Pydantic)
3. Error handling
4. Code comments""",
    expected_output="Finished Python code",
    agent=senior_developer,
    context=[requirements_task]
)

testing_task = Task(
    description="""Review and test the code that was written:
1. Code review (bugs, security vulnerabilities)
2. Write unit tests
3. Test edge cases
4. Suggest improvements""",
    expected_output="Test report and improved code",
    agent=qa_engineer,
    context=[development_task]
)

# The development crew
dev_crew = Crew(
    agents=[product_manager, senior_developer, qa_engineer],
    tasks=[requirements_task, development_task, testing_task],
    process=Process.sequential,
    verbose=True
)

result = dev_crew.kickoff(
    inputs={"feature_request": "User authentication API (JWT-based)"}
)
print(result)
7.4 Hierarchical CrewAI
# A hierarchical structure in which a manager delegates work
manager = Agent(
    role="Project Manager",
    goal="Coordinate the team and produce the best possible deliverable",
    backstory="An experienced PM who maximizes each team member's strengths.",
    llm=llm,
    verbose=True,
    allow_delegation=True  # may delegate to other agents
)

hierarchical_crew = Crew(
    agents=[researcher, analyst, writer],  # the manager must not appear in this list
    tasks=[report_task],  # only the final task is defined; the rest is delegated automatically
    process=Process.hierarchical,  # hierarchical execution
    manager_agent=manager,
    verbose=True
)
8. Agent Memory
8.1 Memory Architecture
from langchain.memory import (
    ConversationBufferMemory,
    ConversationSummaryBufferMemory,
    ConversationEntityMemory,
)
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
import datetime

llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()

# 1. Short-term memory - the last N messages
from langchain.memory import ConversationBufferWindowMemory
short_term = ConversationBufferWindowMemory(k=5, return_messages=True)

# 2. Summary memory - summarizes older turns
summary_memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=1000,
    return_messages=True
)

# 3. Entity memory - extracts key facts
entity_memory = ConversationEntityMemory(
    llm=llm,
    return_messages=True
)

# 4. Long-term memory - a vector DB
class LongTermMemory:
    def __init__(self):
        self.vectorstore = FAISS.from_texts(["init"], embeddings)
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 5})

    def save(self, text: str, metadata: dict = None):
        """Persist important information to long-term memory"""
        self.vectorstore.add_texts([text], metadatas=[metadata or {}])

    def recall(self, query: str) -> list:
        """Retrieve related memories"""
        docs = self.retriever.get_relevant_documents(query)
        return [doc.page_content for doc in docs]

# 5. Episodic memory - past experiences
class EpisodicMemory:
    """Stores the agent's past successes and failures"""
    def __init__(self):
        self.episodes = []

    def save_episode(self, task: str, actions: list, result: str, success: bool):
        episode = {
            "task": task,
            "actions": actions,
            "result": result,
            "success": success,
            "timestamp": datetime.datetime.now().isoformat()
        }
        self.episodes.append(episode)

    def find_similar_episodes(self, current_task: str) -> list:
        """Find similar past experiences"""
        # In production, search by vector similarity
        return [e for e in self.episodes if e["success"]]

# An integrated memory system
class AgentMemorySystem:
    def __init__(self):
        self.short_term = ConversationBufferWindowMemory(k=10)
        self.long_term = LongTermMemory()
        self.episodic = EpisodicMemory()
        self.entities = {}

    def save_message(self, role: str, content: str):
        self.short_term.save_context(
            {"input": content if role == "human" else ""},
            {"output": content if role == "ai" else ""}
        )

    def extract_entities(self, text: str) -> dict:
        """Extract key facts from text (names, dates, numbers, ...)"""
        # In production, use an NER model
        return {}

    def get_relevant_context(self, query: str) -> str:
        """Assemble the memory relevant to a query"""
        recent = self.short_term.load_memory_variables({})
        long_term = self.long_term.recall(query)
        past_episodes = self.episodic.find_similar_episodes(query)
        context = f"""Recent conversation: {recent.get('history', '')}
Related memories: {'; '.join(long_term[:3])}
Similar experiences: {past_episodes[:2] if past_episodes else 'none'}"""
        return context

memory_system = AgentMemorySystem()
9. Code-Execution Agents
9.1 A Python REPL Agent
from langchain_experimental.tools import PythonREPLTool
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

python_repl = PythonREPLTool()
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# A data-analysis agent
data_analysis_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a professional data analyst.
You are fluent with Python, pandas, matplotlib, and seaborn.
Given an analysis request, write and execute code to produce the result.
Always explain the findings alongside the code."""),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

agent = create_openai_tools_agent(
    llm,
    [python_repl],
    data_analysis_prompt
)
data_agent = AgentExecutor(agent=agent, tools=[python_repl], verbose=True)

# Example run
result = data_agent.invoke({
    "input": """Analyze this data:
sales = [100, 150, 120, 200, 180, 250, 220, 300, 280, 350, 320, 400]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
Analyze the monthly sales trend and compute the month-over-month growth rates""",
    "chat_history": []
})
print(result["output"])
9.2 Sandboxed Code Execution with Docker
import docker
import tempfile
import os

class DockerCodeExecutor:
    """Execute code safely inside a Docker container"""
    def __init__(self, image="python:3.11-slim", timeout=30):
        self.client = docker.from_env()
        self.image = image
        self.timeout = timeout  # enforce via detach=True + container.wait(timeout=...) if needed;
                                # containers.run() itself has no timeout parameter

    def execute(self, code: str, packages: list = None) -> dict:
        """
        Run Python code in a Docker container.
        Returns: {success: bool, output: str, error: str}
        """
        with tempfile.TemporaryDirectory() as tmpdir:
            # Write the code to a file
            code_file = os.path.join(tmpdir, "script.py")
            with open(code_file, "w") as f:
                f.write(code)
            # Package-install command
            install_cmd = ""
            if packages:
                pkgs = " ".join(packages)
                install_cmd = f"pip install {pkgs} -q && "
            try:
                container = self.client.containers.run(
                    self.image,
                    command=f'sh -c "{install_cmd}python /code/script.py"',
                    volumes={tmpdir: {"bind": "/code", "mode": "ro"}},
                    remove=True,
                    # pip needs the network; block it only when nothing is installed
                    # (pre-baking packages into the image is safer)
                    network_mode="none" if not packages else "bridge",
                    mem_limit="256m",  # memory cap
                    cpu_period=100000,
                    cpu_quota=50000,  # cap CPU at 50%
                    stdout=True,
                    stderr=True
                )
                return {
                    "success": True,
                    "output": container.decode("utf-8"),
                    "error": ""
                }
            except docker.errors.ContainerError as e:
                return {
                    "success": False,
                    "output": "",
                    "error": e.stderr.decode("utf-8") if e.stderr else str(e)
                }
            except Exception as e:
                return {"success": False, "output": "", "error": str(e)}

executor = DockerCodeExecutor()
code = """
import pandas as pd
import json
data = {'name': ['Alice', 'Bob', 'Charlie'], 'score': [85, 92, 78]}
df = pd.DataFrame(data)
result = df.describe().to_dict()
print(json.dumps(result, ensure_ascii=False, indent=2))
"""
result = executor.execute(code, packages=["pandas"])
print(result["output"])
10. Agent Evaluation and Monitoring
10.1 Tracing with LangSmith
import os
from langsmith import Client
from langsmith.run_helpers import traceable

# LangSmith configuration
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "ai-agent-evaluation"

@traceable(run_type="chain")
def run_agent_with_tracking(user_input: str):
    """An agent run traced by LangSmith"""
    result = agent_executor.invoke({"input": user_input})
    return result

# Query run data with the LangSmith client
client = Client()

# Fetch recent runs
runs = client.list_runs(
    project_name="ai-agent-evaluation",
    run_type="chain"
)
for run in list(runs)[:5]:
    print(f"Run ID: {run.id}")
    print(f"Status: {run.status}")
    print(f"Duration: {run.end_time - run.start_time if run.end_time else 'N/A'}")
    print(f"Tokens used: {run.total_tokens}")
    print("---")
10.2 Agent Performance Metrics
import time
import json
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AgentMetrics:
    """Metrics for a single agent run"""
    task: str
    success: bool
    total_time: float
    num_iterations: int
    tools_used: List[str]
    tokens_used: int
    error_message: Optional[str] = None
    final_answer: Optional[str] = None

class AgentEvaluator:
    """Agent performance evaluation system"""
    def __init__(self, agent_executor):
        self.agent = agent_executor
        self.metrics_history: List[AgentMetrics] = []

    def evaluate(self, task: str, expected_keywords: list = None) -> AgentMetrics:
        """Evaluate a single task"""
        start_time = time.time()
        try:
            result = self.agent.invoke({"input": task})
            total_time = time.time() - start_time
            answer = result.get("output", "")
            # Requires AgentExecutor(..., return_intermediate_steps=True)
            steps = result.get("intermediate_steps", [])
            tools_used = [action.tool for action, _ in steps]
            # Success check: does the answer contain any expected keyword?
            success = True
            if expected_keywords:
                success = any(kw.lower() in answer.lower() for kw in expected_keywords)
            metrics = AgentMetrics(
                task=task,
                success=success,
                total_time=total_time,
                num_iterations=len(steps),
                tools_used=tools_used,
                tokens_used=0,  # fill in from a token-usage callback if needed
                final_answer=answer
            )
        except Exception as e:
            metrics = AgentMetrics(
                task=task,
                success=False,
                total_time=time.time() - start_time,
                num_iterations=0,
                tools_used=[],
                tokens_used=0,
                error_message=str(e)
            )
        self.metrics_history.append(metrics)
        return metrics

    def batch_evaluate(self, test_cases: list) -> dict:
        """Evaluate a batch of test cases"""
        results = []
        for case in test_cases:
            task = case["task"]
            keywords = case.get("expected_keywords", [])
            metrics = self.evaluate(task, keywords)
            results.append(metrics)
        # Aggregate statistics
        successes = [r for r in results if r.success]
        success_rate = len(successes) / len(results) if results else 0
        avg_time = sum(r.total_time for r in results) / len(results) if results else 0
        return {
            "total_tasks": len(results),
            "success_rate": success_rate,
            "avg_response_time": avg_time,
            "failed_tasks": [r.task for r in results if not r.success],
            "detailed_results": results
        }

    def generate_report(self) -> str:
        """Generate a performance report"""
        if not self.metrics_history:
            return "No evaluation data"
        total = len(self.metrics_history)
        successes = sum(1 for m in self.metrics_history if m.success)
        avg_time = sum(m.total_time for m in self.metrics_history) / total
        report = f"""
=== Agent Performance Report ===
Total tasks: {total}
Success rate: {successes/total*100:.1f}%
Average response time: {avg_time:.2f}s
Failed tasks:
"""
        for m in self.metrics_history:
            if not m.success:
                report += f"  - {m.task}: {m.error_message or 'quality below threshold'}\n"
        return report

# Test cases
test_cases = [
    {
        "task": "What is the current date and time?",
        "expected_keywords": ["2026", ":"]
    },
    {
        "task": "What is the square root of 100?",
        "expected_keywords": ["10"]
    },
    {
        "task": "Explain the main components of an AI agent",
        "expected_keywords": ["LLM", "tool", "memory"]
    }
]

evaluator = AgentEvaluator(agent_executor)
report = evaluator.batch_evaluate(test_cases)
print(f"Success rate: {report['success_rate']*100:.1f}%")
print(f"Average response time: {report['avg_response_time']:.2f}s")
Wrapping Up
AI agents are evolving beyond simple chatbots into truly autonomous AI systems.
Framework selection guide:
| Use case | Recommended framework |
|---|---|
| Rapid prototyping | LangChain |
| Complex workflows | LangGraph |
| Document Q&A agents | LlamaIndex |
| Multi-agent collaboration | CrewAI |
| Rolling your own framework | OpenAI Function Calling |
Agent development best practices:
- Start small - begin with a simple ReAct agent
- Define tools clearly - write a precise description for every tool
- Design memory up front - decide in advance what should be remembered
- Handle errors - tool failures and loop prevention are essential
- Monitor everything - trace every run with LangSmith
- Manage cost - monitor token usage
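The error-handling and loop-prevention practices above can be made concrete with a small wrapper that retries failed tool calls and aborts when the agent keeps repeating the same action. This helper is illustrative, not part of any framework:

```python
def guarded_tool_call(tool, tool_input, history, max_retries=2, max_repeats=3):
    """Call a tool with retry on failure and detection of repeated identical calls."""
    # Loop prevention: abort if the same (tool, input) pair keeps recurring
    key = (getattr(tool, "__name__", str(tool)), str(tool_input))
    history.append(key)
    if history.count(key) > max_repeats:
        raise RuntimeError(f"Loop detected: {key} called more than {max_repeats} times")
    # Error handling: retry transient tool failures
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return tool(tool_input)
        except Exception as e:
            last_error = e
    return f"Tool failed after {max_retries + 1} attempts: {last_error}"

# A flaky tool that fails once, then succeeds
calls = []
flaky = iter([Exception("boom"), "ok"])
def flaky_tool(x):
    item = next(flaky)
    if isinstance(item, Exception):
        raise item
    return item

print(guarded_tool_call(flaky_tool, "query", calls))  # ok
```

Returning an error string (rather than raising) on exhausted retries lets the LLM see the failure as an observation and choose a different approach.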
References
- ReAct paper - Yao et al., 2022
- LangChain official documentation
- LangGraph official documentation
- CrewAI official documentation
- LlamaIndex official documentation
AI Agents Complete Guide: Building Autonomous AI Systems with LangChain, LangGraph, and CrewAI
Table of Contents
- What Are AI Agents?
- ReAct - Reasoning and Acting
- Tool Use
- LangChain Complete Guide
- LangGraph - State Machine Agents
- LlamaIndex Agents
- CrewAI - Multi-Agent Collaboration
- Agent Memory
- Code Execution Agents
- Agent Evaluation and Monitoring
1. What Are AI Agents?
1.1 Defining an Agent
An AI agent is an autonomous system that perceives its environment, selects actions, and executes them to achieve a goal.
Comparison with simple chatbots:
| Property | Chatbot | AI Agent |
|---|---|---|
| Action capability | Text generation only | Tool use, code execution, search, etc. |
| Planning | None | Multi-step planning |
| Memory | Within conversation | Long-term memory possible |
| Autonomy | Low | High |
| Execution | Single response | Iterates until goal achieved |
1.2 The Four Core Components of an Agent
1. LLM (The Brain)
Handles all reasoning and judgment. Answers questions like "What should I do next?" and "Does this result satisfy the goal?"
2. Tool Use (The Hands)
Interacts with the external world. Web search, calculator, code execution, database queries, API calls — all fall here.
3. Memory (Recollection)
Short-term memory (conversation history), long-term memory (vector DB), and episodic memory (past experiences).
4. Planning (Strategy)
Decomposes complex goals into smaller sub-tasks and determines execution order.
1.3 The Agent Execution Loop
User goal input
↓
[Plan]
Decompose goal into sub-tasks
↓
[Select Action]
Decide next action (which tool to use?)
↓
[Execute Tool]
Run the selected tool
↓
[Observe Result]
Review tool output
↓
[Goal achieved?] → Yes → Generate final answer
↓ No
Back to Select Action
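The loop above can be sketched in a few lines of Python. This is a minimal, framework-free skeleton with a stubbed model and tools; all names here are illustrative, not from any library:

```python
from typing import Callable, Dict

def run_agent_loop(goal: str,
                   choose_action: Callable[[str, list], dict],
                   tools: Dict[str, Callable[[str], str]],
                   max_steps: int = 5) -> str:
    """Minimal agent loop: select action -> execute tool -> observe -> repeat."""
    history = []  # accumulated (decision, observation) pairs
    for _ in range(max_steps):
        decision = choose_action(goal, history)   # the "LLM brain"
        if decision["type"] == "final":           # goal achieved
            return decision["answer"]
        tool = tools[decision["tool"]]            # pick the tool
        observation = tool(decision["input"])     # execute and observe
        history.append((decision, observation))
    return "Stopped: max steps reached"

# Stubbed policy: look up the answer once, then finish
def fake_policy(goal, history):
    if not history:
        return {"type": "action", "tool": "search", "input": goal}
    return {"type": "final", "answer": f"Answer based on: {history[-1][1]}"}

result = run_agent_loop("capital of France", fake_policy,
                        {"search": lambda q: "Paris"})
print(result)  # Answer based on: Paris
```

Real frameworks replace `choose_action` with an LLM call, but the control flow is exactly this loop, including the `max_steps` guard against running forever.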
1.4 Applications of AI Agents
- Research agents: Web search → information gathering → summary report
- Code agents: Requirements analysis → code writing → test execution
- Data analysis agents: Load data → analyze → visualize
- Customer service agents: Identify query → system lookup → respond
- DevOps agents: Monitor → detect issue → auto-remediate
2. ReAct
ReAct (Reasoning + Acting) is an agent framework published in 2022. The core is a "think and act" loop.
2.1 The ReAct Framework
Traditional Chain-of-Thought (CoT) only thinks. An agent needs think + act + observe.
Thought: I need up-to-date stock price data to answer this question
Action: Search["Samsung Electronics stock price 2026"]
Observation: Samsung Electronics current price: 78,000 KRW, +2.3% from previous day
Thought: I have the price data. Now I should calculate the change amount
Action: Calculator[78000 * 0.023]
Observation: 1794
Thought: The stock rose by 1,794 KRW from yesterday. I can now answer
Final Answer: Samsung Electronics is currently at 78,000 KRW, up 1,794 KRW (+2.3%) from the prior day.
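If you implement ReAct by hand, the model's text output must be parsed back into structured steps. A minimal parser for the `Action: Tool[input]` / `Final Answer:` format shown above might look like this (a sketch; frameworks such as LangChain ship more robust output parsers):

```python
import re

def parse_react_step(text: str) -> dict:
    """Extract the Action or Final Answer from one ReAct-formatted step."""
    # A final answer takes precedence over further actions
    final = re.search(r"Final Answer:\s*(.+)", text, re.DOTALL)
    if final:
        return {"type": "final", "answer": final.group(1).strip()}
    action = re.search(r"Action:\s*(\w+)\[(.*?)\]", text)
    if action:
        return {"type": "action", "tool": action.group(1),
                "input": action.group(2)}
    raise ValueError("Could not parse agent output")

step = parse_react_step('Thought: need price data\nAction: Search["BTC price"]')
print(step)  # {'type': 'action', 'tool': 'Search', 'input': '"BTC price"'}
```

Raising on unparseable output is deliberate: the caller can catch the error and re-prompt the model, which is what options like `handle_parsing_errors=True` automate.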
2.2 Implementing a ReAct Agent
from langchain import hub
from langchain.agents import create_react_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.prompts import PromptTemplate
REACT_TEMPLATE = """You are a helpful AI assistant.
You have access to the following tools:
{tools}
Use the following format:
Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question
Begin!
Question: {input}
Thought:{agent_scratchpad}"""
react_prompt = PromptTemplate.from_template(REACT_TEMPLATE)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
search = DuckDuckGoSearchRun()
tools = [
    Tool(
        name="Search",
        func=search.run,
        description="Useful for searching for current events and up-to-date data"
    ),
    Tool(
        name="Calculator",
        # Restricted eval (no builtins); use a proper math parser in production
        func=lambda x: str(eval(x, {"__builtins__": {}}, {})),
        description="Useful for math calculations. Input is a Python expression"
    )
]
agent = create_react_agent(llm, tools, react_prompt)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
max_iterations=10,
handle_parsing_errors=True
)
result = agent_executor.invoke({
"input": "What is Bitcoin's current price in March 2026, and how does it compare to a year ago?"
})
print(result["output"])
2.3 Limitations of ReAct
- Hallucination: May generate tool actions that do not exist
- Infinite loops: Can repeat without a termination condition
- Long context: Accumulated Thought/Action/Observation steps can exceed the context window
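The long-context problem can be mitigated by trimming older steps from the accumulated scratchpad while always keeping the most recent ones. A minimal sketch (the step strings and limits are illustrative):

```python
def trim_scratchpad(steps: list, max_chars: int = 2000, keep_last: int = 3) -> list:
    """Drop the oldest Thought/Action/Observation steps once the scratchpad
    exceeds max_chars, but always keep the most recent keep_last steps."""
    trimmed = list(steps)
    while len(trimmed) > keep_last and sum(len(s) for s in trimmed) > max_chars:
        trimmed.pop(0)  # drop the oldest step first
    return trimmed

steps = [f"Thought/Action/Observation #{i}: " + "x" * 500 for i in range(10)]
kept = trim_scratchpad(steps)
print(len(kept))  # 3
```

A common refinement is to summarize the dropped steps into one short "earlier progress" line instead of discarding them outright, trading a little fidelity for a much smaller prompt.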
3. Tool Use
Tools are how agents interact with the external world.
3.1 OpenAI Function Calling
OpenAI's Function Calling allows the LLM to invoke functions in a structured way.
from openai import OpenAI
import json
client = OpenAI()
functions = [
{
"name": "get_weather",
"description": "Get current weather for a specific city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "City name to get weather for"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["city"]
}
},
{
"name": "search_database",
"description": "Search internal database for information",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
},
"table": {
"type": "string",
"enum": ["users", "products", "orders"],
"description": "Table to search"
}
},
"required": ["query"]
}
}
]
def get_weather(city: str, unit: str = "celsius") -> dict:
    # Demo stub: a real implementation would call a weather API
    return {
        "city": city,
        "temperature": 15,
        "unit": unit,
        "condition": "sunny",
        "humidity": 60
    }

def search_database(query: str, table: str = "products") -> list:
    # Demo stub: a real implementation would query an actual database
    return [{"id": 1, "name": "Sample Product", "price": 100}]
available_tools = {
"get_weather": get_weather,
"search_database": search_database
}
def run_agent_with_tools(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=[{"type": "function", "function": f} for f in functions],
            tool_choice="auto"
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        messages.append(message)
        for tool_call in message.tool_calls:
            func_name = tool_call.function.name
            func_args = json.loads(tool_call.function.arguments)
            print(f"  Tool call: {func_name}({func_args})")
            if func_name in available_tools:
                result = available_tools[func_name](**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })
answer = run_agent_with_tools("What's the weather in London? Do I need an umbrella?")
print(answer)
3.2 Diverse Tool Examples
from langchain.tools import tool
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
import subprocess
import sqlite3
@tool
def calculator(expression: str) -> str:
    """Perform mathematical calculations. Input a Python expression."""
    try:
        # Restricted eval: no builtins are exposed to the expression
        result = eval(expression, {"__builtins__": {}}, {})
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def run_python_code(code: str) -> str:
    """Execute Python code and return the result."""
    try:
        # exec with empty builtins; the code is expected to set a 'result' variable
        local_vars = {}
        exec(code, {"__builtins__": {}}, local_vars)
        output = local_vars.get('result', 'No result variable found')
        return str(output)
    except Exception as e:
        return f"Code execution error: {e}"
@tool
def query_database(sql: str) -> str:
    """Execute a SQL query against the SQLite database."""
    try:
        conn = sqlite3.connect("agent_db.sqlite")
        cursor = conn.cursor()
        cursor.execute(sql)
        rows = cursor.fetchall()
        conn.close()
        return str(rows)
    except Exception as e:
        return f"DB error: {e}"

@tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email."""
    print(f"Sending email to: {to}")
    print(f"Subject: {subject}")
    print(f"Body: {body[:100]}...")
    return f"Email successfully sent to {to}."

wikipedia = WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())

@tool
def search_wikipedia(query: str) -> str:
    """Search for information on Wikipedia."""
    return wikipedia.run(query)
4. LangChain
LangChain is the standard framework for building LLM applications.
4.1 LangChain Core Components
LangChain Architecture
├── Models (LLM, Chat Models, Embeddings)
├── Prompts (PromptTemplate, ChatPromptTemplate)
├── Chains (LLMChain, SequentialChain, LCEL)
├── Memory (Buffer, Summary, VectorStore)
├── Agents (ReAct, OpenAI Functions)
├── Tools (Built-in + Custom)
└── Retrievers (VectorStore, MultiQuery)
4.2 LCEL (LangChain Expression Language)
Modern LangChain uses LCEL pipelines.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableParallel
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# Basic chain
prompt = ChatPromptTemplate.from_messages([
("system", "You are an expert analyst."),
("human", "{question}")
])
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"question": "What are the advantages of AI agents?"})
print(result)
# Structured output
from pydantic import BaseModel, Field
class AnalysisResult(BaseModel):
    summary: str = Field(description="Summary")
    key_points: list[str] = Field(description="List of key points")
    recommendation: str = Field(description="Recommendation")
structured_chain = prompt | llm.with_structured_output(AnalysisResult)
result = structured_chain.invoke({"question": "Compare LangChain vs LlamaIndex"})
print(result.summary)
print(result.key_points)
# Parallel chain
parallel_chain = RunnableParallel({
"pros": ChatPromptTemplate.from_template("What are the advantages of {topic}?") | llm | StrOutputParser(),
"cons": ChatPromptTemplate.from_template("What are the disadvantages of {topic}?") | llm | StrOutputParser(),
})
result = parallel_chain.invoke({"topic": "AI agents"})
print("Pros:", result["pros"])
print("Cons:", result["cons"])
4.3 Memory Management
from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory
from langchain.chains import ConversationChain
from langchain_openai import ChatOpenAI
# Buffer memory (keeps all messages)
buffer_memory = ConversationBufferMemory(
memory_key="history",
return_messages=True
)
# Summary memory (summarizes old conversations)
summary_memory = ConversationSummaryMemory(
llm=ChatOpenAI(model="gpt-4o-mini"),
memory_key="history",
return_messages=True
)
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
conversation = ConversationChain(
llm=llm,
memory=buffer_memory,
verbose=True
)
r1 = conversation.predict(input="My name is Alice")
r2 = conversation.predict(input="What's my name?") # Remembers!
print(r1, r2)
# Vector store based long-term memory
from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["dummy"], embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
vector_memory = VectorStoreRetrieverMemory(retriever=retriever)
vector_memory.save_context(
{"input": "My favorite food is sushi"},
{"output": "Got it!"}
)
relevant = vector_memory.load_memory_variables({"prompt": "Recommend a food"})
print(relevant)
4.4 RAG (Retrieval Augmented Generation)
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain_core.prompts import ChatPromptTemplate
loader = WebBaseLoader("https://example.com/document")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)
llm = ChatOpenAI(model="gpt-4o", temperature=0)
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based only on the following context.
If the answer is not in the context, say you don't know.
Context:
{context}
Question: {question}
Answer:""")
qa_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
chain_type_kwargs={"prompt": rag_prompt},
return_source_documents=True
)
result = qa_chain.invoke({"query": "What are the main points?"})
print("Answer:", result["result"])
print("Sources:", [doc.metadata for doc in result["source_documents"]])
4.5 Complete LangChain Agent
from langchain_openai import ChatOpenAI
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain.memory import ConversationBufferWindowMemory
from langchain.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
import datetime
search = DuckDuckGoSearchRun()
@tool
def get_current_datetime() -> str:
    """Returns the current date and time."""
    return datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

@tool
def calculate(expression: str) -> str:
    """Perform math calculations. Example: 2+2, 10*5, sqrt(16)"""
    import math
    safe_dict = {k: getattr(math, k) for k in dir(math) if not k.startswith('_')}
    safe_dict['abs'] = abs
    try:
        return str(eval(expression, {"__builtins__": {}}, safe_dict))
    except Exception as e:
        return f"Calculation error: {e}"

@tool
def web_search(query: str) -> str:
    """Search the web for current information."""
    return search.run(query)
tools = [get_current_datetime, calculate, web_search]
prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful AI assistant.
Answer the user's questions accurately and helpfully.
Use tools when necessary to gather information."""),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
llm = ChatOpenAI(model="gpt-4o", temperature=0)
memory = ConversationBufferWindowMemory(
memory_key="chat_history",
return_messages=True,
k=10
)
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(
agent=agent,
tools=tools,
memory=memory,
verbose=True,
max_iterations=5,
handle_parsing_errors=True
)
def chat(message: str) -> str:
    result = agent_executor.invoke({"input": message})
    return result["output"]
print(chat("What's today's date?"))
print(chat("Tell me about recent trends in AI agents"))
print(chat("What's the market cap of the company you just mentioned?")) # Uses memory
5. LangGraph
LangGraph implements agents as State Machines. Complex loops, branches, and conditional execution become possible.
5.1 Why LangGraph?
Limitations of standard LangChain agents:
- Only linear execution (loops are difficult)
- Inconvenient state management
- Complex branching
- Human-in-the-loop is hard
LangGraph's solutions:
- Graph-based execution flow
- Explicit state management
- Conditional edges for branching
- Interrupt points
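The key idea behind LangGraph's explicit state management is that each node returns a partial state update, and annotated reducers decide how that update merges into the existing state. The mechanism can be illustrated without LangGraph itself; this is a simplified simulation of the idea, not the library's actual internals:

```python
import operator
from typing import Annotated, get_type_hints

class State(dict):
    """Toy state whose 'messages' field accumulates via operator.add."""
    __annotations__ = {"messages": Annotated[list, operator.add], "next": str}

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update using each field's reducer, if any."""
    merged = dict(state)
    hints = get_type_hints(State, include_extras=True)
    for key, value in update.items():
        metadata = getattr(hints.get(key), "__metadata__", ())
        if metadata and key in merged:
            merged[key] = metadata[0](merged[key], value)  # e.g. list concat
        else:
            merged[key] = value  # plain overwrite
    return merged

state = {"messages": ["hi"], "next": "agent"}
state = apply_update(state, {"messages": ["tool result"]})
print(state["messages"])  # ['hi', 'tool result']
```

This is why LangGraph nodes can return only the fields they changed: `messages` grows by concatenation while unannotated fields like `next` are simply overwritten.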
5.2 LangGraph Basics
from langgraph.graph import StateGraph, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict, Annotated, Sequence
import operator
# Define state schema
class AgentState(TypedDict):
    messages: Annotated[Sequence, operator.add]
    next: str
llm = ChatOpenAI(model="gpt-4o", temperature=0)
tools = [web_search, calculate, get_current_datetime]
llm_with_tools = llm.bind_tools(tools)
def agent_node(state: AgentState) -> AgentState:
    """LLM decides the next action"""
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}

def should_continue(state: AgentState) -> str:
    """Decide whether to continue or end (conditional edge)"""
    messages = state["messages"]
    last_message = messages[-1]
    if hasattr(last_message, 'tool_calls') and last_message.tool_calls:
        return "tools"
    return END
tool_node = ToolNode(tools)
workflow = StateGraph(AgentState)
workflow.add_node("agent", agent_node)
workflow.add_node("tools", tool_node)
workflow.set_entry_point("agent")
workflow.add_conditional_edges(
"agent",
should_continue,
{
"tools": "tools",
END: END
}
)
workflow.add_edge("tools", "agent")
app = workflow.compile()
result = app.invoke({
"messages": [HumanMessage(content="What's the weather in London?")]
})
print(result["messages"][-1].content)
5.3 Human-in-the-Loop
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver
memory = MemorySaver()
class ApprovalState(TypedDict):
    messages: Annotated[Sequence, operator.add]
    pending_action: str
    approved: bool

def agent_node(state: ApprovalState) -> ApprovalState:
    messages = state["messages"]
    response = llm_with_tools.invoke(messages)
    if hasattr(response, 'tool_calls') and response.tool_calls:
        tool_name = response.tool_calls[0]['name']
        if tool_name in ["send_email", "delete_file", "make_payment"]:
            return {
                "messages": [response],
                "pending_action": tool_name,
                "approved": False
            }
    return {"messages": [response]}

def human_approval_node(state: ApprovalState) -> ApprovalState:
    """Human approval node (interrupt)"""
    print(f"\nAction requiring approval: {state['pending_action']}")
    print("Type 'approve' to continue, 'reject' to cancel")
    return state

def check_approval(state: ApprovalState) -> str:
    if state.get("approved"):
        return "execute"
    elif state.get("pending_action") and not state.get("approved"):
        return "human_approval"
    return END
workflow = StateGraph(ApprovalState)
workflow.add_node("agent", agent_node)
workflow.add_node("human_approval", human_approval_node)
workflow.add_node("tools", ToolNode(tools))
workflow.set_entry_point("agent")
workflow.add_conditional_edges("agent", check_approval, {
"human_approval": "human_approval",
"execute": "tools",
END: END
})
workflow.add_edge("tools", "agent")
app = workflow.compile(
checkpointer=memory,
interrupt_before=["human_approval"]
)
thread_id = "session_001"
config = {"configurable": {"thread_id": thread_id}}
result = app.invoke(
{"messages": [HumanMessage(content="Send a meeting invitation to the team")]},
config=config
)
# After human approval, resume
app.update_state(config, {"approved": True})
final_result = app.invoke(None, config=config)
5.4 Research Agent with LangGraph
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from typing import TypedDict, List
import json
class ResearchState(TypedDict):
    topic: str
    search_queries: List[str]
    search_results: List[str]
    draft: str
    final_report: str
    iteration: int
llm = ChatOpenAI(model="gpt-4o", temperature=0)
def plan_queries(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    response = llm.invoke([
        HumanMessage(content=f"""Topic: {topic}
Generate 5 search queries to thoroughly research this topic.
Return as JSON: {{"queries": ["query1", "query2", ...]}}""")
    ])
    queries = json.loads(response.content)["queries"]
    return {"search_queries": queries}

def execute_searches(state: ResearchState) -> ResearchState:
    queries = state["search_queries"]
    results = []
    for query in queries:
        result = search.run(query)
        results.append(f"[{query}]\n{result}")
    return {"search_results": results}

def write_draft(state: ResearchState) -> ResearchState:
    topic = state["topic"]
    results = "\n\n".join(state["search_results"])
    response = llm.invoke([
        HumanMessage(content=f"""Topic: {topic}
Collected information:
{results}
Write a detailed research report draft based on the above.""")
    ])
    return {"draft": response.content, "iteration": state.get("iteration", 0) + 1}

def review_and_improve(state: ResearchState) -> ResearchState:
    draft = state["draft"]
    response = llm.invoke([
        HumanMessage(content=f"""Review and improve the following research report draft:
{draft}
Improvements:
1. Verify accuracy
2. Improve logical flow
3. Add important information
4. Strengthen conclusions
Write the final report.""")
    ])
    return {"final_report": response.content}

def should_improve(state: ResearchState) -> str:
    if state.get("iteration", 0) < 2:
        return "improve"
    return "finalize"
research_graph = StateGraph(ResearchState)
research_graph.add_node("plan_queries", plan_queries)
research_graph.add_node("execute_searches", execute_searches)
research_graph.add_node("write_draft", write_draft)
research_graph.add_node("review_and_improve", review_and_improve)
research_graph.set_entry_point("plan_queries")
research_graph.add_edge("plan_queries", "execute_searches")
research_graph.add_edge("execute_searches", "write_draft")
research_graph.add_conditional_edges(
"write_draft",
should_improve,
{
"improve": "execute_searches",
"finalize": "review_and_improve"
}
)
research_graph.add_edge("review_and_improve", END)
research_app = research_graph.compile()
result = research_app.invoke({
"topic": "AI Agent Technology Trends in 2026",
"search_queries": [],
"search_results": [],
"draft": "",
"final_report": "",
"iteration": 0
})
print(result["final_report"])
6. LlamaIndex
LlamaIndex is a data-centric AI agent framework.
6.1 LlamaIndex Agents
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms.openai import OpenAI
from llama_index.core.settings import Settings
Settings.llm = OpenAI(model="gpt-4o", temperature=0)
def multiply(a: float, b: float) -> float:
    """Multiplies two numbers."""
    return a * b

def add(a: float, b: float) -> float:
    """Adds two numbers."""
    return a + b
multiply_tool = FunctionTool.from_defaults(fn=multiply)
add_tool = FunctionTool.from_defaults(fn=add)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)
query_tool = QueryEngineTool.from_defaults(
query_engine=query_engine,
name="knowledge_base",
description="Search internal company documents for information."
)
agent = ReActAgent.from_tools(
[multiply_tool, add_tool, query_tool],
llm=Settings.llm,
verbose=True,
max_iterations=10
)
response = agent.chat(
"Find the AI policy in internal documents, and add the penalty amounts of $500 and $1000"
)
print(response)
6.2 Multi-Document RAG Agent
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
docs_finance = SimpleDirectoryReader("./finance_docs").load_data()
docs_hr = SimpleDirectoryReader("./hr_docs").load_data()
docs_technical = SimpleDirectoryReader("./technical_docs").load_data()
splitter = SentenceSplitter(chunk_size=512)
finance_index = VectorStoreIndex.from_documents(docs_finance, transformations=[splitter])
hr_index = VectorStoreIndex.from_documents(docs_hr, transformations=[splitter])
tech_index = VectorStoreIndex.from_documents(docs_technical, transformations=[splitter])
tools = [
QueryEngineTool.from_defaults(
query_engine=finance_index.as_query_engine(),
name="finance_qa",
description="Answer questions about finance, accounting, and budgets"
),
QueryEngineTool.from_defaults(
query_engine=hr_index.as_query_engine(),
name="hr_qa",
description="Answer questions about HR, hiring, and benefits"
),
QueryEngineTool.from_defaults(
query_engine=tech_index.as_query_engine(),
name="tech_qa",
description="Answer questions about technical specifications and development guides"
),
]
agent = ReActAgent.from_tools(tools, verbose=True)
response = agent.chat("Tell me about the 2026 IT budget and new hiring plans")
print(response)
7. CrewAI
CrewAI is a role-based multi-agent collaboration framework.
7.1 CrewAI Core Concepts
Crew
├── Agents - each with role and goal
│ ├── Role: "Senior Researcher", "Content Writer"
│ ├── Goal: what the agent aims to achieve
│ ├── Backstory: personality/expertise
│ └── Tools: available tools
└── Tasks
├── Description: what needs to be done
├── Expected Output: expected result
└── Agent: assigned agent
7.2 Research Team Agent
from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
search_tool = SerperDevTool()
web_tool = WebsiteSearchTool()
researcher = Agent(
role="Senior Researcher",
goal="Collect comprehensive and accurate information on the given topic",
backstory="""You are a professional researcher with 10 years of experience.
You are an expert at systematically investigating complex topics
and extracting key insights from reliable, up-to-date sources.""",
tools=[search_tool, web_tool],
llm=llm,
verbose=True
)
analyst = Agent(
role="Data Analyst",
goal="Analyze collected information and identify patterns and trends",
backstory="""You are a data analysis expert. You discover meaningful patterns
in raw data and derive actionable insights by combining statistical methods
with business knowledge.""",
tools=[search_tool],
llm=llm,
verbose=True
)
writer = Agent(
role="Content Writer",
goal="Write clear, compelling reports from analysis results",
backstory="""You are a professional writer who can explain technical content
to general audiences. You love conveying complex analytical results through
storytelling.""",
llm=llm,
verbose=True
)
research_task = Task(
description="""Research '{topic}' covering:
1. Latest trends and developments
2. Key players and their approaches
3. Potential opportunities and risks
4. Relevant statistics and data
Cite at least 5 reliable sources.""",
expected_output="Research summary (minimum 500 words)",
agent=researcher
)
analysis_task = Task(
description="""Analyze the information gathered by the researcher:
1. Identify 3 key trends
2. Perform SWOT analysis
3. Short-term forecast (6-12 months)
4. Key risk factors
Provide objective, data-driven analysis.""",
expected_output="Analysis report (minimum 400 words)",
agent=analyst,
context=[research_task]
)
report_task = Task(
description="""Synthesize research and analysis into a professional report:
Report structure:
1. Executive Summary
2. Current State Analysis
3. Key Insights
4. Recommendations
5. Conclusion
Write professionally and persuasively.""",
expected_output="Completed report (minimum 800 words)",
agent=writer,
context=[research_task, analysis_task]
)
research_crew = Crew(
agents=[researcher, analyst, writer],
tasks=[research_task, analysis_task, report_task],
process=Process.sequential,
verbose=True
)
result = research_crew.kickoff(inputs={"topic": "AI Agent Market Analysis 2026"})
print(result)
7.3 Software Development Agent Team
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool
code_interpreter = CodeInterpreterTool()
product_manager = Agent(
role="Product Manager",
goal="Clearly define requirements and create development plans",
backstory="A PM with 10 years of experience connecting technical requirements to business goals.",
llm=llm,
verbose=True
)
senior_developer = Agent(
role="Senior Developer",
goal="Write high-quality, scalable code",
backstory="A full-stack developer with expertise in Python, FastAPI, and React.",
tools=[code_interpreter],
llm=llm,
verbose=True
)
qa_engineer = Agent(
role="QA Engineer",
goal="Thoroughly test code and ensure quality",
backstory="A software testing expert who loves finding bugs.",
tools=[code_interpreter],
llm=llm,
verbose=True
)
requirements_task = Task(
description="""Define technical requirements for '{feature_request}':
1. User stories
2. Functional requirements list
3. Non-functional requirements (performance, security)
4. API design (endpoint list)""",
expected_output="Requirements document",
agent=product_manager
)
development_task = Task(
description="""Write Python FastAPI code based on the requirements document:
1. Complete API implementation
2. Data models (Pydantic)
3. Error handling
4. Code comments""",
expected_output="Complete Python code",
agent=senior_developer,
context=[requirements_task]
)
testing_task = Task(
description="""Review and test the written code:
1. Code review (bugs, security vulnerabilities)
2. Write unit tests
3. Edge case testing
4. Suggest improvements""",
expected_output="Test report and improved code",
agent=qa_engineer,
context=[development_task]
)
dev_crew = Crew(
agents=[product_manager, senior_developer, qa_engineer],
tasks=[requirements_task, development_task, testing_task],
process=Process.sequential,
verbose=True
)
result = dev_crew.kickoff(
inputs={"feature_request": "User authentication API (JWT-based)"}
)
print(result)
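CrewAI's sequential process is, conceptually, a pipeline: each task runs in order, and a task that lists others in `context` receives their outputs. A minimal pure-Python sketch of that dependency threading (the names here are illustrative, not the CrewAI API):

```python
# Minimal sketch of sequential task execution with context passing.
# Each task is (name, list of context task names, work function).
def run_sequential(tasks):
    outputs = {}  # task name -> produced output
    for name, context_names, work in tasks:
        # gather outputs of the tasks this one depends on
        context = [outputs[c] for c in context_names]
        outputs[name] = work(context)
    return outputs

tasks = [
    ("research", [], lambda ctx: "raw findings"),
    ("analysis", ["research"], lambda ctx: f"insights from {ctx[0]}"),
    ("report", ["research", "analysis"], lambda ctx: " | ".join(ctx)),
]
result = run_sequential(tasks)
print(result["report"])  # raw findings | insights from raw findings
```

This is why task ordering matters in `Crew(tasks=[...])`: a task can only consume context from tasks that ran before it.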
7.4 Hierarchical CrewAI
# Hierarchical structure where a manager delegates work
manager = Agent(
role="Project Manager",
goal="Coordinate the team and produce the best results",
backstory="An experienced PM who maximizes the strengths of each team member.",
llm=llm,
verbose=True,
allow_delegation=True # Can delegate to other agents
)
hierarchical_crew = Crew(
agents=[manager, researcher, analyst, writer],
tasks=[report_task], # Only define the final task (rest auto-distributed)
process=Process.hierarchical,
manager_agent=manager,
verbose=True
)
8. Agent Memory
8.1 Memory Architecture
from langchain.memory import (
ConversationBufferMemory,
ConversationSummaryBufferMemory,
ConversationEntityMemory,
)
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
import datetime
llm = ChatOpenAI(model="gpt-4o-mini")
embeddings = OpenAIEmbeddings()
# 1. Short-term memory - recent N messages
from langchain.memory import ConversationBufferWindowMemory
short_term = ConversationBufferWindowMemory(k=5, return_messages=True)
# 2. Summary memory - summarizes old conversations
summary_memory = ConversationSummaryBufferMemory(
llm=llm,
max_token_limit=1000,
return_messages=True
)
# 3. Entity memory - extract key facts
entity_memory = ConversationEntityMemory(llm=llm, return_messages=True)
# 4. Long-term memory - vector DB
class LongTermMemory:
    def __init__(self):
        # FAISS needs at least one text to initialize; seed with a placeholder
        self.vectorstore = FAISS.from_texts(["init"], embeddings)
        self.retriever = self.vectorstore.as_retriever(search_kwargs={"k": 5})
def save(self, text: str, metadata: dict = None):
self.vectorstore.add_texts([text], metadatas=[metadata or {}])
    def recall(self, query: str) -> list:
        docs = self.retriever.invoke(query)  # get_relevant_documents is deprecated
        return [doc.page_content for doc in docs]
# 5. Episodic memory - past agent experiences
class EpisodicMemory:
def __init__(self):
self.episodes = []
def save_episode(self, task: str, actions: list, result: str, success: bool):
episode = {
"task": task,
"actions": actions,
"result": result,
"success": success,
"timestamp": datetime.datetime.now().isoformat()
}
self.episodes.append(episode)
    def find_similar_episodes(self, current_task: str) -> list:
        # Simplified: returns every successful episode regardless of task;
        # a production system would rank by embedding similarity to current_task
        return [e for e in self.episodes if e["success"]]
# Integrated memory system
class AgentMemorySystem:
def __init__(self):
self.short_term = ConversationBufferWindowMemory(k=10)
self.long_term = LongTermMemory()
self.episodic = EpisodicMemory()
self.entities = {}
    def save_message(self, role: str, content: str):
        # save_context stores one input/output pair per call,
        # so the opposite side is padded with an empty string
        self.short_term.save_context(
            {"input": content if role == "human" else ""},
            {"output": content if role == "ai" else ""}
        )
def get_relevant_context(self, query: str) -> str:
recent = self.short_term.load_memory_variables({})
long_term = self.long_term.recall(query)
past_episodes = self.episodic.find_similar_episodes(query)
context = f"""Recent conversation: {recent.get('history', '')}
Relevant memories: {'; '.join(long_term[:3])}
Similar past experiences: {past_episodes[:2] if past_episodes else 'None'}"""
return context
memory_system = AgentMemorySystem()
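The `EpisodicMemory` class above needs no external services, so it can be exercised directly. A quick self-contained usage sketch (the class is repeated here so the example runs on its own; the tasks and results are made up for illustration):

```python
import datetime

# Self-contained copy of the EpisodicMemory pattern above, for a quick demo
class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def save_episode(self, task, actions, result, success):
        self.episodes.append({
            "task": task,
            "actions": actions,
            "result": result,
            "success": success,
            "timestamp": datetime.datetime.now().isoformat(),
        })

    def find_similar_episodes(self, current_task):
        # same naive filter as above: keep only successful episodes
        return [e for e in self.episodes if e["success"]]

memory = EpisodicMemory()
memory.save_episode("fetch stock price", ["search", "calculator"], "78,000 KRW", True)
memory.save_episode("scrape site", ["browser"], "timeout", False)

similar = memory.find_similar_episodes("fetch crypto price")
print(len(similar))  # 1 -- only the successful episode is returned
```

The point of the pattern: before starting a new task, the agent can inject "here is how a similar task succeeded before" into its prompt, which tends to reduce repeated failures.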
9. Code Execution Agents
9.1 Python REPL Agent
from langchain_experimental.tools import PythonREPLTool
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
python_repl = PythonREPLTool()
llm = ChatOpenAI(model="gpt-4o", temperature=0)
data_analysis_prompt = ChatPromptTemplate.from_messages([
("system", """You are a professional data analyst.
You are proficient in Python, pandas, matplotlib, and seaborn.
When given a data analysis request, write and execute code to derive results.
Always explain the analysis results alongside your code."""),
MessagesPlaceholder(variable_name="chat_history"),
("human", "{input}"),
MessagesPlaceholder(variable_name="agent_scratchpad"),
])
agent = create_openai_tools_agent(llm, [python_repl], data_analysis_prompt)
data_agent = AgentExecutor(agent=agent, tools=[python_repl], verbose=True)
result = data_agent.invoke({
"input": """Analyze the following data:
sales = [100, 150, 120, 200, 180, 250, 220, 300, 280, 350, 320, 400]
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
Analyze the monthly sales trend and calculate the growth rate.""",
"chat_history": []
})
print(result["output"])
9.2 Docker Sandbox Code Execution
import docker
import tempfile
import os
class DockerCodeExecutor:
"""Safely execute code inside a Docker container"""
def __init__(self, image="python:3.11-slim", timeout=30):
self.client = docker.from_env()
self.image = image
self.timeout = timeout
def execute(self, code: str, packages: list = None) -> dict:
"""
Execute Python code inside a Docker container
Returns: {success: bool, output: str, error: str}
"""
with tempfile.TemporaryDirectory() as tmpdir:
code_file = os.path.join(tmpdir, "script.py")
with open(code_file, "w") as f:
f.write(code)
            install_cmd = ""
            network_mode = "none"  # block network access by default
            if packages:
                pkgs = " ".join(packages)
                install_cmd = f"pip install {pkgs} -q && "
                network_mode = "bridge"  # pip needs network to download packages
            try:
                # containers.run() has no timeout kwarg; enforce the limit
                # with the coreutils `timeout` command inside the container
                logs = self.client.containers.run(
                    self.image,
                    command=f'sh -c "{install_cmd}timeout {self.timeout} python /code/script.py"',
                    volumes={tmpdir: {"bind": "/code", "mode": "ro"}},
                    remove=True,
                    network_mode=network_mode,
                    mem_limit="256m",  # memory limit
                    cpu_period=100000,
                    cpu_quota=50000,  # 50% CPU limit
                    stdout=True,
                    stderr=True
                )
                return {
                    "success": True,
                    "output": logs.decode("utf-8"),
                    "error": ""
                }
            except docker.errors.ContainerError as e:
                return {
                    "success": False,
                    "output": "",
                    "error": e.stderr.decode("utf-8") if e.stderr else str(e)
                }
            except Exception as e:
                return {"success": False, "output": "", "error": str(e)}
executor = DockerCodeExecutor()
code = """
import pandas as pd
import json
data = {'name': ['Alice', 'Bob', 'Charlie'], 'score': [85, 92, 78]}
df = pd.DataFrame(data)
result = df.describe().to_dict()
print(json.dumps(result, indent=2))
"""
result = executor.execute(code, packages=["pandas"])
print(result["output"])
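When Docker is not available, a lighter fallback is to run the code in a separate Python process with a hard timeout. This is a minimal sketch with much weaker isolation (no network, filesystem, or memory limits), so it is only appropriate for semi-trusted code:

```python
import subprocess
import sys

def run_untrusted_lite(code: str, timeout: int = 10) -> dict:
    """Run code in a separate Python process with a hard timeout.

    NOTE: process isolation only -- no network, filesystem, or memory
    limits. Use the Docker executor above for real sandboxing.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {
            "success": proc.returncode == 0,
            "output": proc.stdout,
            "error": proc.stderr,
        }
    except subprocess.TimeoutExpired:
        return {"success": False, "output": "", "error": "timed out"}

result = run_untrusted_lite("print(2 ** 10)")
print(result["output"].strip())  # 1024
```

The dict shape mirrors `DockerCodeExecutor.execute`, so an agent can swap between the two backends without changing its tool-handling code.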
10. Agent Evaluation and Monitoring
10.1 LangSmith Tracing
import os
from langsmith import Client
# LangSmith setup
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "ai-agent-evaluation"
from langsmith.run_helpers import traceable
@traceable(run_type="chain")
def run_agent_with_tracking(user_input: str):
"""Agent execution tracked by LangSmith"""
result = agent_executor.invoke({"input": user_input})
return result
# Query execution data via LangSmith client
client = Client()
runs = client.list_runs(
project_name="ai-agent-evaluation",
run_type="chain"
)
for run in list(runs)[:5]:
print(f"Run ID: {run.id}")
print(f"Status: {run.status}")
print(f"Execution time: {run.end_time - run.start_time if run.end_time else 'N/A'}")
print(f"Token usage: {run.total_tokens}")
print("---")
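The core idea behind `@traceable` — wrap each agent call and record its name, latency, and status — can be sketched locally when LangSmith is not configured. The decorator below is an illustrative stand-in, not the LangSmith API:

```python
import functools
import time

TRACE_LOG = []  # in-memory trace store (stand-in for a tracing backend)

def traced(run_type="chain"):
    """Record name, duration, and status of each wrapped call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record = {"name": fn.__name__, "run_type": run_type}
            try:
                result = fn(*args, **kwargs)
                record["status"] = "success"
                return result
            except Exception as e:
                record["status"] = f"error: {e}"
                raise
            finally:
                record["latency_s"] = time.perf_counter() - start
                TRACE_LOG.append(record)
        return wrapper
    return decorator

@traced(run_type="chain")
def answer(question):
    return f"echo: {question}"

answer("hello")
print(TRACE_LOG[0]["status"])  # success
```

Even this bare-bones version is enough to spot the two most common agent problems in production: calls that never finish and calls that silently fail.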
10.2 Agent Performance Metrics
import time
import datetime
from dataclasses import dataclass
from typing import List, Optional
@dataclass
class AgentMetrics:
task: str
success: bool
total_time: float
num_iterations: int
tools_used: List[str]
tokens_used: int
error_message: Optional[str] = None
final_answer: Optional[str] = None
class AgentEvaluator:
"""Agent performance evaluation system"""
def __init__(self, agent_executor):
self.agent = agent_executor
self.metrics_history: List[AgentMetrics] = []
def evaluate(self, task: str, expected_keywords: list = None) -> AgentMetrics:
start_time = time.time()
try:
result = self.agent.invoke({"input": task})
total_time = time.time() - start_time
answer = result.get("output", "")
success = True
if expected_keywords:
success = any(kw.lower() in answer.lower() for kw in expected_keywords)
            # intermediate_steps is only present when the executor is created
            # with return_intermediate_steps=True
            steps = result.get("intermediate_steps", [])
            metrics = AgentMetrics(
                task=task,
                success=success,
                total_time=total_time,
                num_iterations=len(steps),
                tools_used=[action.tool for action, _ in steps],
                tokens_used=0,  # fill via a token-counting callback if needed
                final_answer=answer
            )
except Exception as e:
metrics = AgentMetrics(
task=task,
success=False,
total_time=time.time() - start_time,
num_iterations=0,
tools_used=[],
tokens_used=0,
error_message=str(e)
)
self.metrics_history.append(metrics)
return metrics
def batch_evaluate(self, test_cases: list) -> dict:
results = []
for case in test_cases:
task = case["task"]
keywords = case.get("expected_keywords", [])
metrics = self.evaluate(task, keywords)
results.append(metrics)
        successes = [r for r in results if r.success]
        success_rate = len(successes) / len(results) if results else 0
        avg_time = sum(r.total_time for r in results) / len(results) if results else 0
return {
"total_tasks": len(results),
"success_rate": success_rate,
"avg_response_time": avg_time,
"failed_tasks": [r.task for r in results if not r.success],
"detailed_results": results
}
def generate_report(self) -> str:
if not self.metrics_history:
return "No evaluation data"
total = len(self.metrics_history)
successes = sum(1 for m in self.metrics_history if m.success)
avg_time = sum(m.total_time for m in self.metrics_history) / total
report = f"""
=== Agent Performance Report ===
Total tasks: {total}
Success rate: {successes/total*100:.1f}%
Average response time: {avg_time:.2f}s
Failed tasks:
"""
for m in self.metrics_history:
if not m.success:
report += f" - {m.task}: {m.error_message or 'Quality check failed'}\n"
return report
test_cases = [
{
"task": "What time is it right now?",
"expected_keywords": ["2026", ":", "AM", "PM"]
},
{
"task": "What is the square root of 100?",
"expected_keywords": ["10"]
},
{
"task": "Explain the main components of an AI agent",
"expected_keywords": ["LLM", "tool", "memory"]
}
]
evaluator = AgentEvaluator(agent_executor)
report = evaluator.batch_evaluate(test_cases)
print(f"Success rate: {report['success_rate']*100:.1f}%")
print(f"Average response time: {report['avg_response_time']:.2f}s")
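The evaluation loop can be exercised without a live LLM by stubbing the executor. The sketch below isolates the keyword-based success check used above; `StubExecutor` is a made-up test double, not a LangChain class:

```python
import time

class StubExecutor:
    """Test double for an AgentExecutor: returns a canned answer."""
    def invoke(self, inputs):
        return {"output": f"The answer to '{inputs['input']}' is 10."}

def evaluate(agent, task, expected_keywords):
    # same success criterion as AgentEvaluator.evaluate above:
    # pass if any expected keyword appears in the answer
    start = time.time()
    answer = agent.invoke({"input": task}).get("output", "")
    success = any(kw.lower() in answer.lower() for kw in expected_keywords)
    return {"success": success, "total_time": time.time() - start}

metrics = evaluate(StubExecutor(), "What is the square root of 100?", ["10"])
print(metrics["success"])  # True
```

Stubbed runs like this are useful in CI: they verify the evaluation harness itself before spending tokens on real model calls.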
Conclusion
AI agents are evolving beyond simple chatbots into genuinely autonomous AI systems.
Framework selection guide:
| Use Case | Recommended Framework |
|---|---|
| Rapid prototyping | LangChain |
| Complex workflows | LangGraph |
| Document Q&A agents | LlamaIndex |
| Multi-agent collaboration | CrewAI |
| Custom framework | OpenAI Function Calling |
Agent Development Best Practices:
- Start small - begin with a simple ReAct agent
- Clear tool descriptions - make each tool's description explicit
- Design memory upfront - plan what information needs to be remembered
- Error handling - tool failures and loop prevention are essential
- Monitor everything - trace all executions with LangSmith
- Manage costs - monitor token usage closely