A Complete Comparison of Multi-Agent Systems: Choosing Between AutoGen, CrewAI, and LangGraph

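When do you need multi-agent?
Framework 1: Microsoft AutoGen
Framework 2: CrewAI
- Core concepts
- Pros and cons of CrewAI
Framework 3: LangGraph
Framework comparison table
Selection guide
Common pitfalls in production
Closing thoughts

People often ask me, "Should I use a multi-agent system?" The honest answer: in most cases, no. A single agent is enough. In some situations, though, multiple agents are genuinely necessary, and that is when the choice of framework starts to matter.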

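When do you need multi-agent?

Let's look closely at the three situations where a single agent falls short.

Situation 1: When the task is too complex to fit in one LLM's context

What if you have to refactor a 10,000-line codebase as a whole? The entire codebase may not fit in a single agent's context window. Splitting the work so that several agents each handle a subset of modules gets around the limit.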
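
One way to split the work is to pack modules into batches that each fit a token budget, then hand one batch to each agent. The following is a minimal sketch under stated assumptions: `rough_token_count` uses a crude four-characters-per-token heuristic, and both function names are hypothetical, not part of any framework discussed here.

```python
from typing import Dict, List


def rough_token_count(text: str) -> int:
    """Crude estimate: about 4 characters per token (an assumption, not exact)."""
    return max(1, len(text) // 4)


def batch_modules(modules: Dict[str, str], budget: int) -> List[List[str]]:
    """Greedily pack module names into batches that fit a token budget."""
    batches: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, source in modules.items():
        cost = rough_token_count(source)
        # Start a new batch when adding this module would exceed the budget.
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches
```

A module that is larger than the budget on its own still ends up in a batch by itself, so in practice it would need to be split further before an agent can process it.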

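Situation 2: When the work calls for different kinds of expertise

Consider a task like "collect news articles, analyze them, and write a report." You would want an agent optimized for search, an agent focused on analysis, and an agent specialized in writing, each with its own system prompt and tools.

Situation 3: When you want to speed things up through parallelism

If you are researching markets in 10 countries at once, 10 agents working in parallel, one per country, will finish much faster.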
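
The fan-out pattern can be sketched with `asyncio.gather`. This is an illustration, not a framework API: `research_country` is a hypothetical stand-in for a single agent run, and the `max_parallel` cap is an assumption added to stay within provider rate limits.

```python
import asyncio
from typing import Dict, List


async def research_country(country: str) -> str:
    """Hypothetical stand-in for one agent's run; replace with a real call."""
    await asyncio.sleep(0.01)  # simulates LLM and tool latency
    return f"report for {country}"


async def research_all(countries: List[str], max_parallel: int = 5) -> Dict[str, str]:
    """Run one agent per country, capping how many run at the same time."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(country: str) -> str:
        async with semaphore:
            return await research_country(country)

    # gather preserves input order, so reports line up with countries.
    reports = await asyncio.gather(*(guarded(c) for c in countries))
    return dict(zip(countries, reports))
```

Calling `asyncio.run(research_all(["KR", "JP", "US"]))` returns one report per country. Parallelism shortens wall-clock time but not total token spend.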

If none of these three situations applies, a single agent is enough. Don't add unnecessary complexity.

Framework 1: Microsoft AutoGen

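AutoGen is a multi-agent framework built by Microsoft. Its core abstraction is the conversation between agents.

Core concepts

AutoGen's philosophy is simple: agents solve problems as if they were chatting. Just as people work in teams, agents collaborate by exchanging messages.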

import autogen

# LLM configuration
llm_config = {
    "model": "gpt-4",
    "api_key": "your-api-key"
}

# Agent definitions
coder = autogen.AssistantAgent(
    name="Coder",
    llm_config=llm_config,
    system_message=(
        "You are a Python expert. Write clean, well-tested code. "
        "Always include error handling and type hints. "
        "When you finish, say 'TERMINATE'."
    )
)

reviewer = autogen.AssistantAgent(
    name="Reviewer",
    llm_config=llm_config,
    system_message=(
        "You are a senior software engineer. Review code for: "
        "1. Bugs and edge cases "
        "2. Security vulnerabilities "
        "3. Performance issues "
        "4. Code style and maintainability "
        "Provide specific, actionable feedback."
    )
)

# UserProxyAgent: responsible for actually executing code
user_proxy = autogen.UserProxyAgent(
    name="User",
    human_input_mode="NEVER",       # fully automated
    max_consecutive_auto_reply=10,
    code_execution_config={
        "work_dir": "coding",
        "use_docker": False  # True is recommended in production
    },
    # content can be None (e.g. for tool calls), so fall back to ""
    is_termination_msg=lambda x: "TERMINATE" in (x.get("content") or "")
)

# Start the conversation
user_proxy.initiate_chat(
    coder,
    message="Write a Python script that fetches weather data and visualizes it with matplotlib"
)

Coordinating multiple agents with GroupChat

# Several agents take part in a single group conversation
groupchat = autogen.GroupChat(
    agents=[user_proxy, coder, reviewer],
    messages=[],
    max_round=20,
    speaker_selection_method="auto"  # the LLM picks the next speaker
)

manager = autogen.GroupChatManager(
    groupchat=groupchat,
    llm_config=llm_config
)

user_proxy.initiate_chat(
    manager,
    message="Design and implement a REST API client library"
)

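Pros and cons of AutoGen

Pros:

Simple setup: quick to prototype
Built-in code execution (UserProxyAgent)
Intuitive conversation model

Cons:

Prone to infinite loops (the TERMINATE condition needs careful design)
Conversation state management is simple, which limits complex workflows
Flow control between agents can become unclear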
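
To make the infinite-loop risk concrete, here is a framework-independent sketch of a stricter termination check combined with a hard turn cap. The assumptions: messages are dicts with an optional `content` field, as in the AutoGen examples above, and `run_dialogue` is a hypothetical driver used only for illustration.

```python
from typing import Any, Dict, Iterable, List


def is_termination_msg(message: Dict[str, Any]) -> bool:
    """True only when the message text ends with the TERMINATE marker."""
    content = message.get("content") or ""  # content may be None
    if not isinstance(content, str):
        return False  # e.g. multimodal content given as a list
    # Matching the end avoids false positives when TERMINATE is merely quoted.
    return content.rstrip().endswith("TERMINATE")


def run_dialogue(replies: Iterable[Dict[str, Any]], max_turns: int = 10) -> List[Dict[str, Any]]:
    """Consume replies until termination or the turn cap, whichever comes first."""
    transcript: List[Dict[str, Any]] = []
    for turn, reply in enumerate(replies):
        if turn >= max_turns:
            break  # hard stop even if no agent ever says TERMINATE
        transcript.append(reply)
        if is_termination_msg(reply):
            break
    return transcript
```

The marker check and the turn cap complement each other: the marker ends a successful run early, and the cap bounds the cost of a run that never converges.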

Framework 2: CrewAI

CrewAI is a framework focused on building "AI teams." Agents with a role and a goal carry out clearly defined tasks.

Core concepts

CrewAI's philosophy: separate roles and responsibilities clearly, as a company would.

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool

# Search tool setup
search_tool = SerperDevTool()
web_tool = WebsiteSearchTool()

# Agent definitions: role and goal are the key
researcher = Agent(
    role="Research Analyst",
    goal="Find accurate, comprehensive, and up-to-date information on any topic",
    backstory=(
        "You are an expert researcher with 10 years of experience. "
        "You always verify information from multiple sources and cite your findings."
    ),
    tools=[search_tool, web_tool],
    llm="gpt-4",
    verbose=True
)

analyst = Agent(
    role="Data Analyst",
    goal="Analyze information and identify key trends and insights",
    backstory=(
        "You are a data analyst who excels at finding patterns and drawing "
        "actionable conclusions from complex information."
    ),
    llm="gpt-4",
    verbose=True
)

writer = Agent(
    role="Technical Writer",
    goal="Write clear, engaging, and well-structured content",
    backstory=(
        "You are a technical writer who makes complex topics accessible. "
        "You write for engineers who value precision and clarity."
    ),
    llm="gpt-4",
    verbose=True
)

# Task definitions: dependencies matter
research_task = Task(
    description=(
        "Research five of the latest trends in the AI agent field. "
        "Include concrete examples and sources for each trend."
    ),
    agent=researcher,
    expected_output="Five trends, each with a 2-3 sentence description plus sources"
)

analysis_task = Task(
    description=(
        "Analyze the researched trends, rank the most influential ones "
        "from an engineer's perspective, and explain the practical implications of each."
    ),
    agent=analyst,
    expected_output="A ranked trend analysis, including the practical implications of each",
    context=[research_task]  # use the research_task result as context
)

writing_task = Task(
    description=(
        "Based on the analysis, write a technical blog post of about 1,500 characters. "
        "Focus on practical content that working engineers can apply right away."
    ),
    agent=writer,
    expected_output="A blog post in Markdown, with a title, subheadings, and code examples",
    context=[research_task, analysis_task]
)

# Run the crew
crew = Crew(
    agents=[researcher, analyst, writer],
    tasks=[research_task, analysis_task, writing_task],
    process=Process.sequential,  # sequential execution (or Process.hierarchical)
    verbose=True
)

result = crew.kickoff()
print(result)

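Pros and cons of CrewAI

Pros:

Role-based design is intuitive: easy to follow even for non-developers
Clear task dependency management
Good for rapid prototyping
Fast-growing community

Cons:

Limited for complex conditional flows
State management is simple, which limits long-running agents
Less flexible than LangGraph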

Framework 3: LangGraph

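LangGraph is a framework from the LangChain team that represents agent workflows as graphs. It is the most flexible of the three, but it comes with a learning curve.

Core concepts

LangGraph's philosophy: model the agent system as a directed graph. Nodes are functions and edges define control flow. Unlike a strict DAG, the graph may contain cycles, which is how loops such as "research again" are expressed.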

from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Annotated
import operator

# State definition: shared across the whole graph
class ResearchState(TypedDict):
    messages: Annotated[List[str], operator.add]  # messages accumulate
    research_done: bool
    analysis_done: bool
    draft: str
    final_report: str

# Create the graph
workflow = StateGraph(ResearchState)

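# Node functions
# Note: search_web, analyze_data, write_report, and review_and_improve are
# placeholder helpers that you supply; they are not part of LangGraph.
def research_node(state: ResearchState) -> dict:
    """Collect information from the web"""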
    results = search_web(state["messages"][-1])
    return {
        "messages": [f"Research results: {results}"],
        "research_done": True
    }

def analysis_node(state: ResearchState) -> dict:
    """Analyze the collected information"""
    research = [m for m in state["messages"] if "Research results:" in m]
    analysis = analyze_data(research)
    return {
        "messages": [f"Analysis: {analysis}"],
        "analysis_done": True
    }

def writing_node(state: ResearchState) -> dict:
    """Write the report draft"""
    all_context = "\n".join(state["messages"])
    draft = write_report(all_context)
    return {"draft": draft}

def review_node(state: ResearchState) -> dict:
    """Review and improve the draft"""
    reviewed = review_and_improve(state["draft"])
    return {"final_report": reviewed}

# Conditional routing function
def route_after_research(state: ResearchState) -> str:
    """Pick the next node based on the research gathered so far"""
    if len(state["messages"]) > 3:
        return "analysis"
    else:
        return "research"  # more research is needed

# Add nodes
workflow.add_node("research", research_node)
workflow.add_node("analysis", analysis_node)
workflow.add_node("writing", writing_node)
workflow.add_node("review", review_node)

# Add edges (define the flow)
workflow.set_entry_point("research")
workflow.add_conditional_edges(
    "research",
    route_after_research,
    {
        "analysis": "analysis",
        "research": "research"  # routes back to itself (loop)
    }
)
workflow.add_edge("analysis", "writing")
workflow.add_edge("writing", "review")
workflow.add_edge("review", END)

# Compile and run
app = workflow.compile()

initial_state = {
    "messages": ["Research the latest AI agent trends"],
    "research_done": False,
    "analysis_done": False,
    "draft": "",
    "final_report": ""
}

result = app.invoke(initial_state)
print(result["final_report"])

Streaming and checkpoints in LangGraph

These are features that set LangGraph apart:

from langgraph.checkpoint.sqlite import SqliteSaver

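# Checkpointing: save and restore intermediate state
# Note: in recent langgraph-checkpoint-sqlite releases, from_conn_string() is a
# context manager (`with SqliteSaver.from_conn_string(...) as memory:`);
# the direct assignment below matches older releases.
memory = SqliteSaver.from_conn_string(":memory:")
app = workflow.compile(checkpointer=memory)

# Run with a thread ID so the same conversation can be resumed later
config = {"configurable": {"thread_id": "session-123"}}

# First run
result = app.invoke(initial_state, config=config)

# Later, continue on the same thread
follow_up = app.invoke(
    {"messages": ["Add a more detailed analysis"]},
    config=config
)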

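Visualizing the graph as a Mermaid diagram

# LangGraph can render the graph in Mermaid format
print(app.get_graph().draw_mermaid())

# Example output (simplified for readability; the real output also contains
# styling directives, and its exact text varies by version):
# graph TD
#     __start__ --> research
#     research -->|needs more research| research
#     research -->|enough| analysis
#     analysis --> writing
#     writing --> review
#     review --> __end__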

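Pros and cons of LangGraph

Pros:

The most flexible flow control: conditionals, loops, and parallel branches are all possible
Strong state management: checkpoints and history
High production readiness
Human-in-the-loop support

Cons:

Steep learning curve: you need to understand graph concepts
Overkill for simple tasks
More verbose code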

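Framework comparison table

Feature	AutoGen	CrewAI	LangGraph
Learning curve	Low	Low	High
Flexibility	Medium	Medium	High
State management	Basic	Basic	Strong
Production readiness	Medium	Medium	High
Code execution	Built-in	Separate setup	Separate setup
Community size	Large	Growing fast	Growing
Best-suited use cases	Code generation, debugging	Role-based teamwork	Complex workflows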

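Selection guide

Choose AutoGen when:

You need a quick prototype
The workflow centers on generating and executing code
The team is not yet familiar with LLMs

Choose CrewAI when:

The task naturally suggests a "division of roles"
You have a sequential pipeline (research → analysis → writing)
You need a fast MVP and complexity is moderate

Choose LangGraph when:

You need complex conditional logic
You have long-running agents (checkpoints matter)
You are deploying to production (reliability and monitoring matter)
You need human-in-the-loop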
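
The guide above can be condensed into a small decision function. This is only a sketch of one possible reading: the parameter names are hypothetical, and the order in which the checks are applied (LangGraph first, then AutoGen, then CrewAI) is an assumption, since the guide itself does not rank the criteria.

```python
def suggest_framework(
    needs_conditional_flow: bool = False,
    long_running: bool = False,
    human_in_the_loop: bool = False,
    code_execution_centric: bool = False,
    role_based_pipeline: bool = False,
) -> str:
    """Map coarse requirements to a framework suggestion."""
    # Requirements that only the graph-based approach covers well come first.
    if needs_conditional_flow or long_running or human_in_the_loop:
        return "LangGraph"
    if code_execution_centric:
        return "AutoGen"
    if role_based_pipeline:
        return "CrewAI"
    # No multi-agent requirement was stated, so stay with a single agent.
    return "single agent"
```

The default return value reflects the point made at the start of this post: without a concrete reason for multiple agents, a single agent is the baseline.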

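Common pitfalls in production

These are the problems you run into when deploying a multi-agent system to production, regardless of framework.

1. Cost explosion

Every agent makes its own LLM calls, so costs end up far higher than expected.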

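# Always add cost tracking
import tiktoken

def estimate_cost(messages, model="gpt-4"):
    """Estimate input cost in USD from message contents (ignores per-message overhead)."""
    enc = tiktoken.encoding_for_model(model)
    # content can be None (e.g. tool-call messages), so fall back to ""
    total_tokens = sum(len(enc.encode(m.get("content") or "")) for m in messages)
    cost_per_1k = 0.03  # USD per 1K input tokens for GPT-4 (8K); check current pricing
    return (total_tokens / 1000) * cost_per_1k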
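
Estimating cost is more useful when it is paired with a hard limit. The sketch below accumulates spend per agent and stops the run once a budget is exceeded. `CostTracker` and `BudgetExceeded` are hypothetical names, and the token counts and prices passed in are whatever your own estimator reports.

```python
from collections import defaultdict
from typing import DefaultDict


class BudgetExceeded(RuntimeError):
    """Raised when accumulated spend passes the configured limit."""


class CostTracker:
    """Accumulates estimated spend per agent and enforces a total budget."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spend_by_agent: DefaultDict[str, float] = defaultdict(float)

    @property
    def total(self) -> float:
        return sum(self.spend_by_agent.values())

    def record(self, agent: str, tokens: int, usd_per_1k: float) -> None:
        """Add one call's estimated cost, then check the budget."""
        self.spend_by_agent[agent] += (tokens / 1000) * usd_per_1k
        if self.total > self.limit_usd:
            raise BudgetExceeded(
                f"spent ${self.total:.2f}, limit is ${self.limit_usd:.2f}"
            )
```

Keeping spend broken down by agent also shows which role is driving the bill, which is usually the first question once costs spike.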

2. Context handoff failures between agents

If one agent's output is not passed correctly to the next agent, the whole pipeline breaks. Always verify that context is handed off explicitly.
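
One way to make the handoff explicit is to pass a small typed payload between agents and validate it before the next agent runs. This is a minimal sketch: `Handoff` and `validate_handoff` are hypothetical names, and the fields are assumptions chosen to match the research pipeline used in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Handoff:
    """Explicit payload passed from one agent to the next."""
    producer: str
    summary: str
    sources: List[str] = field(default_factory=list)


def validate_handoff(handoff: Handoff, require_sources: bool = False) -> Handoff:
    """Fail fast instead of letting an empty result flow downstream."""
    if not handoff.summary.strip():
        raise ValueError(f"{handoff.producer} produced an empty summary")
    if require_sources and not handoff.sources:
        raise ValueError(f"{handoff.producer} returned no sources")
    return handoff
```

Failing at the boundary makes it clear which agent produced the bad output, rather than surfacing as a vague final report several steps later.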

3. Agents without timeouts

An agent can run indefinitely. Always set a timeout for the pipeline as a whole.

import asyncio

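async def run_with_timeout(crew, timeout=300):
    # Caveat: wait_for stops waiting, but the worker thread cannot be cancelled,
    # so crew.kickoff may keep running (and spending tokens) in the background.
    # asyncio.to_thread requires Python 3.9+.
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(crew.kickoff),
            timeout=timeout
        )
    except asyncio.TimeoutError:
        raise RuntimeError(f"Crew execution timed out after {timeout}s")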
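
Because a worker thread cannot be interrupted from outside, a complementary approach is a cooperative deadline that the pipeline checks between steps. The sketch below assumes the pipeline can be expressed as a sequence of zero-argument callables; `run_steps` and `DeadlineExceeded` are hypothetical names.

```python
import time
from typing import Callable, Iterable, List


class DeadlineExceeded(RuntimeError):
    """Raised when the shared deadline passes before a step starts."""


def run_steps(
    steps: Iterable[Callable[[], str]],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Run steps in order, checking a shared deadline before each one."""
    deadline = clock() + timeout_s
    outputs: List[str] = []
    for index, step in enumerate(steps):
        # The check happens between steps; a step already running is not interrupted.
        if clock() >= deadline:
            raise DeadlineExceeded(f"deadline reached before step {index}")
        outputs.append(step())
    return outputs
```

This does not replace per-call timeouts on the LLM client, but it guarantees that no new step starts after the deadline.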

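Closing thoughts

The bottom line: start with CrewAI. It is intuitive and gets you results quickly. If you need to go to production or need complex workflows, move to LangGraph. AutoGen is especially strong for workflows centered on code generation.

Whichever framework you use, try a single agent first. Adding multiple agents only when you genuinely need them is the right approach.

The next post covers a core agent mechanism: "A Practical Guide to Tool Calling: Common Pitfalls and Fixes."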