AI Agent Frameworks in 2026 — A Deep Dive on LangGraph, AutoGen, CrewAI, OpenAI Agents SDK, Anthropic Agent SDK, and More
Prologue — 2026 and the Cambrian Explosion of Agent Frameworks
In spring 2023, when the LangChain ReAct notebook first went viral, "agent framework" was practically a synonym for LangChain. In 2024, AutoGen and CrewAI arrived and the multi-agent buzzword era began. In 2025, OpenAI promoted its experimental Swarm to a formal Agents SDK, and Anthropic released its internal Claude Code engine as the Claude Agent SDK. And in 2026, the market is in full Cambrian explosion mode.
Under the single label "tools for writing AI agents" you now find: official vendor SDKs, graph-based state machine frameworks, role-based crews, minimalist code-execution agents, full-stack TypeScript backend frameworks, structured-output-first libraries, and UI-integrated SDKs. This post draws the map.
An agent framework is not a library — it is an opinion. Choosing a framework means agreeing with its opinion of "this is how an agent should be written." So you are not picking a tool, you are picking an opinion.
This guide covers:
- The 2026 map — who builds what, who uses what
- What is an agent — Andrew Ng's four design patterns
- ReAct, Plan-and-Execute, Tree-of-Thought
- OpenAI Agents SDK (March 2025) — heir to Swarm
- Anthropic Agent SDK / Claude Code SDK (Sep 2025)
- LangGraph — state-machine graphs
- AutoGen 0.4 — multi-agent conversation
- CrewAI — role-based crews
- smolagents, Mastra, Pydantic AI — the minimalist wave
- MCP (Nov 2024) — the tool-integration standard
- A2A — agent-to-agent protocol
- Which framework to pick
- Adoption notes from Korea and Japan
- References
1. The 2026 Agent Framework Map
Big picture first. Group by who built the framework:
Model-vendor official SDKs
- OpenAI Agents SDK (released March 2025). Promotes the experimental Swarm library to an official SDK. Handoffs, tracing, and guardrails are first-class.
- Anthropic Agent SDK / Claude Code SDK (GA September 2025). Exposes the agent loop battle-tested inside Claude Code. Subagents and hooks are core.
- Vercel AI SDK (technically a model aggregator). React/Next.js UI components are the strength.
Orchestration frameworks
- LangGraph (LangChain's heir). State machines and graph workflows. Explicit nodes and edges describe the flow.
- AutoGen 0.4 (Microsoft Research). Multi-agent conversations. Asynchronous actor model after a major rewrite.
- CrewAI. "Role-based crews" — fastest prototyping. You compose agents like a manager, a researcher, a writer.
- LlamaIndex Agents. RAG-centric. Indexing and retrieval are first-class.
- Bee Agent Framework (IBM). Enterprise agents, multi-model and multi-vendor.
Minimalist and code-first
- smolagents (Hugging Face, 2024). Code execution as a first-class agent tool. "Code is the new function call."
- Mastra (TypeScript). Full-stack backend. Workflows, agents, memory, and RAG in one box.
- Pydantic AI (Pydantic team). Structured outputs first. Type safety is the core value.
Standards camp — protocols, not frameworks
- MCP (Model Context Protocol) — Anthropic's tool integration protocol (November 2024).
- A2A (Agent-to-Agent) — emerging in 2025 as a candidate protocol for cross-vendor agent collaboration.
The map's takeaway is simple: there is no single right framework. It depends on the problem, the team, and the model.
2. What Is an Agent — Andrew Ng's Four Patterns
Andrew Ng's spring 2024 talk gave us the cleanest taxonomy of agent design patterns. Two years later it still holds up.
Pattern 1: Reflection. The model critiques and revises its own output. A "draft -> critique -> revise" loop. You get a big quality lift without changing models.
Pattern 2: Tool Use. The model calls external tools — search, calculator, code execution, APIs. This is the step that turns an LLM into an "agent."
Pattern 3: Planning. The model plans a multi-step task before executing. Plan-and-Execute. More efficient than ad-hoc ReAct for complex tasks.
Pattern 4: Multi-Agent. Multiple agents specialize and collaborate. Manager-worker, debate, writer-editor. Distributes the context-overflow problem that single agents suffer from.
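The Reflection pattern is small enough to show directly. The sketch below is framework-free: `generate`, `critique`, and `revise` are stand-ins for model calls, not any real SDK's API.

```python
def reflect(task, generate, critique, revise, max_rounds=3):
    """Pattern 1: draft -> critique -> revise loop.

    generate/critique/revise stand in for LLM calls: strings in,
    string out. critique returns "" once it is satisfied.
    """
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if not feedback:  # critic is satisfied, stop early
            break
        draft = revise(task, draft, feedback)
    return draft

# Toy run with deterministic stand-ins: the critic complains exactly once.
draft = reflect(
    "write a haiku",
    generate=lambda t: "v1",
    critique=lambda t, d: "too short" if d == "v1" else "",
    revise=lambda t, d, fb: d + "+revised",
)
print(draft)  # v1+revised
```

The same shape holds with real model calls; only the three callables change.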
A framework's identity is largely defined by which of these it treats as first-class:
| Framework | Reflection | Tool Use | Planning | Multi-Agent |
|---|---|---|---|---|
| OpenAI Agents SDK | DIY | First-class | Support | First-class (handoff) |
| Anthropic Agent SDK | DIY | First-class | Support | First-class (subagent) |
| LangGraph | Express as nodes | First-class | First-class (graph) | First-class |
| AutoGen | First-class (critic) | First-class | First-class | First-class (conversation) |
| CrewAI | DIY | First-class | First-class (Process) | First-class (Crew) |
| smolagents | Support | First-class (CodeAgent) | Support | Support |
| Pydantic AI | Support | First-class | Support | Support |
| LlamaIndex Agents | Support | First-class | First-class | Support |
| Mastra | Support | First-class | First-class (workflow) | First-class |
| Vercel AI SDK | Support | First-class | Support | Support |
3. ReAct, Plan-and-Execute, Tree-of-Thought — Comparing the Patterns
Beneath every framework is a runtime algorithm. Three are the most common.
3.1 ReAct (Reasoning + Acting)
The original ReAct paper is Yao et al. 2022. The model alternates Thought, Action, and Observation. In code it's a plain while loop.
```python
step, done = 0, False
while not done and step < max_steps:
    response = model.complete(messages)        # Thought (+ optional Action)
    messages.append(response)
    if response.is_final_answer:
        break
    tool_result = execute(response.tool_call)  # Observation
    messages.append(tool_result)
    step += 1
```
- Pros: simple, uses raw model reasoning, adapts on failure.
- Cons: context grows quickly, easy to lose direction on complex tasks.
LangGraph's prebuilt ReAct agent, the OpenAI Agents SDK default agent, and CrewAI's default task execution all sit on this pattern.
3.2 Plan-and-Execute
Plan first, execute second. Wang et al. 2023's "Plan-and-Solve." The planner calls the model once; the executor can use a smaller model or deterministic code.
Stage 1 (Planner): goal -> step 1, step 2, ..., step N
Stage 2 (Executor): run steps sequentially/in parallel, replan if needed
- Pros: token-efficient, parallelizable, auditable.
- Cons: slower to adapt; bad plans force expensive replanning.
LangGraph has a canonical Plan-and-Execute example. CrewAI's hierarchical Process expresses this too.
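The two stages are easy to see in code. A minimal sketch, assuming the `plan` callable stands in for one planner model call and the executors are plain functions keyed by step name:

```python
def plan_and_execute(goal, plan, executors):
    """Stage 1: one planner call produces an ordered list of steps.
    Stage 2: each step runs through a (cheaper) executor."""
    steps = plan(goal)            # e.g. ["fetch", "summarize"]
    results = []
    for step in steps:
        results.append(executors[step](goal))
    return results

out = plan_and_execute(
    "report on Q3",
    plan=lambda g: ["fetch", "summarize"],
    executors={
        "fetch": lambda g: "raw data",
        "summarize": lambda g: "summary",
    },
)
print(out)  # ['raw data', 'summary']
```

Replanning fits naturally here too: wrap the executor loop, and on failure feed the partial results back into `plan` for a fresh step list.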
3.3 Tree-of-Thought
Branch multiple reasoning paths into a tree and pick the most promising. Yao et al. 2023. Expensive but strong on hard reasoning and planning.
```
                 goal
               /   |   \
       thoughtA thoughtB thoughtC
           |        |        |
      ... evaluate, pick best branch ...
```
- Pros: powerful on hard problems, self-verification built in.
- Cons: cost explodes, complex to implement.
Few frameworks support pure ToT first-class. LangGraph can model it; AutoGen's GroupChat can approximate it.
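Since no framework gives you pure ToT out of the box, it helps to see the skeleton. A minimal best-first sketch, where `expand` (propose child thoughts) and `score` (rate a thought) stand in for model calls:

```python
import heapq

def tree_of_thought(goal, expand, score, is_solution, beam=2, depth=3):
    """Best-first search over partial 'thoughts'. expand(t) proposes
    children, score(t) rates them (higher is better); both are
    stand-ins for model calls."""
    frontier = [(-score(goal), goal)]  # max-heap via negated scores
    for _ in range(depth):
        nxt = []
        for _, thought in frontier:
            if is_solution(thought):
                return thought
            for child in expand(thought):
                nxt.append((-score(child), child))
        frontier = heapq.nsmallest(beam, nxt)  # keep the best `beam`
    return max(frontier, key=lambda t: -t[0])[1]

# Toy search: thoughts are strings, "a"-count is the score, "gaa" wins.
best = tree_of_thought(
    "g",
    expand=lambda t: [t + "a", t + "b"],
    score=lambda t: t.count("a"),
    is_solution=lambda t: t == "gaa",
)
print(best)  # gaa
```

The cost explosion is visible in the structure: every node in the frontier triggers one `expand` and several `score` calls per level.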
4. OpenAI Agents SDK (March 2025) — The Heir to Swarm
In March 2025, OpenAI graduated the experimental Swarm into the official Agents SDK. Python (official) and TypeScript (late 2025).
Core abstractions
- Agent — system prompt + tools + model.
- Handoff — one agent transfers control to another.
- Guardrail — input/output validation (PII, policy).
- Tracing — every step shows up in the OpenAI dashboard.
Example
```python
from agents import Agent, Runner

# Placeholder specialists for the triage agent to hand off to
refund_agent = Agent(name="refund", instructions="Handle refund requests.")
support_agent = Agent(name="support", instructions="Answer support questions.")

triage = Agent(
    name="triage",
    instructions="Classify the customer's question and hand off.",
    handoffs=[refund_agent, support_agent],
)

result = Runner.run_sync(triage, "I want a refund")
print(result.final_output)
```
Handoff is similar to a LangGraph edge or AutoGen's next-speaker selection, but here an agent calls another agent like a function. The mental model is more imperative.
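That imperative mental model fits in a few lines of framework-free Python. In this sketch each "agent" is just a function that either answers or names the agent to hand off to; real SDK agents would be model-backed:

```python
def run_with_handoffs(agent, message, agents, max_hops=5):
    """Run `agent` on `message`; follow handoffs until a final answer.
    Each agent returns ("final", text) or ("handoff", other_agent_name)."""
    for _ in range(max_hops):
        kind, value = agents[agent](message)
        if kind == "final":
            return agent, value
        agent = value  # transfer control, like a function call
    raise RuntimeError("handoff loop exceeded max_hops")

agents = {
    "triage": lambda m: ("handoff", "refund") if "refund" in m
              else ("final", "How can I help?"),
    "refund": lambda m: ("final", "Refund started."),
}
who, reply = run_with_handoffs("triage", "I want a refund", agents)
print(who, reply)  # refund Refund started.
```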
Strengths
- Best integration with OpenAI models — Realtime API, Voice, Computer Use are first-class.
- Tracing dashboard is excellent — every tool call and handoff is visualized.
- Combined with structured outputs it's easy to write type-safe workflows.
Weaknesses
- Vendor lock-in — non-OpenAI models go through a LiteLLM adapter.
- Graph visualization and complex state machines are weaker than LangGraph.
- "Conversation" as multi-agent metaphor isn't as strong as in AutoGen.
5. Anthropic Agent SDK / Claude Code SDK (September 2025)
Anthropic has run Claude Code, its own coding agent, since 2024. In September 2025 it exposed that internal engine as the Claude Agent SDK.
Core concepts
- Agent loop — built on top of Anthropic's tool_use tokens.
- Subagent — an agent spawns a child agent that runs in an isolated context window.
- Hook — user code injectable around tool calls, model responses, and other lifecycle events.
- Permission gate — high-risk tools require explicit user approval.
- MCP integration — Anthropic's MCP standard is first-class.
Example
```ts
import { ClaudeAgent } from '@anthropic-ai/claude-agent-sdk'

const agent = new ClaudeAgent({
  model: 'claude-sonnet-4',
  systemPrompt: 'You are a careful research assistant.',
  tools: [webSearchTool, fileTool],
  hooks: {
    beforeToolUse: async (call) => {
      console.log('about to call', call.name)
    },
  },
})

const result = await agent.run('Explain MCP in 5 bullets.')
```
Strengths
- Loop hardened by years of Claude Code production use — context management and context-rot mitigation are the most mature.
- Subagents and permission gates are first-class — code execution and other high-risk tools are safer.
- MCP is native — Anthropic invented it.
Weaknesses
- Anthropic model lock-in (other models need adapters).
- Smaller ecosystem than OpenAI's despite Python and TypeScript support.
- Graph-shaped workflow modeling isn't as explicit as LangGraph.
6. LangGraph — State-Machine Graphs
LangGraph is the LangChain team's "agents as graphs" framework. Free-form LangChain chains became debugging nightmares, so LangGraph introduced explicit nodes, edges, and shared state.
Core model
- State — a single state object shared across the graph.
- Node — a function that consumes and produces state.
- Edge — decides which node runs next.
- Conditional edge — routes based on state.
- Checkpoint — snapshot and resume from any point.
Example
```python
from langgraph.graph import StateGraph, END

graph = StateGraph(State)  # State and the *_node functions defined elsewhere
graph.add_node("planner", planner_node)
graph.add_node("executor", executor_node)
graph.add_node("verifier", verifier_node)

graph.set_entry_point("planner")
graph.add_edge("planner", "executor")
graph.add_edge("executor", "verifier")  # route results into verification
graph.add_conditional_edges(
    "verifier",
    lambda s: "retry" if s["needs_retry"] else "done",
    {"retry": "executor", "done": END},
)

app = graph.compile()
result = app.invoke({"goal": "deploy app"})
```
Strengths
- Explicitness — the graph is in the code, debugging is easy.
- Human-in-the-loop points can be made explicit nodes.
- Checkpoints enable pause, resume, and time-travel.
- LangSmith gives you strong tracing.
Weaknesses
- Steepest learning curve — the graph mental model takes time.
- Overkill for simple ReAct.
- Drags in some heavy LangChain dependencies.
7. AutoGen 0.4 (Microsoft) — Multi-Agent Conversation
Microsoft Research's AutoGen has been the multi-agent flagship since 2023. Late 2024 to early 2025 saw an almost complete rewrite at 0.4, moving to an asynchronous actor model.
Core concepts
- AssistantAgent — an LLM-backed agent.
- UserProxyAgent — proxies the user (or deterministic code).
- GroupChat — multiple agents in one chat room, a manager picks the next speaker.
- AutoGen Studio — a GUI for designing multi-agent workflows.
Example (0.4 new API)
```python
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient

model = OpenAIChatCompletionClient(model="gpt-4o")
coder = AssistantAgent("coder", model_client=model)
reviewer = AssistantAgent("reviewer", model_client=model)

team = RoundRobinGroupChat([coder, reviewer], termination_condition=...)
result = await team.run(task="Implement quicksort in Python")  # inside async code
```
Strengths
- The most natural multi-agent conversation metaphor.
- AutoGen Studio lets non-coders build workflows.
- Microsoft Research backing means fast adoption of research patterns (debate, society of minds, etc.).
Weaknesses
- The 0.3 -> 0.4 migration is non-trivial.
- Overkill for single-agent needs.
- "Conversation" doesn't suit every workflow shape.
8. CrewAI — Role-Based Crews
CrewAI appeared in 2024 and quickly took the "fastest multi-agent prototyping" crown. The metaphor is intuitive: agents have roles, and crews execute tasks.
Core concepts
- Agent — has a role, goal, backstory, and tools.
- Task — who does what, in what output format.
- Crew — a bundle of agents. Process picks the execution mode (sequential, hierarchical, etc).
- Flow — deterministic workflow + agents (added late 2024).
Example
```python
from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior Researcher",
    goal="Find the latest trends in AI agents.",
    backstory="You are a tireless researcher with 10 years of experience.",
    tools=[search_tool],
)
writer = Agent(
    role="Tech Writer",
    goal="Turn research into a clear blog post.",
    backstory="You write for developers who want signal, not hype.",
)

research_task = Task(description="Research the top 5 trends.", agent=researcher)
write_task = Task(description="Write a 1000-word post.", agent=writer)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task],
    process=Process.sequential,
)
result = crew.kickoff()
```
Strengths
- Top-tier prototyping speed — ten minutes from zero to a working multi-agent.
- The role/crew metaphor reads well to non-technical stakeholders.
- Gentle learning curve, and the open-source core is free to learn on.
Weaknesses
- Internal behavior is somewhat opaque, debugging is harder.
- Complex branching is weaker than LangGraph.
- Production reliability often means migrating onto Flows for more determinism.
9. smolagents, Mastra, Pydantic AI — The Minimalist Wave
For people tired of heavy frameworks, 2024 and 2025 brought a minimalist camp.
9.1 smolagents (Hugging Face, 2024)
A deliberately small agent library from Hugging Face. One-liner: "Code is the new function call."
- The default agent type is CodeAgent — instead of JSON tool calls, the model writes Python and a sandbox executes it.
- The codebase is tiny (a few thousand lines).
- Thin abstraction so any model plugs in.
```python
from smolagents import CodeAgent, HfApiModel

agent = CodeAgent(tools=[search_tool], model=HfApiModel())
agent.run("What is the GDP per capita of Korea in 2025?")
```
- Pros: code execution is powerful — one code block can chain N tool calls, much easier to debug than a tool-call trace.
- Cons: sandbox security is on you. Few abstractions for large workflows.
9.2 Mastra (TypeScript, gaining traction in 2025)
Mastra is a TypeScript full-stack backend framework. If Vercel AI SDK is strong on the frontend React side, Mastra puts backend workflows, memory, agents, and RAG in one box.
- Deterministic workflows and LLM agents coexist as first-class citizens.
- Built-in memory, vector store, and RAG pipeline.
- Runs on Cloudflare Workers, Vercel, or self-hosted.
```ts
import { Agent } from '@mastra/core'

const agent = new Agent({
  name: 'support',
  instructions: 'You answer support questions.',
  model: { provider: 'OPEN_AI', name: 'gpt-4o' },
  tools: { lookupOrder, refund },
})

const result = await agent.generate('Where is my order?')
```
- Pros: natural for TypeScript full-stack backends. Clean integration between workflows and agents.
- Cons: young ecosystem. Many gaps relative to Python.
9.3 Pydantic AI (Pydantic team, late 2024)
From the Pydantic team. One-liner: "What FastAPI did for the web, this does for LLMs."
- All I/O is typed via Pydantic models.
- Agent functions are decorator-based and lightweight.
- Model dependency injection feels natural.
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class Order(BaseModel):
    id: str
    status: str

agent = Agent('openai:gpt-4o', result_type=Order)
result = agent.run_sync('Look up order 12345.')
print(result.data.status)  # type-safe
```
- Pros: type safety and structured outputs are top priority. Almost zero friction in FastAPI/Pydantic-shaped codebases.
- Cons: multi-agent and graph workflows are DIY. Young library.
10. Model Context Protocol (MCP, November 2024) — A Tool Standard Arrives
In November 2024 Anthropic released the Model Context Protocol (MCP). One-liner: USB-C between agents and tools.
The problem
Every framework had its own tool definition format. LangChain Tools, OpenAI functions, Anthropic tool_use — all slightly different. Writing a new tool meant writing it N times for N frameworks. MCP exists to fix this.
Core concepts
- MCP server — a process that exposes tools, resources, and prompts.
- MCP client — an LLM client (agent) that connects to servers.
- Transport — stdio for local servers; HTTP with Server-Sent Events (SSE) for remote ones.
- Schema — JSON Schema describes tool signatures.
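To make the schema point concrete: over the wire, a tool is just a name, a description, and a JSON Schema. The descriptor below is illustrative (the field names follow the MCP `tools/list` shape); the tiny required-field check is our own stand-in, not part of any SDK:

```python
import json

# What an MCP server might advertise for one tool in a tools/list response.
tool = {
    "name": "get_weather",
    "description": "Current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def check_args(schema, args):
    """Toy required-field check; real clients use a JSON Schema validator."""
    return all(key in args for key in schema.get("required", []))

print(json.dumps(tool, indent=2))
print(check_args(tool["inputSchema"], {"city": "Seoul"}))  # True
print(check_args(tool["inputSchema"], {}))                 # False
```

Because the schema travels with the tool, any MCP client can render, validate, and call it without framework-specific glue.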
Where it's used
- Anthropic Claude Desktop and Claude Code are native MCP clients.
- Since mid-2025, major tools — OpenAI Agents SDK, LangGraph, Cursor, Windsurf — have added MCP client support.
- Result: one well-written MCP server is reusable across many agent runtimes.
Why it matters
MCP standardizes tools independently of the framework choice. You may swap frameworks, but a good MCP server lives across them. That's a real reduction in lock-in.
11. A2A (Agent-to-Agent) — The Cross-Agent Protocol
If MCP standardizes "agent and tool," A2A is the candidate standard for "agent and agent." Google led the initial 2025 release, and several vendors are participating.
Why it's needed
Cross-company agent collaboration is on the horizon. Example: our sales agent asks a partner company's pricing agent for a quote. The two agents run on different frameworks, models, and hosts.
You then need:
- A way for an agent to advertise its capabilities (skills).
- A way for an agent to discover another agent's capabilities.
- A way to delegate work and receive results.
- Auth, permissions, and audit trails.
Core ideas
- Agent Card — each agent exposes a JSON card describing its capabilities.
- A2A calls — HTTP plus JSON-RPC. Asynchronous tasks and streaming supported.
- Orthogonal to MCP — MCP is tool abstraction, A2A is agent abstraction.
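An Agent Card is easiest to picture as data. The card below is illustrative only: the fields follow commonly published A2A examples (name, description, url, capabilities, skills), and the exact schema should be treated as still evolving.

```python
# Illustrative (not normative) Agent Card for a partner's pricing agent.
card = {
    "name": "pricing-agent",
    "description": "Produces quotes for partner requests.",
    "url": "https://agents.example.com/pricing",
    "capabilities": {"streaming": True},
    "skills": [
        {"id": "quote", "description": "Return a price quote for a SKU."},
    ],
}

# Discovery is "fetch the card and read it": pick an agent by skill id.
def find_agent_for(skill_id, cards):
    return next(
        (c["name"] for c in cards
         if any(s["id"] == skill_id for s in c["skills"])),
        None,
    )

print(find_agent_for("quote", [card]))  # pricing-agent
```

The actual delegation then happens as JSON-RPC calls against the card's `url`, which is exactly the cross-vendor seam A2A is trying to standardize.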
As of 2026 A2A is not yet a "settled" standard, but multi-agent systems crossing org boundaries will eventually need this. MCP went from launch to de facto standard in about a year; A2A could follow.
12. Which Framework Should You Pick?
Recommendations by scenario.
Scenario A — "Spin up a notebook and try one quickly"
-> OpenAI Agents SDK or CrewAI. Results within ten minutes.
Scenario B — "Production chatbot or support agent"
-> OpenAI Agents SDK (if OpenAI-heavy) or Anthropic Agent SDK (if Claude-heavy). Guardrails, tracing, permission gates are first-class.
Scenario C — "Complex workflow with branches and retries"
-> LangGraph. Explicit graphs and checkpoints shine.
Scenario D — "Multiple agents debating or collaborating"
-> AutoGen 0.4 or CrewAI. Strong multi-agent metaphors.
Scenario E — "RAG-centric — search our docs well"
-> LlamaIndex Agents. First-class indexing and retrieval. Or Mastra for a TS backend.
Scenario F — "TypeScript / Next.js full-stack"
-> Vercel AI SDK for the UI + Mastra for backend workflows.
Scenario G — "Type safety and structured outputs above all"
-> Pydantic AI (Python) or Vercel AI SDK (TS).
Scenario H — "Code execution is the main tool — data and science"
-> smolagents CodeAgent. Or OpenAI Code Interpreter.
Scenario I — "Enterprise — multi-vendor models, audit, governance"
-> Bee Agent Framework (IBM) or LangGraph with LangSmith.
Scenario J — "Future cross-vendor agent collaboration"
-> Any framework with MCP support. Watch A2A.
Decision checklist (10 items)
- Which model is your primary? (Can you accept a vendor-locked SDK?)
- Single agent or multi?
- Is ReAct enough, or do you need Plan-and-Execute?
- Do you need a graph and state machine?
- Do you need human-in-the-loop?
- Do you need checkpoints and time travel?
- Are your tools worth standardizing on MCP?
- Where do you watch traces? (LangSmith, OpenAI traces, custom)
- Language preference? Python or TypeScript?
- Six months from now, what's the exit to another framework?
13. Adoption Notes from Korea and Japan
A brief look at East Asia.
Korea
- LG AI Research (EXAONE) — Building an internal agent platform on top of the EXAONE model line. Internal R&D and document search agents come first.
- Kakao Enterprise / Kakao Brain — Successor models to KoGPT plus in-house agent tooling. Started with internal productivity bots and is expanding outward.
- Naver Clova / HyperCLOVA X — Coding and call-center agents being trialed inside CLOVA Studio.
- Startups — Companies like Upstage and Squeeze AI lead in RAG and document agents. LangGraph and CrewAI adoption is common.
- Samsung / SK — Evaluating Claude Code SDK and OpenAI Agents SDK side by side for internal coding assistants and ops agents.
Japan
- NTT (tsuzumi) — Industrial agents combining NTT's own LLM tsuzumi with NTT DOCOMO's voice and telecom data. R&D experiments span AutoGen and LangGraph.
- SoftBank — Building on the OpenAI partnership and the Stargate Japan data-center buildout. Validating OpenAI Agents SDK for in-house call-center and sales agents.
- Preferred Networks — Industrial automation agents on top of their PLaMo model. Manufacturing and logistics focus.
- Rakuten — Rakuten AI as the in-house model, with internal agents built on LangChain and LangGraph.
- Mercari — Anthropic Claude-based agents in customer support and used-goods classification trials.
Common patterns
- "Own model plus external SDK" is the dominant combination. Sovereignty pushes the model to be domestic; the SDK follows global standards.
- Internal productivity (coding assistants, meeting transcripts, ops bots) is usually the first wave.
- Japan plays to telecom-and-LLM strengths via NTT and SoftBank. Korea moves faster on search, content, and gaming use cases.
- MCP adoption accelerated in 2026 — pulled by domestic Cursor and Claude Code usage.
14. References
Official docs first, then a few key academic and design-pattern references.
Framework official docs
- OpenAI Agents SDK — https://openai.github.io/openai-agents-python/
- Anthropic Claude Agent SDK — https://docs.anthropic.com/en/docs/claude-code/sdk
- LangGraph — https://langchain-ai.github.io/langgraph/
- AutoGen — https://microsoft.github.io/autogen/
- CrewAI — https://docs.crewai.com/
- smolagents — https://huggingface.co/docs/smolagents
- Mastra — https://mastra.ai/docs
- Pydantic AI — https://ai.pydantic.dev/
- LlamaIndex Agents — https://docs.llamaindex.ai/en/stable/use_cases/agents/
- Vercel AI SDK — https://sdk.vercel.ai/docs
- Bee Agent Framework — https://i-am-bee.github.io/bee-agent-framework/
Protocols and standards
- Model Context Protocol (MCP) — https://modelcontextprotocol.io/
- Anthropic MCP announcement — https://www.anthropic.com/news/model-context-protocol
- A2A (Agent-to-Agent) — https://a2aproject.github.io/A2A/
Core papers and design patterns
- ReAct (Yao et al., 2022) — https://arxiv.org/abs/2210.03629
- Plan-and-Solve (Wang et al., 2023) — https://arxiv.org/abs/2305.04091
- Tree of Thoughts (Yao et al., 2023) — https://arxiv.org/abs/2305.10601
- Reflexion (Shinn et al., 2023) — https://arxiv.org/abs/2303.11366
- Andrew Ng — Agentic Design Patterns (DeepLearning.AI, 2024)
Further reading
- OpenAI — A practical guide to building agents (2024) — https://openai.com/index/new-tools-for-building-agents/
- Anthropic — Building effective agents (2024) — https://www.anthropic.com/engineering/building-effective-agents
- LangChain — State of AI Agents 2024 — https://blog.langchain.dev/
Epilogue — Choosing an Opinion
One-sentence summary: an agent framework is not a tool, it's an opinion. OpenAI Agents SDK says "agents hand off like functions." LangGraph says "agents are state machines." CrewAI says "agents are roles." Anthropic Agent SDK says "agents are governed by context management." smolagents says "agents write code."
The same problem gets different solutions under different opinions. So — as much as you debate the model — be conscious that you are also choosing the framework's opinion.
Next post candidates: a deep dive on agent evaluation systems (Inspect AI, Promptfoo, LangSmith), writing your own MCP server, and patterns for subagent orchestration.
"A framework is not a library, it is an opinion. Realizing you are choosing an opinion is the first button on the tool-selection coat."
— AI Agent Frameworks in 2026, end.