AI Agent Frameworks 2026 Deep Dive — LangGraph, AutoGen, CrewAI, Semantic Kernel, OpenAI Agents SDK, Claude Agent SDK, Pydantic AI
Intro — Is the "Agent Framework" Category Settled in May 2026?
Over 2025 and into Q1 2026 the agent-framework landscape went through two big inflection points. First, **OpenAI Swarm was retired and replaced with the OpenAI Agents SDK**, which went GA in spring 2025. Second, **Anthropic rebranded Claude Code SDK to Claude Agent SDK**, signaling it is no longer a coding-only tool. LangChain shipped LangGraph 0.3, promoting interrupt and human-in-the-loop to first-class citizens. Microsoft now runs two tracks in parallel: AutoGen 0.4 (a full actor-model rewrite) and the Semantic Kernel Agent Framework.
This article is a "which agent framework should I pick in production today" guide for May 2026. It does not copy marketing pages. We look at real API shape, multi-agent patterns, observability hooks, model agnosticism, and even Korean and Japanese community signals.
Evaluation Axes — Eight Dimensions We Use
Any comparison needs criteria first. We decompose every framework along eight axes.
1. **Graph or state model**: explicit state machine, free-form chat, or event workflow?
2. **Tool calling**: who is responsible for converting function signatures to JSON schemas?
3. **Multi-agent patterns**: handoff, supervisor-worker, magentic, role-based, etc.
4. **Interrupt and resume**: checkpoints, human-in-the-loop, long-running task support.
5. **Memory**: short-term (context window) and long-term (vector / graph) abstractions.
6. **Observability**: tracing, evals, debugging UIs.
7. **Model agnosticism**: cost of switching between OpenAI, Anthropic, Gemini, open-source.
8. **Language SDKs**: depth of .NET, TypeScript, Java support beyond Python.
At a Glance — May 2026 Framework Lineup
| Framework | Vendor | First-class language | Core model | Multi-agent |
| --- | --- | --- | --- | --- |
| LangGraph 0.3+ | LangChain | Python, TS | state graph + checkpoints | supervisor, swarm |
| AutoGen 0.4 | Microsoft | Python | async actor | GroupChat, Magentic-One |
| CrewAI | CrewAI Inc. | Python | role-based crew + Flows | hierarchical, sequential |
| Semantic Kernel 1.30+ | Microsoft | .NET, Python, Java | plugin + planner | Agent Framework |
| OpenAI Agents SDK | OpenAI | Python, TS | Responses API + handoff | handoff DAG |
| Claude Agent SDK | Anthropic | Python, TS | system prompt + tools + MCP | subagents |
| Pydantic AI | Pydantic Team | Python | type-safe agent | delegation |
| Smol Agents | Hugging Face | Python | code-executing agent | managed agents |
| LlamaIndex Agent Workflows | LlamaIndex | Python, TS | event-driven workflow | multi-agent workflow |
| Vercel AI SDK | Vercel | TypeScript | streaming + tool calling | manual orchestration |
Each row is derived from real API shape, not marketing claims. Now we walk through each framework.
LangGraph 0.3 — The De Facto Standard for State Graphs + Checkpoints
LangGraph is the graph-based orchestration library from the LangChain team. The 0.3 line is anchored on three improvements.
- **Interrupt and resume promoted to first-class**: calling `interrupt()` pauses execution and external code resumes the graph with a `Command` object after a human decides.
- **Checkpointers diversified**: memory, SQLite, Postgres, and Redis checkpointers are all officially supported.
- **LangGraph Studio + LangSmith**: visually debug the graph and ship traces to LangSmith automatically.
A typical LangGraph 0.3 program looks like this.
```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.types import Command, interrupt


class State(TypedDict):
    # Nodes return the full list, so no reducer annotation is needed.
    messages: list[str]
    approved: bool


def plan(state: State) -> dict:
    return {"messages": state["messages"] + ["plan drafted"]}


def human_review(state: State) -> dict:
    # Pauses the graph; the value passed on resume becomes `decision`.
    decision = interrupt({"plan": state["messages"][-1]})
    return {"approved": decision == "approve"}


def execute(state: State) -> dict:
    if not state["approved"]:
        return {"messages": state["messages"] + ["aborted"]}
    return {"messages": state["messages"] + ["executed"]}


graph = (
    StateGraph(State)
    .add_node("plan", plan)
    .add_node("human_review", human_review)
    .add_node("execute", execute)
    .add_edge(START, "plan")
    .add_edge("plan", "human_review")
    .add_edge("human_review", "execute")
    .add_edge("execute", END)
    .compile(checkpointer=MemorySaver())
)

# The checkpointer keys saved state by thread id.
config = {"configurable": {"thread_id": "demo"}}
graph.invoke({"messages": []}, config)  # pauses inside human_review
graph.invoke(Command(resume="approve"), config)  # resumes with the human's answer
```
The appeal is that **graph topology is right there in the code**. Nodes and edges are explicit, which makes debugging and replay tractable. The cost is ramp-up: even a simple chatbot forces you to learn graphs, state, checkpointers, and interrupts.
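To make the pause-and-resume mechanics concrete, here is a minimal standard-library sketch of the idea, not LangGraph's implementation. `TinyGraph`, `Interrupt`, and the node signatures are hypothetical names invented for this illustration. It assumes a linear node order and an in-memory checkpoint keyed by thread id, and, like LangGraph, it re-runs the interrupted node from its start when execution resumes.

```python
from dataclasses import dataclass, field


class Interrupt(Exception):
    """Raised by a node that needs a human decision before continuing."""

    def __init__(self, payload):
        super().__init__("waiting for human input")
        self.payload = payload


@dataclass
class TinyGraph:
    """Toy runner: ordered nodes plus one in-memory checkpoint per thread id."""

    nodes: list  # list of (name, fn); fn(state, resume) returns a dict of updates
    checkpoints: dict = field(default_factory=dict)

    def run(self, thread_id, state=None, resume=None):
        # Start fresh, or reload the saved state and position for this thread.
        saved = self.checkpoints.get(thread_id)
        if saved is None:
            saved = {"state": dict(state or {}), "next": 0}
        state, index = saved["state"], saved["next"]
        while index < len(self.nodes):
            name, fn = self.nodes[index]
            try:
                state = {**state, **fn(state, resume)}
            except Interrupt as pause:
                # Persist where we stopped so a later call can pick up here.
                self.checkpoints[thread_id] = {"state": state, "next": index}
                return {"status": "interrupted", "node": name, "payload": pause.payload}
            resume = None  # a resume value is consumed by exactly one node
            index += 1
        self.checkpoints.pop(thread_id, None)
        return {"status": "done", "state": state}


def plan(state, resume):
    return {"messages": state["messages"] + ["plan drafted"]}


def human_review(state, resume):
    if resume is None:
        raise Interrupt({"plan": state["messages"][-1]})
    return {"approved": resume == "approve"}


def execute(state, resume):
    outcome = "executed" if state["approved"] else "aborted"
    return {"messages": state["messages"] + [outcome]}


graph = TinyGraph([("plan", plan), ("human_review", human_review), ("execute", execute)])
first = graph.run("thread-1", {"messages": []})  # stops at human_review
second = graph.run("thread-1", resume="approve")  # continues from the checkpoint
```

The design point this illustrates is that the checkpoint, not the Python call stack, carries the workflow across the pause, which is why the approval can arrive minutes or days later from a different process.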
AutoGen 0.4 — Actor-Model Async Rewrite
Microsoft Research rewrote AutoGen almost from scratch for 0.4. The synchronous GroupChat model of 0.2 hit walls at debugging and scale, so the new release is built on an async actor runtime.
Core components are now split.
- **`autogen-core`**: actor runtime with async message passing and routing.
- **`autogen-agentchat`**: multi-agent chat abstraction similar to 0.2.
- **`autogen-ext`**: integrations with OpenAI, Anthropic, Azure OpenAI, Ollama, and more.
- **AutoGen Studio**: no-code designer and debugger UI.
- **Magentic-One**: a generalist multi-agent system with predefined agents (WebSurfer, FileSurfer, Coder, ComputerTerminal) coordinated by an Orchestrator.
A minimal two-agent collaboration looks like this.
```python
import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    # Reads OPENAI_API_KEY from the environment.
    client = OpenAIChatCompletionClient(model="gpt-4o")
    writer = AssistantAgent("writer", client, system_message="You write drafts.")
    reviewer = AssistantAgent("reviewer", client, system_message="You critique drafts.")
    # Agents take turns in list order until six messages have been exchanged.
    team = RoundRobinGroupChat(
        [writer, reviewer],
        termination_condition=MaxMessageTermination(6),
    )
    result = await team.run(task="Draft a 3-sentence pitch for a Korean coffee subscription.")
    print(result.messages[-1].content)


asyncio.run(main())
```
The strength of 0.4 is **distributed execution and observability**. OpenTelemetry tracing is built in and per-actor logs are isolated. Magentic-One is a reference implementation whose paper reports performance statistically competitive with the state of the art on GAIA.
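The actor idea behind the rewrite can be sketched with nothing but `asyncio`: each agent owns a mailbox, reacts to one message at a time, and forwards its reply to the next actor. This is an illustration of the pattern under simplifying assumptions, not the `autogen-core` API. The `actor` and `run_round_robin` helpers are hypothetical, and the reply functions are stand-ins for model calls.

```python
import asyncio


async def actor(name, reply_fn, inbox, outbox, transcript, max_messages):
    """One actor: wait for a message, reply, and forward the reply."""
    while True:
        message = await inbox.get()
        if message is None or len(transcript) >= max_messages:
            # Pass the shutdown signal along so the peer exits too.
            outbox.put_nowait(None)
            return
        reply = reply_fn(message)
        transcript.append((name, reply))
        outbox.put_nowait(reply)


async def run_round_robin(task, max_messages=4):
    writer_inbox, reviewer_inbox = asyncio.Queue(), asyncio.Queue()
    transcript = []
    actors = [
        actor("writer", lambda m: f"draft for: {m}",
              writer_inbox, reviewer_inbox, transcript, max_messages),
        actor("reviewer", lambda m: f"critique of: {m}",
              reviewer_inbox, writer_inbox, transcript, max_messages),
    ]
    writer_inbox.put_nowait(task)  # the task is the first message
    await asyncio.gather(*actors)
    return transcript


transcript = asyncio.run(run_round_robin("coffee pitch", max_messages=4))
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Because actors only share messages, each one can be logged, traced, or moved to another process independently, which is the property the 0.4 design is built around.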
CrewAI — The Most Intuitive Abstraction for Role-Based Collaboration
CrewAI models "a company of people with roles" directly. The core abstractions are Agent, Task, Crew, and Process.
- **Agent**: an agent with a role, goal, and backstory.
- **Task**: a unit of work with an explicit input and expected output.
- **Crew**: a bundle of agents plus tasks.
- **Process**: sequential or hierarchical execution.
- **Flows**: an event-driven workflow API introduced in late 2024. Combines with Crew for branching and stateful execution.
A typical CrewAI program looks like this.
```python
from crewai import Agent, Crew, Process, Task

researcher = Agent(
    role="Senior researcher",
    goal="Investigate the AI agent market",
    backstory="10 years as a technology analyst",
    allow_delegation=False,
)
writer = Agent(
    role="Tech writer",
    goal="Convert research into a blog post",
    backstory="Developer-turned-technical-writer",
)

research = Task(
    description="Compare 10 AI agent frameworks in 2026",
    expected_output="markdown table plus key signals",
    agent=researcher,
)
draft = Task(
    description="Convert the research into a 1500-character article",
    expected_output="markdown body",
    agent=writer,
    context=[research],  # the research output is passed to the writer
)

crew = Crew(agents=[researcher, writer], tasks=[research, draft], process=Process.sequential)
result = crew.kickoff()
print(result)
```
CrewAI Enterprise is a SaaS, but there is also a self-hosting story that bundles eval, deploy, and monitoring. It is one of the fastest frameworks for building a demo.
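The sequential process with `context` passing described above boils down to "run tasks in order and hand each one the outputs it declared a dependency on". A minimal sketch of that idea follows. `ToyTask` and `kickoff_sequential` are hypothetical names for this illustration and say nothing about how CrewAI implements it internally.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTask:
    name: str
    run: Callable[[list], str]  # receives upstream outputs, returns this task's output
    context: list = field(default_factory=list)  # ToyTask objects this task depends on


def kickoff_sequential(tasks):
    """Run tasks in list order, feeding each one the outputs named in its context."""
    outputs = {}
    for task in tasks:
        missing = [dep.name for dep in task.context if dep.name not in outputs]
        if missing:
            raise ValueError(f"{task.name} runs before its context: {missing}")
        outputs[task.name] = task.run([outputs[dep.name] for dep in task.context])
    return outputs


research = ToyTask("research", lambda ctx: "table of 10 frameworks")
draft = ToyTask("draft", lambda ctx: f"article based on: {ctx[0]}", context=[research])

outputs = kickoff_sequential([research, draft])
print(outputs["draft"])
```

Ordering mistakes surface immediately in this sketch: listing `draft` before `research` raises a `ValueError` instead of silently running with an empty context.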
Semantic Kernel 1.30+ — Microsoft's .NET, Python, and Java Track
Semantic Kernel is the SDK Microsoft uses across the Copilot stack. The 1.30 line in May 2026 has these defining traits.
- **Agent Framework GA**: `ChatCompletionAgent`, `OpenAIAssistantAgent`, `AzureAIAgent`, and `BedrockAgent` share one interface.
- **Plugin system**: annotate a function and it is exposed as a tool automatically. C# uses `KernelFunctionAttribute`, Python uses `@kernel_function`.
- **Planner**: takes a natural-language goal and produces a sequence of plugin calls. Function-Calling planner, Handlebars planner, etc.
- **Memory**: backends for Azure AI Search, Qdrant, Chroma, Postgres pgvector, and more.
A C# example looks like this.
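```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;

// `endpoint`, `apiKey`, and `WeatherPlugin` are placeholders defined elsewhere in your project.
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion("gpt-4o", endpoint, apiKey);
builder.Plugins.AddFromType<WeatherPlugin>();
var kernel = builder.Build();

var agent = new ChatCompletionAgent
{
    Name = "WeatherBot",
    Instructions = "You answer weather questions using the Weather plugin.",
    Kernel = kernel,
    // Let the model call plugin functions automatically.
    Arguments = new KernelArguments(
        new PromptExecutionSettings { FunctionChoiceBehavior = FunctionChoiceBehavior.Auto() }),
};

// Each streamed item wraps a ChatMessageContent in its Message property.
await foreach (var item in agent.InvokeAsync("What is the weather in Seoul today?"))
{
    Console.WriteLine(item.Message.Content);
}
```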
If you live on .NET, or need Microsoft 365 / Azure AI Foundry integration, Semantic Kernel is effectively the default. The Java SDK is steadily catching up across the 1.x line.
OpenAI Agents SDK — The Successor to Swarm
OpenAI Swarm was an "educational example" in fall 2024, but the OpenAI Agents SDK went GA in spring 2025. There are four core abstractions.
- **`Agent`**: bundles a model, system prompt, tools, handoffs, and guardrails.
- **`handoff`**: a first-class object that transfers control from one agent to another.
- **`guardrail`**: a validation hook attached to input or output, halting on policy violations.
- **`trace`**: tracing on top of the Responses API, ready to view in the OpenAI dashboard.
The code is short.
```python
from agents import Agent, Runner, handoff

billing = Agent(name="Billing", instructions="Answer billing questions.")
tech = Agent(name="Tech", instructions="Answer technical questions.")

# Declaring handoffs up front tells the triage agent where it may transfer control.
triage = Agent(
    name="Triage",
    instructions="Decide whether the user wants billing or technical help.",
    handoffs=[handoff(billing), handoff(tech)],
)

# Reads OPENAI_API_KEY from the environment.
result = Runner.run_sync(triage, "I want a refund")
print(result.final_output)
```
The strength is **depth of OpenAI integration**. Responses API, structured output, tools, and traces ship as one cohesive product. The downside is model agnosticism: you can plug other models in via LiteLLM, but first-class support is OpenAI.
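The handoff and guardrail abstractions are easier to reason about once the runner loop is spelled out. The sketch below is a toy model of that loop and not the SDK's code: `ToyAgent`, `run`, and `GuardrailTripped` are hypothetical, and each agent's `respond` function stands in for a model call that returns either a final answer or the name of a handoff target.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyAgent:
    name: str
    # respond(text) returns ("handoff", target_name) or ("final", answer).
    respond: Callable[[str], tuple]
    handoffs: list = field(default_factory=list)


class GuardrailTripped(Exception):
    pass


def run(agent, user_input, input_guardrail=None, max_turns=5):
    """Loop until an agent produces a final answer, following handoffs."""
    if input_guardrail and not input_guardrail(user_input):
        raise GuardrailTripped(f"input rejected before reaching {agent.name}")
    path = [agent.name]
    for _ in range(max_turns):
        kind, value = agent.respond(user_input)
        if kind == "final":
            return {"final_output": value, "path": path}
        # Only agents declared in `handoffs` are legal destinations.
        targets = {a.name: a for a in agent.handoffs}
        if value not in targets:
            raise ValueError(f"{agent.name} cannot hand off to {value}")
        agent = targets[value]
        path.append(agent.name)
    raise RuntimeError("max_turns exceeded")


billing = ToyAgent("Billing", lambda text: ("final", "Refund request logged."))
tech = ToyAgent("Tech", lambda text: ("final", "Try restarting the app."))
triage = ToyAgent(
    "Triage",
    lambda text: ("handoff", "Billing" if "refund" in text.lower() else "Tech"),
    handoffs=[billing, tech],
)

result = run(triage, "I want a refund", input_guardrail=lambda text: len(text) < 500)
print(result)
```

Two details carry over to real systems: the set of legal handoff targets is declared per agent, and a turn limit keeps two agents from handing control back and forth forever.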
Claude Agent SDK — The Rebrand of Claude Code SDK
Anthropic rebranded Claude Code SDK to **Claude Agent SDK** in September 2025. The renaming says it all: it is no longer a "coding-only" tool but a general SDK for building Claude-powered agents.
Core building blocks include:
- **System-prompt builder**: composes role, tool descriptions, and safety guidance.
- **Tools**: function plus JSON schema. Built-ins cover shell commands, file editing, and web search (for example `Bash`, `Edit`, and `WebSearch`).
- **MCP (Model Context Protocol)**: a standard protocol for plugging in external context sources. Anthropic published it in November 2024 and it has become the de facto standard.
- **Subagents**: a pattern where a main agent invokes smaller agents.
A Python example with the Anthropic SDK looks like this.
```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    tools=[
        {
            "name": "read_file",
            "description": "Read a file from disk",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ],
    messages=[{"role": "user", "content": "read README.md"}],
)

# When Claude decides to call the tool, the content includes a tool_use block.
print(response.stop_reason, response.content)
```
The strengths are **long context plus reliable tool calling**. The Claude 4.5 / 4.6 series ships with a 200K-token context window and strong tool-calling consistency, both exposed through the SDK.
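A request like the one above is only half of a tool-calling turn: when the model answers with a `tool_use` block, the caller runs the tool and sends back a `tool_result` block that references the same id. The sketch below shows that step on plain dictionaries mirroring the Messages API JSON shape. `run_tools` and the `read_file` handler are hypothetical helpers, and with the Python SDK the content blocks are objects, so they would need converting to dictionaries (or attribute access) first.

```python
import json


def run_tools(content_blocks, handlers):
    """Turn the assistant's tool_use blocks into one user message of tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # text blocks need no reply
        handler = handlers.get(block["name"])
        if handler is None:
            # Report the failure to the model instead of crashing the loop.
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": f"unknown tool: {block['name']}",
                "is_error": True,
            })
            continue
        output = handler(**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": output if isinstance(output, str) else json.dumps(output),
        })
    return {"role": "user", "content": results}


# Example assistant content; the id is a made-up placeholder.
assistant_content = [
    {"type": "text", "text": "I'll read the file."},
    {"type": "tool_use", "id": "toolu_example", "name": "read_file",
     "input": {"path": "README.md"}},
]
reply = run_tools(assistant_content, {"read_file": lambda path: f"(contents of {path})"})
print(reply)
```

Appending the assistant turn and this `reply` to `messages` and calling the API again completes the loop; an agent SDK automates exactly this cycle.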
Pydantic AI — The Type-Safe Agent Framework
Built by the Pydantic team and first released in late 2024, Pydantic AI has grown fast. The slogan is simple: "what FastAPI did for API servers, Pydantic AI does for agents."
Core design decisions:
- **Type safety**: inputs, tools, and outputs are all Pydantic models.
- **Dependency injection**: external context (DB connections, API keys, user info) is injected at run time.
- **Structured responses**: the LLM output is forced into a Pydantic model.
- **Model agnosticism**: OpenAI, Anthropic, Gemini, Groq, Mistral, and Ollama share one interface.
The code feels very Pythonic.
```python
from pydantic import BaseModel
from pydantic_ai import Agent, RunContext


class WeatherResponse(BaseModel):
    city: str
    temperature_c: float
    condition: str


class Deps(BaseModel):
    api_key: str


agent = Agent(
    "anthropic:claude-sonnet-4-6",
    deps_type=Deps,
    output_type=WeatherResponse,  # the final answer must validate against this model
    system_prompt="You answer weather questions using the get_weather tool.",
)


@agent.tool
def get_weather(ctx: RunContext[Deps], city: str) -> dict:
    # A real tool would call a weather API using ctx.deps.api_key.
    return {"city": city, "temperature_c": 21.0, "condition": "clear"}


result = agent.run_sync("What is the weather in Seoul?", deps=Deps(api_key="..."))
print(result.output)
```
Teams that care deeply about type safety, especially FastAPI shops, have adopted it quickly.
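The "structured responses" idea can be shown without any framework: parse the model's reply, validate it against a typed model, and feed the validation error back on failure. The sketch below uses standard-library dataclasses instead of Pydantic so it runs anywhere. `parse_output`, `run_with_retries`, and `fake_model` are hypothetical helpers for illustration, and the validation is far cruder than what Pydantic provides.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class WeatherResponse:
    city: str
    temperature_c: float
    condition: str


def parse_output(raw, model_cls):
    """Validate a JSON string against a flat dataclass; raise ValueError with a reason."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    kwargs = {}
    for f in fields(model_cls):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        # JSON has one number type, so accept integers where a float is declared.
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, f.type):
            raise ValueError(f"field {f.name} should be {f.type.__name__}")
        kwargs[f.name] = value
    return model_cls(**kwargs)


def run_with_retries(model_call, prompt, model_cls, max_retries=2):
    """Call the model, and on invalid output retry with the error as feedback."""
    feedback = None
    for _ in range(max_retries + 1):
        raw = model_call(prompt, feedback)
        try:
            return parse_output(raw, model_cls)
        except ValueError as exc:
            feedback = str(exc)
    raise RuntimeError(f"no valid output after {max_retries + 1} attempts: {feedback}")


# Stand-in for an LLM: the first reply is incomplete, the second is valid.
replies = iter([
    '{"city": "Seoul"}',
    '{"city": "Seoul", "temperature_c": 21, "condition": "clear"}',
])
seen_feedback = []


def fake_model(prompt, feedback):
    seen_feedback.append(feedback)
    return next(replies)


result = run_with_retries(fake_model, "What is the weather in Seoul?", WeatherResponse)
print(result, seen_feedback)
```

The payoff of the pattern is that downstream code receives a `WeatherResponse` or an exception, never a half-parsed string.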
Smol Agents — Hugging Face's Code-Executing Agent
Hugging Face released Smol Agents in late 2024 around the thesis that "agents should write code, not function-call JSON." It follows the ReAct + CodeAct line.
- **CodeAgent**: writes and executes Python on every turn. Real code instead of a JSON tool call.
- **ToolCallingAgent**: a traditional JSON tool-calling mode for when you want it.
- **Sandboxes**: E2B, Docker, and local interpreter modes for isolation.
- **HF Hub integration**: tools and agents can be pushed and pulled from the Hub.
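```python
from smolagents import CodeAgent, DuckDuckGoSearchTool, InferenceClientModel

# InferenceClientModel was named HfApiModel in early smolagents releases.
model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
print(agent.run("One-line summary of the AI agent market in 2026"))
```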
The code-execution path gives you maximum flexibility in chaining tools and transforming data. The trade-off is the cost and attack surface of the sandbox.
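The core of a code-executing agent is small: the model emits Python, the runtime executes it with the tools injected as ordinary functions, and the result is read back from the namespace. The sketch below illustrates that step under loud assumptions. `run_generated_code`, `SAFE_BUILTINS`, and `search_price` are hypothetical, the "generated" code is hard-coded rather than produced by a model, and trimming builtins is not a security boundary, so real deployments still need an isolated sandbox such as the E2B or Docker modes mentioned above.

```python
import contextlib
import io

# A deliberately small builtin set. This limits accidents, not attackers.
SAFE_BUILTINS = {
    "len": len, "sum": sum, "min": min, "max": max,
    "range": range, "sorted": sorted, "print": print,
}


def run_generated_code(code, tools):
    """Execute model-written code with tools in scope; return `result` and stdout."""
    namespace = {"__builtins__": SAFE_BUILTINS, **tools}
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        exec(code, namespace)
    return {"result": namespace.get("result"), "stdout": stdout.getvalue()}


# Stand-in for what a model might write: two tool calls chained with plain Python.
generated = """
prices = [search_price(name) for name in ["latte", "mocha"]]
result = sum(prices)
print("total:", result)
"""

tools = {"search_price": lambda name: {"latte": 4.5, "mocha": 5.0}[name]}
outcome = run_generated_code(generated, tools)
print(outcome)
```

The example shows why the approach is flexible: looping over tool calls and aggregating their results takes one comprehension, where JSON tool calling would need several model round trips.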
LlamaIndex Agent Workflows — Event-Driven Workflows
In early 2025 LlamaIndex consolidated its older `AgentRunner` / `AgentWorker` abstractions into **Workflows**. Event-driven design is the core idea.
- Nodes receive events and emit other events.
- Branching is expressed naturally by event type.
- Multi-agent orchestration lives in the separate `AgentWorkflow` class.
- It composes seamlessly with the LlamaIndex index and retriever ecosystem.
```python
import asyncio

from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step


class ResearchEvent(Event):
    query: str


class MyWorkflow(Workflow):
    @step
    async def start(self, ev: StartEvent) -> ResearchEvent:
        # Keyword arguments passed to run() are available on the StartEvent.
        return ResearchEvent(query=ev.input)

    @step
    async def research(self, ev: ResearchEvent) -> StopEvent:
        # In practice this calls a retriever.
        return StopEvent(result=f"summary of {ev.query}")


async def main():
    # run() must be awaited from inside a running event loop.
    result = await MyWorkflow().run(input="AI agents 2026")
    print(result)


asyncio.run(main())
```
If your core workload is RAG, LlamaIndex Workflows is almost the natural choice.
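Stripped of async execution and type inference, event-driven dispatch is a loop that looks up a handler by the type of the current event. The following synchronous sketch shows that mechanism only. `TinyWorkflow` and the three event classes are toy definitions for this illustration, not LlamaIndex classes, and handlers are registered explicitly rather than inferred from annotations.

```python
from dataclasses import dataclass


@dataclass
class StartEvent:
    query: str


@dataclass
class ResearchEvent:
    query: str


@dataclass
class StopEvent:
    result: str


class TinyWorkflow:
    """Toy dispatcher: each step handles one event type and returns the next event."""

    def __init__(self):
        self.handlers = {}

    def step(self, event_type):
        def register(fn):
            self.handlers[event_type] = fn
            return fn
        return register

    def run(self, event, max_steps=10):
        trail = []
        for _ in range(max_steps):
            trail.append(type(event).__name__)
            if isinstance(event, StopEvent):
                return event.result, trail
            # Branching falls out of the lookup: the event type selects the step.
            event = self.handlers[type(event)](event)
        raise RuntimeError("workflow did not reach StopEvent")


wf = TinyWorkflow()


@wf.step(StartEvent)
def start(ev):
    return ResearchEvent(query=ev.query.strip())


@wf.step(ResearchEvent)
def research(ev):
    return StopEvent(result=f"summary of {ev.query}")


result, trail = wf.run(StartEvent(query=" AI agents 2026 "))
print(result, trail)
```

Adding a branch means adding an event type and a handler; no existing step has to know about the new path.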
Vercel AI SDK — TypeScript-First Streaming and Tool Calling
The Vercel AI SDK is effectively the standard in TypeScript front-end and full-stack land. The 4.x line in May 2026 has these traits.
- **`streamText`, `generateText`, `generateObject`**: simple functional API.
- **`tools`**: define tools with zod schemas. Type inference is clean.
- **`useChat`, `useCompletion`**: React hooks for streaming UI.
- **Provider abstraction**: `@ai-sdk/openai`, `@ai-sdk/anthropic`, `@ai-sdk/google`, `@ai-sdk/mistral`, and more.
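```typescript
import { anthropic } from "@ai-sdk/anthropic";
import { generateText, tool } from "ai";
import { z } from "zod";

const result = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  tools: {
    getWeather: tool({
      description: "Get current weather for a city",
      // AI SDK 4.x calls this `parameters`; 5.x renames it to `inputSchema`.
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 }),
    }),
  },
  // Allow a second step so the model can turn the tool result into an answer.
  maxSteps: 2,
  prompt: "What is the weather in Seoul?",
});

console.log(result.text);
```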
The Vercel AI SDK explicitly positions itself as **not an agent orchestration framework**. Multi-agent flows are something you implement yourself. That makes it the fastest path for simple chat + tool calling, but a poor fit for complex graphs.
Multi-Agent Patterns Compared — Same Problem, Different Shapes
Here is how five frameworks express the same "research, write, review" workflow.
| Framework | How it's expressed |
| --- | --- |
| LangGraph | Three nodes plus explicit edges. All intermediate state in the State object. |
| AutoGen | RoundRobinGroupChat or SelectorGroupChat with an explicit termination condition. |
| CrewAI | Three Agents plus three Tasks plus `Process.sequential`. Results flow through `context`. |
| OpenAI Agents SDK | One triage agent plus two handoffs. The Runner routes between them. |
| Claude Agent SDK | The main agent calls subagents, exposed like tools. |
The core difference is **who decides routing**. LangGraph leaves it to code (edges), AutoGen to the group-chat policy, CrewAI to the process type, OpenAI Agents SDK to the handoff decision, and Claude Agent SDK to the main agent.
Tool Calling — Who Owns Schema Generation?
Tool calling is the common abstraction across all frameworks, but ownership of schema generation differs.
- **Pydantic AI / Semantic Kernel / Vercel AI SDK**: extract automatically from the type definition (Pydantic, C# signature, zod).
- **LangGraph / OpenAI Agents SDK**: extract from decorated functions and their signatures (`@tool` in LangChain, `@function_tool` in the OpenAI Agents SDK).
- **CrewAI**: subclass `BaseTool` or use the `@tool` decorator.
- **Claude Agent SDK / AutoGen**: explicit JSON schemas are also welcome.
- **Smol Agents**: no JSON schema in the call path. Tools are plain Python functions, described to the model through their signatures and docstrings, because the agent writes code that calls them.
Auto-generated schemas are faster to develop with, while explicit schemas give you tighter control over the interface exposed to the LLM.
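What "auto-generated" means in practice can be sketched with `inspect` and type hints: walk the function signature, map Python types to JSON Schema types, and treat parameters without defaults as required. `tool_schema` and `JSON_TYPES` below are hypothetical helpers, and the output follows the `name` / `description` / `input_schema` layout used in the Claude example earlier. Real frameworks handle far more cases (nested models, enums, per-argument descriptions), so treat this as a minimal sketch that falls back to `"string"` for anything it does not recognize.

```python
import inspect
from typing import get_type_hints

JSON_TYPES = {
    str: "string", int: "integer", float: "number",
    bool: "boolean", list: "array", dict: "object",
}


def tool_schema(fn):
    """Derive a tool definition from a function's signature and docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, days: int = 1) -> dict:
    """Get the forecast for a city."""
    return {"city": city, "days": days}


schema = tool_schema(get_weather)
print(schema)
```

The trade-off named above is visible here: the schema costs nothing to write, but the docstring and parameter names become the interface the LLM sees, so renaming an argument silently changes the prompt.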
Interrupts and Human-in-the-Loop — Who Does It Best?
For long-running agents, the critical question is "can a human step in?"
- **LangGraph**: first-class via `interrupt()` + `Command`. Combined with checkpointers, pause/resume is natural.
- **AutoGen**: expressed via `UserProxyAgent`. Somewhat indirect.
- **CrewAI**: partial support via `human_input=True` on a Task.
- **OpenAI Agents SDK**: handled by guardrail interruption or outside the main loop.
- **Semantic Kernel**: paired with the Process Framework for external events.
- **Claude Agent SDK**: tool-call approval is exposed at the SDK level.
For workflows that require both long-running execution and human approval (refund approval, code-change merging), LangGraph remains the most natural pick.
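For the refund-approval case, the simplest human-in-the-loop shape is an approval gate wrapped around a sensitive tool. The sketch below assumes a synchronous approver callback and an approval threshold chosen by the caller. `gated`, `issue_refund`, and `approver` are hypothetical names, and the approver here applies a fixed rule as a stand-in for a real person answering a prompt.

```python
def gated(tool, needs_approval, approver):
    """Wrap a tool so that selected calls require a human decision first."""
    def wrapper(**kwargs):
        if needs_approval(kwargs) and not approver(tool.__name__, kwargs):
            return {"status": "denied", "tool": tool.__name__}
        return {"status": "ok", "result": tool(**kwargs)}
    return wrapper


def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for {order_id}"


audit_log = []


def approver(tool_name, args):
    # Stand-in for a human: record the request, approve anything up to 100.
    audit_log.append((tool_name, args))
    return args["amount"] <= 100


safe_refund = gated(issue_refund, needs_approval=lambda a: a["amount"] > 20, approver=approver)

small = safe_refund(order_id="A1", amount=5.0)     # below threshold, no approval needed
medium = safe_refund(order_id="A2", amount=50.0)   # approval requested and granted
large = safe_refund(order_id="A3", amount=500.0)   # approval requested and denied
print(small, medium, large)
```

A blocking callback like this only works while the process stays alive. Once approvals can take hours, the gate has to persist its state and resume later, which is where checkpoint-based designs earn their complexity.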
Observability — Without Traces, Debugging Is Impossible
Agents are non-deterministic, so traces are not optional.
- **LangGraph**: auto-integrates with LangSmith. Per-node inputs, outputs, tokens, and cost visualized.
- **AutoGen 0.4**: OpenTelemetry built in. Connects to Jaeger, Honeycomb, Datadog.
- **CrewAI**: a native dashboard plus integrations with W&B and Langfuse.
- **OpenAI Agents SDK**: Responses API traces appear automatically in the OpenAI dashboard.
- **Claude Agent SDK**: traces plus eval in the Anthropic console.
- **Pydantic AI**: native integration with Logfire (also from the Pydantic team).
- **Vercel AI SDK**: OpenTelemetry plus Vercel Observability.
Among framework-agnostic tools, **Langfuse, Arize, and Helicone** are the names that show up most often.
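Whatever the backend, these tools all record the same basic unit: a named span with a parent, attributes, a duration, and an error if one occurred. A minimal in-memory version is sketched below to show what an agent trace contains. `Tracer` is a hypothetical class for illustration, not the OpenTelemetry API, and the attribute values are made up.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Toy tracer: collects finished spans in memory."""

    def __init__(self):
        self.spans = []   # finished spans, in completion order
        self._stack = []  # names of the spans that are currently open

    @contextmanager
    def span(self, name, **attributes):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        started = time.perf_counter()
        record = {"name": name, "parent": parent, "attributes": attributes, "error": None}
        try:
            yield record
        except Exception as exc:
            record["error"] = repr(exc)  # keep the failure on the span, then re-raise
            raise
        finally:
            record["duration_s"] = time.perf_counter() - started
            self._stack.pop()
            self.spans.append(record)


tracer = Tracer()
with tracer.span("run", agent="writer"):
    with tracer.span("llm_call", model="example-model") as span:
        span["attributes"]["tokens"] = 42  # attach data discovered during the call
    with tracer.span("tool_call", tool="search"):
        pass

for s in tracer.spans:
    print(s["name"], s["parent"], s["attributes"])
```

Per-node token counts and costs in the hosted dashboards are attributes of this kind, attached to the span of the model call that produced them.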
Model Agnosticism — How Easy Is It to Swap Models?
Model prices shift every quarter, so model agnosticism is a cost decision.
- **Strong model agnosticism**: LangGraph, AutoGen, CrewAI, Pydantic AI, Smol Agents, Vercel AI SDK, LlamaIndex.
- **Vendor-first**: OpenAI Agents SDK (OpenAI-first), Claude Agent SDK (Claude-first).
- **Microsoft-first**: Semantic Kernel (Azure OpenAI plus Bedrock-compatible).
Even vendor-first frameworks can be coerced via LiteLLM or OpenAI-compatible gateways. But because tracing and tool formats are tuned to the home vendor, swapping out the model often weakens some features.
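The cheapest form of model agnosticism is a narrow interface of your own plus a registry keyed by a `provider:model` string, the same identifier style used in the Pydantic AI example. The sketch below shows that shape. `ChatModel`, `EchoModel`, `ShoutModel`, and `resolve` are hypothetical, and the two "providers" are local stand-ins so the example runs without network access or API keys.

```python
from typing import Protocol


class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Hypothetical provider; a real adapter would call a vendor SDK here."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    def complete(self, prompt: str) -> str:
        return f"[{self.model_name}] {prompt}"


class ShoutModel(EchoModel):
    """Second hypothetical provider with different behavior behind the same interface."""

    def complete(self, prompt: str) -> str:
        return super().complete(prompt).upper()


PROVIDERS = {"echo": EchoModel, "shout": ShoutModel}


def resolve(model_id: str) -> ChatModel:
    """Map a 'provider:model' string to an adapter instance."""
    provider, _, model_name = model_id.partition(":")
    if provider not in PROVIDERS or not model_name:
        raise ValueError(f"unknown model id: {model_id!r}")
    return PROVIDERS[provider](model_name)


def summarize(model: ChatModel, text: str) -> str:
    # Application code depends only on the ChatModel interface.
    return model.complete(f"Summarize: {text}")


a = summarize(resolve("echo:small"), "agents")
b = summarize(resolve("shout:large"), "agents")
print(a, b)
```

Swapping vendors then becomes a configuration change, but only for features that fit through the interface. Vendor-specific tracing or tool formats are exactly what such an adapter cannot carry.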
Korean Community — LangChain Korea as the Hub
In Korea the LangChain Korea community, which started in late 2024, functions as the de facto hub for AI agent practitioners. Regular meetups, Korean-language tutorials, and LangGraph workshops are active.
The other axis is education: **Modulabs' LLM Full-Stack track** and **paid courses on Fastcampus and Inflearn**. As of Q1 2026, Korean courses for LangGraph, CrewAI, and the OpenAI Agents SDK are all available.
The patterns we see in enterprise adoption:
- Naver / Kakao / LG AI: internal models combined with LangChain or LangGraph for internal agents.
- Startups: OpenAI Agents SDK or Claude Agent SDK + Pydantic AI for fast MVPs.
- Fintech: Semantic Kernel (.NET backends) with emphasis on human-in-the-loop.
Japanese Community — Microsoft Lineup and LlamaIndex Lead
In Japan two tracks stand out.
- **Microsoft Tokyo plus Japanese SIs**: Semantic Kernel and Azure AI Foundry adoption is high. Japanese documentation is well maintained.
- **LlamaIndex Japan User Group**: Tokyo and Osaka meetups, mostly RAG-focused.
Research-oriented organizations like **PFN (Preferred Networks)** and **Sakana AI** either build their own frameworks or fork AutoGen / LangGraph for experiments. On Qiita and Zenn, Japanese tutorials for the OpenAI Agents SDK and Claude Agent SDK have grown quickly through 2026.
Which Framework Should You Pick? — Scenario Guide
The conclusion, compressed by scenario.
- **Long-running workflow plus human-in-the-loop**: LangGraph 0.3.
- **Research prototype plus multi-agent experimentation**: AutoGen 0.4 + Magentic-One.
- **Fast demo plus role-based collaboration**: CrewAI.
- **.NET / Azure integration**: Semantic Kernel.
- **OpenAI model-centric plus handoff**: OpenAI Agents SDK.
- **Claude model plus long context plus MCP**: Claude Agent SDK.
- **Type safety plus FastAPI style**: Pydantic AI.
- **Code execution plus HF ecosystem**: Smol Agents.
- **RAG-centric**: LlamaIndex Agent Workflows.
- **TypeScript front end plus streaming**: Vercel AI SDK.
Most production systems combine **two or more**. For example, building a Vercel AI SDK front end while orchestrating in the back end with LangGraph.
Closing — Frameworks Aren't Converging; They Are Diverging
As of May 2026, "framework consolidation" has not happened. The divergence is actually sharper. OpenAI and Anthropic moved toward vendor-integrated SDKs, Microsoft toward .NET + Azure integration, LangChain toward graphs + observability, and Pydantic toward type safety.
That divergence is itself a sign of market maturity. There is no single way to build agents — workload, team language, model dependence, and observability needs all differ. We hope this guide is a useful starting point for picking the framework that fits your use case.