AI Agent Frameworks 2026 Deep Dive — LangGraph, AutoGen, CrewAI, Semantic Kernel, OpenAI Agents SDK, Claude Agent SDK, Pydantic AI

Intro — Is the "Agent Framework" Category Settled in May 2026?

Over 2025 and into Q1 2026 the agent-framework landscape went through several inflection points. OpenAI retired its experimental Swarm library in favor of the production-ready OpenAI Agents SDK. Anthropic renamed the Claude Code SDK to the Claude Agent SDK, signaling that it is no longer a coding-only tool. LangChain shipped LangGraph 0.3, promoting interrupts and human-in-the-loop to first-class citizens. Microsoft now runs two tracks in parallel: AutoGen 0.4 (a full actor-model rewrite) and the Semantic Kernel Agent Framework.

This article is a "which agent framework should I pick in production today" guide for May 2026. It does not copy marketing pages. We look at real API shape, multi-agent patterns, observability hooks, model agnosticism, and even Korean and Japanese community signals.

Evaluation Axes — Eight Dimensions We Use

Any comparison needs criteria first. We decompose every framework along eight axes.

Graph or state model: explicit state machine, free-form chat, or event workflow?
Tool calling: who is responsible for converting function signatures to JSON schemas?
Multi-agent patterns: handoff, supervisor-worker, magentic, role-based, etc.
Interrupt and resume: checkpoints, human-in-the-loop, long-running task support.
Memory: short-term (context window) and long-term (vector / graph) abstractions.
Observability: tracing, evals, debugging UIs.
Model agnosticism: cost of switching between OpenAI, Anthropic, Gemini, open-source.
Language SDKs: depth of .NET, TypeScript, Java support beyond Python.

At a Glance — May 2026 Framework Lineup

Framework	Vendor	First-class language	Core model	Multi-agent
LangGraph 0.3+	LangChain	Python, TS	state graph + checkpoints	supervisor, swarm
AutoGen 0.4	Microsoft	Python	async actor	GroupChat, Magentic-One
CrewAI	CrewAI Inc.	Python	role-based crew + Flows	hierarchical, sequential
Semantic Kernel 1.30+	Microsoft	.NET, Python, Java	plugin + planner	Agent Framework
OpenAI Agents SDK	OpenAI	Python, TS	Responses API + handoff	handoff DAG
Claude Agent SDK	Anthropic	Python, TS	system prompt + tools + MCP	subagents
Pydantic AI	Pydantic Team	Python	type-safe agent	delegation
Smol Agents	Hugging Face	Python	code-executing agent	managed agents
LlamaIndex Agent Workflows	LlamaIndex	Python, TS	event-driven workflow	multi-agent workflow
Vercel AI SDK	Vercel	TypeScript	streaming + tool calling	manual orchestration

Each row is derived from real API shape, not marketing claims. Now we walk through each framework.

LangGraph 0.3 — The De Facto Standard for State Graphs + Checkpoints

LangGraph is the graph-based orchestration library from the LangChain team. The 0.3 line is anchored on three improvements.

Interrupt and resume promoted to first-class: calling interrupt() pauses execution and external code resumes the graph with a Command object after a human decides.
Checkpointers diversified: memory, SQLite, Postgres, and Redis checkpointers are all officially supported.
LangGraph Studio + LangSmith: visually debug the graph and ship traces to LangSmith automatically.

A typical LangGraph 0.3 program looks like this.

from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt, Command

class State(TypedDict):
    messages: list[str]  # no reducer: each node returns the full updated list
    approved: bool

def plan(state: State) -> dict:
    return {"messages": state["messages"] + ["plan drafted"]}

def human_review(state: State) -> dict:
    # Pauses the run; on resume, interrupt() returns the value passed via Command(resume=...).
    decision = interrupt({"plan": state["messages"][-1]})
    return {"approved": decision == "approve"}

def execute(state: State) -> dict:
    if not state["approved"]:
        return {"messages": state["messages"] + ["aborted"]}
    return {"messages": state["messages"] + ["executed"]}

graph = (
    StateGraph(State)
    .add_node("plan", plan)
    .add_node("human_review", human_review)
    .add_node("execute", execute)
    .add_edge(START, "plan")
    .add_edge("plan", "human_review")
    .add_edge("human_review", "execute")
    .add_edge("execute", END)
    .compile(checkpointer=MemorySaver())
)

# interrupt() needs a checkpointer plus a thread_id to save and restore state.
config = {"configurable": {"thread_id": "demo-1"}}
graph.invoke({"messages": [], "approved": False}, config)  # stops inside human_review
graph.invoke(Command(resume="approve"), config)            # resumes and runs execute

The appeal is that graph topology is right there in the code. Nodes and edges are explicit, which makes debugging and replay tractable. The cost is ramp-up: even a simple chatbot forces you to learn graphs, state, checkpointers, and interrupts.
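To make the pause/resume mechanics concrete without depending on LangGraph, here is a stdlib-only toy of the same idea: a node raises to pause, the runner saves a checkpoint keyed by thread ID, and a later call re-runs the paused node with the human's decision. TinyGraph, Interrupt, and Checkpoint are hypothetical names, and the "graph" is just a linear list of nodes; LangGraph's real checkpointers store considerably more, so treat this as a sketch of the concept only.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

class Interrupt(Exception):
    """Raised by a node to pause the run; carries a payload for the human reviewer."""
    def __init__(self, payload: Any):
        super().__init__(payload)
        self.payload = payload

@dataclass
class Checkpoint:
    next_node: int  # index of the node to re-run on resume
    state: dict     # state snapshot taken at the pause point

@dataclass
class TinyGraph:
    # Hypothetical linear "graph": (name, fn) pairs run in list order.
    nodes: list
    store: dict = field(default_factory=dict)  # thread_id -> Checkpoint

    def run(self, thread_id: str, state: Optional[dict] = None, resume: Any = None) -> dict:
        if state is None:
            cp = self.store.pop(thread_id)  # resume from the saved checkpoint
            start, state = cp.next_node, cp.state
        else:
            start = 0
        for i in range(start, len(self.nodes)):
            _name, fn = self.nodes[i]
            try:
                # As in LangGraph, the paused node re-runs from the top and receives the resume value.
                update = fn(state, resume if i == start else None)
            except Interrupt as exc:
                self.store[thread_id] = Checkpoint(i, dict(state))
                return {"__interrupt__": exc.payload}
            state = {**state, **update}
        return state

def plan(state, _resume):
    return {"messages": state["messages"] + ["plan drafted"]}

def human_review(state, resume):
    if resume is None:
        raise Interrupt({"plan": state["messages"][-1]})
    return {"approved": resume == "approve"}

def execute(state, _resume):
    outcome = "executed" if state["approved"] else "aborted"
    return {"messages": state["messages"] + [outcome]}

graph = TinyGraph([("plan", plan), ("human_review", human_review), ("execute", execute)])
paused = graph.run("t1", {"messages": [], "approved": False})
print(paused)             # {'__interrupt__': {'plan': 'plan drafted'}}
final = graph.run("t1", resume="approve")
print(final["messages"])  # ['plan drafted', 'executed']
```

The design point this illustrates is why a checkpointer is mandatory for interrupts: the paused run must survive between the two calls, and the thread ID is what ties the resume call back to the saved state.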

AutoGen 0.4 — Actor-Model Async Rewrite

Microsoft Research rewrote AutoGen almost from scratch for 0.4. The synchronous GroupChat model of 0.2 ran into limits in debuggability and scalability, so the new release is built on an asynchronous actor runtime.

Core components are now split.

autogen-core: actor runtime with async message passing and routing.
autogen-agentchat: multi-agent chat abstraction similar to 0.2.
autogen-ext: integrations with OpenAI, Anthropic, Azure OpenAI, Ollama, and more.
AutoGen Studio: no-code designer and debugger UI.
Magentic-One: a generalist multi-agent system with predefined agents (WebSurfer, FileSurfer, Coder, ComputerTerminal) coordinated by an Orchestrator.

A minimal two-agent collaboration looks like this.

import asyncio
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import MaxMessageTermination
from autogen_ext.models.openai import OpenAIChatCompletionClient

async def main():
    # Reads OPENAI_API_KEY from the environment.
    client = OpenAIChatCompletionClient(model="gpt-4o")
    writer = AssistantAgent("writer", client, system_message="You write drafts.")
    reviewer = AssistantAgent("reviewer", client, system_message="You critique drafts.")
    team = RoundRobinGroupChat([writer, reviewer], termination_condition=MaxMessageTermination(6))
    result = await team.run(task="Draft a 3-sentence pitch for a Korean coffee subscription.")
    for msg in result.messages:
        print(f"{msg.source}: {msg.content}")
    await client.close()

asyncio.run(main())

The strength of 0.4 is distributed execution and observability: OpenTelemetry tracing is built in and per-actor logs are isolated. Magentic-One, the reference multi-agent system, was reported by Microsoft as statistically competitive with the state of the art on GAIA, AssistantBench, and WebArena without task-specific tuning.
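The actor idea behind autogen-core can be illustrated with a toy in plain asyncio: each agent owns a private mailbox, and agents interact only by sending messages through a runtime, which also gives one place to log all traffic. Runtime and Message are hypothetical names, not AutoGen's API; a real actor runtime adds features such as topic-based routing, serialization, and distribution across processes.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Message:
    sender: str
    content: str

class Runtime:
    """Toy single-process actor runtime: one mailbox (asyncio.Queue) per actor."""
    def __init__(self):
        self.mailboxes: dict = {}
        self.log: list = []  # central message log, the hook where tracing would attach

    def register(self, name: str) -> asyncio.Queue:
        self.mailboxes[name] = asyncio.Queue()
        return self.mailboxes[name]

    async def send(self, to: str, msg: Message) -> None:
        self.log.append(f"{msg.sender}->{to}: {msg.content}")
        await self.mailboxes[to].put(msg)

async def writer(rt: Runtime, inbox: asyncio.Queue, rounds: int) -> str:
    draft = "draft v1"
    for i in range(rounds):
        await rt.send("reviewer", Message("writer", draft))
        await inbox.get()            # wait for the reviewer's critique
        draft = f"draft v{i + 2}"    # a real agent would revise using the critique
    return draft

async def reviewer(rt: Runtime, inbox: asyncio.Queue, rounds: int) -> None:
    for _ in range(rounds):
        msg = await inbox.get()
        await rt.send("writer", Message("reviewer", f"critique of {msg.content}"))

async def main():
    rt = Runtime()
    w_inbox, r_inbox = rt.register("writer"), rt.register("reviewer")
    final, _ = await asyncio.gather(writer(rt, w_inbox, 2), reviewer(rt, r_inbox, 2))
    return final, rt.log

final, log = asyncio.run(main())
print(final)  # draft v3
```

Because the actors share no state and only exchange messages, the same agents could in principle be moved to separate processes by swapping the runtime, which is the property the 0.4 rewrite is built around.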

CrewAI — The Most Intuitive Abstraction for Role-Based Collaboration

CrewAI models "a company of people with roles" directly. The core abstractions are Agent, Task, Crew, and Process.

Agent: an agent with a role, goal, and backstory.
Task: a unit of work with an explicit input and expected output.
Crew: a bundle of agents plus tasks.
Process: sequential or hierarchical execution.
Flows: an event-driven workflow API that combines with Crews for branching and stateful execution.

A typical CrewAI program looks like this.

from crewai import Agent, Task, Crew, Process

researcher = Agent(
    role="Senior researcher",
    goal="Investigate the AI agent market",
    backstory="10 years as a technology analyst",
    allow_delegation=False,
)
writer = Agent(
    role="Tech writer",
    goal="Convert research into a blog post",
    backstory="Developer-turned-technical-writer",
)

research = Task(
    description="Compare 10 AI agent frameworks in 2026",
    expected_output="markdown table plus key signals",
    agent=researcher,
)
draft = Task(
    description="Convert the research into a 1500-character article",
    expected_output="markdown body",
    agent=writer,
    context=[research],
)

crew = Crew(agents=[researcher, writer], tasks=[research, draft], process=Process.sequential)
result = crew.kickoff()

CrewAI Enterprise is the commercial platform that bundles evaluation, deployment, and monitoring; it is offered as SaaS, with a self-hosted option. CrewAI itself is one of the fastest frameworks for building a demo.
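Process.sequential combined with context can be read as a simple rule: run tasks in declaration order and hand each task the outputs of the tasks listed in its context. The stdlib-only sketch below shows that rule, with lambdas standing in for LLM-backed agents; Task and kickoff_sequential are hypothetical and are not CrewAI's classes.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    name: str
    run: Callable  # stand-in for an agent: (description, context_outputs) -> output
    description: str
    context: list = field(default_factory=list)  # upstream Task objects

def kickoff_sequential(tasks: list) -> dict:
    outputs = {}
    for task in tasks:  # strict declaration order, like a sequential process
        # Every dependency must have run already; otherwise this raises KeyError.
        ctx = [outputs[dep.name] for dep in task.context]
        outputs[task.name] = task.run(task.description, ctx)
    return outputs

research = Task("research", lambda d, ctx: f"notes on: {d}", "Compare agent frameworks")
draft = Task(
    "draft",
    lambda d, ctx: f"article from [{'; '.join(ctx)}]",
    "Write the article",
    context=[research],
)
out = kickoff_sequential([research, draft])
print(out["draft"])  # article from [notes on: Compare agent frameworks]
```

A hierarchical process would replace the fixed loop with a manager agent that chooses the next task and assignee, which is the main conceptual difference between the two process types.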

Semantic Kernel 1.30+ — Microsoft's .NET, Python, and Java Track

Semantic Kernel is the SDK Microsoft uses across the Copilot stack. The 1.30 line in May 2026 has these defining traits.

Agent Framework GA: ChatCompletionAgent, OpenAIAssistantAgent, AzureAIAgent, and BedrockAgent share one interface.
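Plugin system: annotate a function and it is exposed as a tool automatically. C# uses the [KernelFunction] attribute (KernelFunctionAttribute); Python uses the @kernel_function decorator.
Planner: takes a natural-language goal and produces a sequence of plugin calls. The older Handlebars and Stepwise planners are deprecated; the recommended approach is automatic function calling, where the model itself chooses which plugin functions to invoke.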
Memory: backends for Azure AI Search, Qdrant, Chroma, Postgres pgvector, and more.

A C# example looks like this.

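using System.ComponentModel;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Agents;

// Endpoint and key come from environment variables (the names are illustrative).
var endpoint = Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!;
var apiKey = Environment.GetEnvironmentVariable("AZURE_OPENAI_API_KEY")!;

var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion("gpt-4o", endpoint, apiKey); // deployment name, endpoint, key
builder.Plugins.AddFromType<WeatherPlugin>();
var kernel = builder.Build();

var agent = new ChatCompletionAgent
{
    Name = "WeatherBot",
    Instructions = "You answer weather questions using the Weather plugin.",
    Kernel = kernel,
    // Let the model call plugin functions automatically.
    Arguments = new KernelArguments(new PromptExecutionSettings
    {
        FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
    }),
};

await foreach (var item in agent.InvokeAsync("What is the weather in Seoul today?"))
{
    Console.WriteLine(item.Message.Content);
}

// Stub plugin for illustration; a real one would call a weather API.
public sealed class WeatherPlugin
{
    [KernelFunction, Description("Gets the current weather for a city.")]
    public string GetWeather(string city) => $"{city}: 21°C, clear";
}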

If you live on .NET, or need Microsoft 365 / Azure AI Foundry integration, Semantic Kernel is effectively the default. The Java SDK is steadily catching up across the 1.x line.

OpenAI Agents SDK — The Successor to Swarm

OpenAI Swarm, released in fall 2024, was explicitly an educational example; its production-ready successor, the OpenAI Agents SDK, was released in spring 2025. There are four core abstractions.

Agent: bundles a model, system prompt, tools, handoffs, and guardrails.
handoff: a first-class object that transfers control from one agent to another.
guardrail: a validation hook attached to input or output, halting on policy violations.
trace: tracing on top of the Responses API, ready to view in the OpenAI dashboard.

The code is short.

from agents import Agent, Runner, handoff

triage = Agent(
    name="Triage",
    instructions="Decide whether the user wants billing or technical help.",
)
billing = Agent(name="Billing", instructions="Answer billing questions.")
tech = Agent(name="Tech", instructions="Answer technical questions.")

triage.handoffs = [handoff(billing), handoff(tech)]

result = Runner.run_sync(triage, "I want a refund")
print(result.final_output)

The strength is depth of OpenAI integration: the Responses API, structured output, tools, and traces ship as one cohesive product. The trade-off is weaker model agnosticism: other models can be plugged in via LiteLLM, but OpenAI models get first-class support.
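The handoff idea can be captured in a few lines: the runner keeps calling the current agent, and when that agent returns a handoff instead of an answer, control simply moves to the target agent; guardrails run before the agent acts. In this stdlib-only sketch the decide callables stand in for model calls, and Agent, Handoff, and run are hypothetical names, not the Agents SDK's classes.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Handoff:
    target: "Agent"

@dataclass
class Agent:
    name: str
    # Stand-in for the model call: returns a final answer (str) or a Handoff.
    decide: Callable
    input_guardrails: list = field(default_factory=list)  # predicates on the input

class GuardrailTripped(Exception):
    pass

def run(agent: Agent, user_input: str, max_handoffs: int = 5) -> tuple:
    path = [agent.name]
    for _ in range(max_handoffs + 1):
        for check in agent.input_guardrails:
            if not check(user_input):
                raise GuardrailTripped(f"{agent.name} rejected the input")
        result = agent.decide(user_input)
        if isinstance(result, Handoff):
            agent = result.target  # control transfers to the other agent
            path.append(agent.name)
            continue
        return result, path
    raise RuntimeError("too many handoffs")

billing = Agent("Billing", lambda q: "Refund issued.")
tech = Agent("Tech", lambda q: "Try restarting the app.")
triage = Agent(
    "Triage",
    lambda q: Handoff(billing) if "refund" in q.lower() else Handoff(tech),
    input_guardrails=[lambda q: len(q) < 500],
)
answer, path = run(triage, "I want a refund")
print(answer, path)  # Refund issued. ['Triage', 'Billing']
```

The cap on handoffs is a design choice worth copying: without it, two agents that keep deferring to each other would loop forever.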

Claude Agent SDK — The Rebrand of Claude Code SDK

Anthropic renamed the Claude Code SDK to the Claude Agent SDK in late September 2025. The new name says it all: it is no longer a coding-only tool but a general SDK for building Claude-powered agents.

Core building blocks include:

System-prompt builder: composes role, tool descriptions, and safety guidance.
Tools: a function plus a JSON schema. Built-in tools include Bash, file tools (Read, Write, Edit), and WebSearch.
MCP (Model Context Protocol): an open protocol for plugging in external tools and context sources. Anthropic published it in November 2024, and it has since become the de facto standard.
Subagents: a pattern where a main agent invokes smaller agents.

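The example below uses the lower-level Anthropic Python SDK (the Messages API) rather than the Agent SDK package itself; it shows the tool-definition format that Claude tool use is built on.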

from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=1024,
    system="You are a helpful coding assistant.",
    tools=[
        {
            "name": "read_file",
            "description": "Read a file from disk",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ],
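    messages=[{"role": "user", "content": "read README.md"}],
)

# The model does not read the file itself: if it decides to use the tool,
# stop_reason is "tool_use" and response.content holds a tool_use block that
# your code must execute and answer with a tool_result message.
print(response.stop_reason)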

The strengths are long context plus reliable tool calling. The Claude 4.5 / 4.6 series ships with a 200K-token context window and strong tool-calling consistency, both exposed through the SDK.
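A single Messages API call only returns a tool request; an agent loop around it executes the tools and feeds results back until the model stops asking. The sketch below uses plain dicts shaped like the API's tool_use and tool_result content blocks plus a fake model, so it runs offline; fake_model, agent_loop, and the in-memory FILES dict are hypothetical, and with the real SDK response.content holds typed block objects rather than dicts. The Claude Agent SDK runs a loop of this kind for you.

```python
FILES = {"README.md": "# Demo\nHello"}  # hypothetical in-memory "disk"

def read_file(path: str) -> str:
    return FILES[path]

TOOLS = {"read_file": read_file}

def fake_model(messages: list) -> dict:
    """Stand-in for client.messages.create: requests the tool once, then answers."""
    last = messages[-1]
    if isinstance(last["content"], str):  # first turn: plain user text
        return {"stop_reason": "tool_use",
                "content": [{"type": "tool_use", "id": "toolu_1",
                             "name": "read_file", "input": {"path": "README.md"}}]}
    text = last["content"][0]["content"]  # the tool_result we sent back
    return {"stop_reason": "end_turn",
            "content": [{"type": "text", "text": f"README has {len(text)} characters."}]}

def agent_loop(model, prompt: str, max_turns: int = 5) -> tuple:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        response = model(messages)
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            answer = "".join(b["text"] for b in response["content"] if b["type"] == "text")
            return answer, messages
        # Execute every requested tool and reply with matching tool_result blocks.
        results = [
            {"type": "tool_result", "tool_use_id": b["id"],
             "content": TOOLS[b["name"]](**b["input"])}
            for b in response["content"] if b["type"] == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("max_turns exceeded")

answer, transcript = agent_loop(fake_model, "read README.md")
print(answer)  # README has 12 characters.
```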

Pydantic AI — The Type-Safe Agent Framework

Built by the Pydantic team and first released in late 2024, Pydantic AI has grown fast. The pitch is simple: do for agent development what FastAPI did for API servers.

Core design decisions:

Type safety: inputs, tools, and outputs are all Pydantic models.
Dependency injection: external context (DB connections, API keys, user info) is injected at run time.
Structured responses: the LLM output is forced into a Pydantic model.
Model agnosticism: OpenAI, Anthropic, Gemini, Groq, Mistral, and Ollama share one interface.

The code feels very Pythonic.

from pydantic import BaseModel
from pydantic_ai import Agent, RunContext

class WeatherResponse(BaseModel):
    city: str
    temperature_c: float
    condition: str

class Deps(BaseModel):
    api_key: str

agent = Agent(
    "anthropic:claude-sonnet-4-6",
    deps_type=Deps,
    output_type=WeatherResponse,
    system_prompt="You answer weather questions using the get_weather tool.",
)

@agent.tool
def get_weather(ctx: RunContext[Deps], city: str) -> dict:
    return {"city": city, "temperature_c": 21.0, "condition": "clear"}

result = agent.run_sync("What is the weather in Seoul?", deps=Deps(api_key="..."))
print(result.output)

Teams that care deeply about type safety, especially FastAPI shops, have adopted it quickly.
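Forcing output into a model boils down to validate-then-retry: parse the reply, check it against the declared type, and if that fails, ask again with the error attached. The sketch below uses a stdlib dataclass instead of Pydantic so it runs without dependencies; validate and run_structured are hypothetical helpers, and Pydantic AI's actual mechanism (for example, collecting output through a tool call, with configurable retries) differs in detail.

```python
import json
from dataclasses import dataclass, fields

@dataclass
class WeatherResponse:
    city: str
    temperature_c: float
    condition: str

def validate(raw: str) -> WeatherResponse:
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    kwargs = {}
    for f in fields(WeatherResponse):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        value = data[f.name]
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)  # JSON writes 21.0 as 21; accept ints for float fields
        if not isinstance(value, f.type):
            raise ValueError(f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}")
        kwargs[f.name] = value
    return WeatherResponse(**kwargs)

def run_structured(model, prompt: str, max_retries: int = 2) -> WeatherResponse:
    feedback = ""
    for _ in range(max_retries + 1):
        raw = model(prompt + feedback)
        try:
            return validate(raw)
        except ValueError as exc:
            # Send the validation error back so the model can correct itself.
            feedback = f"\nYour last reply was invalid ({exc}). Reply with JSON only."
    raise ValueError("model never produced valid output")

# Fake model: the first reply has a wrong type, the second is valid.
replies = iter([
    '{"city": "Seoul", "temperature_c": "warm", "condition": "clear"}',
    '{"city": "Seoul", "temperature_c": 21, "condition": "clear"}',
])
prompts = []
def fake_model(p: str) -> str:
    prompts.append(p)
    return next(replies)

result = run_structured(fake_model, "What is the weather in Seoul?")
print(result)  # WeatherResponse(city='Seoul', temperature_c=21.0, condition='clear')
```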

Smol Agents — Hugging Face's Code-Executing Agent

Hugging Face released Smol Agents in late 2024 around the thesis that "agents should write code, not function-call JSON." It follows the ReAct + CodeAct line.

CodeAgent: writes and executes Python on every turn. Real code instead of a JSON tool call.
ToolCallingAgent: a traditional JSON tool-calling mode for when you want it.
Sandboxes: E2B, Docker, and local interpreter modes for isolation.
HF Hub integration: tools and agents can be pushed and pulled from the Hub.

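from smolagents import CodeAgent, InferenceClientModel, DuckDuckGoSearchTool

# InferenceClientModel replaces the older HfApiModel name in recent smolagents releases.
model = InferenceClientModel(model_id="meta-llama/Llama-3.3-70B-Instruct")
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)
print(agent.run("One-line summary of the AI agent market in 2026"))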

The code-execution path gives you maximum flexibility in chaining tools and transforming data. The trade-off is the cost and attack surface of the sandbox.
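The core of a code-executing agent is small: the model emits a Python snippet, the runtime executes it with only the tools in scope, and a final_answer call (similar in spirit to smolagents' convention) ends the step. The stdlib-only sketch below hard-codes the "model-written" snippet and a hypothetical search tool. Restricting builtins via exec is NOT a security boundary; that is exactly why real deployments use sandboxes such as E2B or Docker.

```python
def search(query: str) -> list:
    """Hypothetical stand-in for a web-search tool."""
    corpus = {"agents": ["LangGraph adds interrupts", "CrewAI adds Flows",
                         "AutoGen 0.4 uses actors"]}
    return corpus.get(query, [])

class FinalAnswer(Exception):
    def __init__(self, value):
        super().__init__(value)
        self.value = value

def final_answer(value):
    raise FinalAnswer(value)  # unwinds out of the executed snippet

def run_code_step(code: str, tools: dict):
    # Only the given tools, final_answer, and two builtins are visible to the snippet.
    namespace = {"__builtins__": {"len": len, "sorted": sorted},
                 **tools, "final_answer": final_answer}
    try:
        exec(code, namespace)
    except FinalAnswer as done:
        return done.value
    return None  # no final answer yet; a real agent would observe and loop

# What a model might write instead of a JSON tool call: chaining and transforming results.
model_code = """
hits = search("agents")
summary = "; ".join(sorted(hits)[:2])
final_answer(f"{len(hits)} results: {summary}")
"""
result = run_code_step(model_code, {"search": search})
print(result)  # 3 results: AutoGen 0.4 uses actors; CrewAI adds Flows
```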

LlamaIndex Agent Workflows — Event-Driven Workflows

In early 2025 LlamaIndex consolidated its older AgentRunner / AgentWorker abstractions into Workflows. Event-driven design is the core idea.

Nodes receive events and emit other events.
Branching is expressed naturally by event type.
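Multi-agent setups use AgentWorkflow (from llama_index.core.agent.workflow), which coordinates agents such as FunctionAgent and ReActAgent and hands control between them.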
It composes seamlessly with the LlamaIndex index and retriever ecosystem.

from llama_index.core.workflow import Workflow, step, Event, StartEvent, StopEvent

class ResearchEvent(Event):
    query: str

class MyWorkflow(Workflow):
    @step
    async def start(self, ev: StartEvent) -> ResearchEvent:
        return ResearchEvent(query=ev.input)

    @step
    async def research(self, ev: ResearchEvent) -> StopEvent:
        # in practice this calls a retriever
        return StopEvent(result=f"summary of {ev.query}")

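import asyncio

async def main():
    # Workflow.run() returns an awaitable handler rather than a coroutine,
    # so await it inside a coroutine instead of passing it to asyncio.run().
    result = await MyWorkflow(timeout=60).run(input="AI agents 2026")
    print(result)

asyncio.run(main())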

If your core workload is RAG, LlamaIndex Workflows is usually the natural choice.
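The event-driven model reduces to one rule: each step handles one event type and returns the next event, so routing and branching fall out of which event a step emits. The synchronous stdlib-only sketch below mirrors the earlier workflow; MiniWorkflow and its event classes are hypothetical stand-ins, whereas LlamaIndex itself is async and infers routing from step type annotations.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Base class for all events."""

@dataclass
class StartEvent(Event):
    query: str

@dataclass
class ResearchEvent(Event):
    query: str

@dataclass
class StopEvent(Event):
    result: str

class MiniWorkflow:
    def __init__(self):
        self.handlers = {}  # event type -> step function
        self.trace = []     # names of the steps that ran

    def step(self, event_type):
        def register(fn):
            self.handlers[event_type] = fn
            return fn
        return register

    def run(self, event: Event, max_steps: int = 10) -> str:
        for _ in range(max_steps):
            if isinstance(event, StopEvent):
                return event.result
            handler = self.handlers[type(event)]  # route purely by event type
            self.trace.append(handler.__name__)
            event = handler(event)
        raise RuntimeError("workflow did not reach a StopEvent")

wf = MiniWorkflow()

@wf.step(StartEvent)
def start(ev: StartEvent) -> Event:
    return ResearchEvent(query=ev.query)

@wf.step(ResearchEvent)
def research(ev: ResearchEvent) -> Event:
    # A real step would call a retriever; branching = emitting a different event.
    if not ev.query.strip():
        return StopEvent(result="empty query")
    return StopEvent(result=f"summary of {ev.query}")

result = wf.run(StartEvent(query="AI agents 2026"))
print(result)  # summary of AI agents 2026
```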

Vercel AI SDK — TypeScript-First Streaming and Tool Calling

The Vercel AI SDK is effectively the standard in TypeScript front-end and full-stack work. Its current API (the 5.x line and later) has these traits.

streamText, generateText, generateObject: simple functional API.
tools: define tools with zod schemas. Type inference is clean.
useChat, useCompletion: React hooks for streaming UI.
Provider abstraction: @ai-sdk/openai, @ai-sdk/anthropic, @ai-sdk/google, @ai-sdk/mistral, and more.

import { generateText, tool, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const result = await generateText({
  model: anthropic("claude-sonnet-4-6"),
  tools: {
    getWeather: tool({
      description: "Get current weather for a city",
      inputSchema: z.object({ city: z.string() }), // called `parameters` in 4.x
      execute: async ({ city }) => ({ city, tempC: 21 }),
    }),
  },
  // Allow a second step so the model can answer after seeing the tool result.
  stopWhen: stepCountIs(2),
  prompt: "What is the weather in Seoul?",
});

console.log(result.text);

The Vercel AI SDK positions itself as a toolkit for model calls, streaming, and tool calling rather than a multi-agent orchestration framework, so multi-agent flows are something you implement yourself. That makes it the fastest path for simple chat plus tool calling, but a poor fit for complex graphs.

Multi-Agent Patterns Compared — Same Problem, Different Shapes

Here is how five frameworks express the same "research, write, review" workflow.

Framework	How it's expressed
LangGraph	Three nodes plus explicit edges. All intermediate state in the State object.
AutoGen	RoundRobinGroupChat or SelectorGroupChat with an explicit termination condition.
CrewAI	Three Agents plus three Tasks plus `Process.sequential`. Results flow through `context`.
OpenAI Agents SDK	One triage agent plus two handoffs. The Runner routes between them.
Claude Agent SDK	The main agent calls subagents, exposed like tools.

The core difference is who decides routing. LangGraph leaves it to code (edges), AutoGen to the group-chat policy, CrewAI to the process type, OpenAI Agents SDK to the handoff decision, and Claude Agent SDK to the main agent.

Tool Calling — Who Owns Schema Generation?

Tool calling is the common abstraction across all frameworks, but ownership of schema generation differs.

Pydantic AI / Semantic Kernel / Vercel AI SDK: extract schemas automatically from type definitions (Pydantic models, C# signatures, zod schemas).
LangGraph / OpenAI Agents SDK: derive them from function signatures and docstrings via decorators (LangChain's @tool, the Agents SDK's @function_tool).
CrewAI: subclass BaseTool or use the @tool decorator.
Claude Agent SDK / AutoGen: also accept explicit, hand-written JSON schemas.
Smol Agents: tools still carry a description (type hints plus docstring), but a CodeAgent calls them as plain Python functions inside the code it writes instead of emitting JSON tool calls.

Auto-generated schemas are faster to develop with, while explicit schemas give you tighter control over the interface exposed to the LLM.
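What "extract from the function signature" means in practice can be shown with the standard library alone: read the parameter names, type hints, defaults, and docstring, and emit a JSON schema. The sketch below is a hypothetical helper that handles only primitive types; it emits the input_schema key used in the Anthropic example earlier (OpenAI's format nests a similar object under parameters), and real frameworks handle nested models, enums, and per-argument descriptions.

```python
import inspect
import typing

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    sig = inspect.signature(fn)
    hints = typing.get_type_hints(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": PY_TO_JSON[hints[name]]}  # KeyError for unsupported types
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default -> the LLM must supply it
        else:
            props[name]["default"] = param.default
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {"type": "object", "properties": props, "required": required},
    }

def get_weather(city: str, units: str = "c") -> dict:
    """Get current weather for a city."""
    return {"city": city, "units": units}

schema = tool_schema(get_weather)
print(schema["input_schema"]["required"])  # ['city']
```

This is also where the trade-off above shows up: the generated schema is only as good as the type hints and docstring, whereas a hand-written schema lets you phrase descriptions specifically for the model.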

Interrupts and Human-in-the-Loop — Who Does It Best?

For long-running agents, the critical question is "can a human step in?"

LangGraph: first-class via interrupt() + Command. Combined with checkpointers, pause/resume is natural.
AutoGen: expressed via UserProxyAgent. Somewhat indirect.
CrewAI: partial support via Human Input tasks.
OpenAI Agents SDK: handled by guardrail interruption or outside the main loop.
Semantic Kernel: paired with the Process Framework for external events.
Claude Agent SDK: tool-call approval is exposed at the SDK level.

For workflows that require both long-running execution and human approval (refund approval, code-change merging), LangGraph remains the most natural pick.

Observability — Without Traces, Debugging Is Impossible

Agents are non-deterministic, so traces are not optional.

LangGraph: auto-integrates with LangSmith. Per-node inputs, outputs, tokens, and cost visualized.
AutoGen 0.4: OpenTelemetry built in. Connects to Jaeger, Honeycomb, Datadog.
CrewAI: a native dashboard plus integrations with W&B and Langfuse.
OpenAI Agents SDK: Responses API traces appear automatically in the OpenAI dashboard.
Claude Agent SDK: traces plus eval in the Anthropic console.
Pydantic AI: native integration with Logfire (also from the Pydantic team).
Vercel AI SDK: OpenTelemetry plus Vercel Observability.

Among framework-agnostic tools, Langfuse, Arize, and Helicone are the names that show up most often.
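Under the hood, most of these integrations record nested spans: each agent turn or tool call gets a name, its inputs and outputs, timing, and a link to its parent. The stdlib-only sketch below captures that shape with a decorator and a ContextVar; traced and SPANS are hypothetical, and it links parents by function name only, whereas OpenTelemetry-based tracers use unique span and trace IDs and export to a backend.

```python
import functools
import time
from contextvars import ContextVar

SPANS: list = []  # finished spans, in completion order
_current_parent: ContextVar = ContextVar("current_parent", default=None)

def traced(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        span = {"name": fn.__name__, "parent": _current_parent.get(), "inputs": args}
        token = _current_parent.set(fn.__name__)  # children see this span as parent
        start = time.perf_counter()
        try:
            span["output"] = fn(*args, **kwargs)
            return span["output"]
        except Exception as exc:
            span["error"] = repr(exc)  # failed steps are recorded, then re-raised
            raise
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            _current_parent.reset(token)
            SPANS.append(span)
    return wrapper

@traced
def call_tool(city):
    return f"{city}: clear"

@traced
def agent_turn(question):
    return call_tool("Seoul")

agent_turn("weather?")
for s in SPANS:
    print(s["name"], "parent:", s["parent"])
# call_tool parent: agent_turn
# agent_turn parent: None
```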

Model Agnosticism — How Easy Is It to Swap Models?

Model prices shift every quarter, so model agnosticism is a cost decision.

Strong model agnosticism: LangGraph, AutoGen, CrewAI, Pydantic AI, Smol Agents, Vercel AI SDK, LlamaIndex.
Vendor-first: OpenAI Agents SDK (OpenAI-first), Claude Agent SDK (Claude-first).
Microsoft-first: Semantic Kernel (Azure OpenAI plus Bedrock-compatible).

Even vendor-first frameworks can be coerced via LiteLLM or OpenAI-compatible gateways. But because tracing and tool formats are tuned to the home vendor, swapping out the model often weakens some features.
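Model agnosticism usually comes down to an adapter layer: application code talks to one small interface, and per-provider adapters translate to each vendor's request shape. One concrete difference: OpenAI's Chat Completions API carries the system prompt as a message with role "system", while Anthropic's Messages API takes it as a top-level system field. The sketch injects a fake transport so it runs offline; ChatModel and the adapter classes are hypothetical, and libraries such as LiteLLM implement this idea far more completely.

```python
from typing import Callable, Protocol

class ChatModel(Protocol):
    def complete(self, system: str, user: str) -> str: ...

class OpenAIStyleAdapter:
    """Chat Completions-style payload: the system prompt is a message."""
    def __init__(self, send: Callable[[dict], str]):
        self.send = send  # injected transport: payload -> reply text
    def complete(self, system: str, user: str) -> str:
        return self.send({"messages": [{"role": "system", "content": system},
                                       {"role": "user", "content": user}]})

class AnthropicStyleAdapter:
    """Messages-style payload: the system prompt is a top-level field."""
    def __init__(self, send: Callable[[dict], str]):
        self.send = send
    def complete(self, system: str, user: str) -> str:
        return self.send({"system": system,
                          "messages": [{"role": "user", "content": user}]})

def answer(model: ChatModel, question: str) -> str:
    # Application code depends only on the ChatModel interface.
    return model.complete("Be concise.", question)

sent = []
def fake_send(payload: dict) -> str:
    sent.append(payload)
    return "ok"

replies = [answer(m, "Hi?") for m in (OpenAIStyleAdapter(fake_send), AnthropicStyleAdapter(fake_send))]
print(replies)  # ['ok', 'ok']
```

The weakening the paragraph above describes happens inside the adapters: any vendor-specific feature the shared interface does not expose (special tool formats, tracing metadata) is lost when you swap providers.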

Korean Community — LangChain Korea as the Hub

In Korea the LangChain Korea community, which started in late 2024, functions as the de facto hub for AI agent practitioners. Regular meetups, Korean-language tutorials, and LangGraph workshops are active.

The other axis is education: Modulabs' LLM Full-Stack track and paid courses on Fastcampus and Inflearn. As of Q1 2026, Korean courses for LangGraph, CrewAI, and the OpenAI Agents SDK are all available.

The patterns we see in enterprise adoption:

Naver / Kakao / LG AI: internal models combined with LangChain or LangGraph for internal agents.
Startups: OpenAI Agents SDK or Claude Agent SDK + Pydantic AI for fast MVPs.
Fintech: Semantic Kernel (.NET backends) with emphasis on human-in-the-loop.

Japanese Community — Microsoft Lineup and LlamaIndex Lead

In Japan two tracks stand out.

Microsoft Tokyo plus Japanese SIs: Semantic Kernel and Azure AI Foundry adoption is high. Japanese documentation is well maintained.
LlamaIndex Japan User Group: Tokyo and Osaka meetups, mostly RAG-focused.

Research-oriented organizations like PFN (Preferred Networks) and Sakana AI either build their own frameworks or fork AutoGen / LangGraph for experiments. On Qiita and Zenn, Japanese tutorials for the OpenAI Agents SDK and Claude Agent SDK have grown quickly in early 2026.

Which Framework Should You Pick? — Scenario Guide

The conclusion, compressed by scenario.

Long-running workflow plus human-in-the-loop: LangGraph 0.3.
Research prototype plus multi-agent experimentation: AutoGen 0.4 + Magentic-One.
Fast demo plus role-based collaboration: CrewAI.
.NET / Azure integration: Semantic Kernel.
OpenAI model-centric plus handoff: OpenAI Agents SDK.
Claude model plus long context plus MCP: Claude Agent SDK.
Type safety plus FastAPI style: Pydantic AI.
Code execution plus HF ecosystem: Smol Agents.
RAG-centric: LlamaIndex Agent Workflows.
TypeScript front end plus streaming: Vercel AI SDK.

Most production systems combine two or more. A common pairing is a Vercel AI SDK front end with LangGraph orchestrating the back end.

Closing — Frameworks Aren't Converging; They Are Diverging

As of May 2026, "framework consolidation" has not happened. The divergence is actually sharper. OpenAI and Anthropic moved toward vendor-integrated SDKs, Microsoft toward .NET + Azure integration, LangChain toward graphs + observability, and Pydantic toward type safety.

That divergence is itself a sign of market maturity. There is no single way to build agents — workload, team language, model dependence, and observability needs all differ. We hope this guide is a useful starting point for picking the framework that fits your use case.