GraphQL Federation Complete Guide 2025: Apollo Federation, Supergraph, Distributed GraphQL
Author: Youngju Kim (@fjvbn20031)
TL;DR
- Federation: Unify multiple GraphQL services into a single supergraph. The standard for microservices.
- Successor to Schema Stitching: Federation v2 replaces stitching. More powerful and explicit.
- Subgraph design: Split by domain. Share data across subgraphs via entities.
- Apollo Router: Written in Rust. Apollo reports it as roughly 10x faster than the Node.js Gateway it replaces, and it is now Apollo's recommended runtime.
- Netflix, Airbnb, Expedia, GitHub all adopted Federation.
1. Why GraphQL Federation?
1.1 Limits of Monolithic GraphQL
[Client]
↓
[GraphQL Server]
↓
[User Service] [Order Service] [Product Service] [...]
Problems:
- Single-team bottleneck: Schema managed in one place.
- Huge codebase: All domains in one server.
- Deployment coupling: One domain change = full redeploy.
- Scaling and resilience: every domain scales as one unit, and the server is a single point of failure.
1.2 Schema Stitching (previous attempt)
Combines multiple GraphQL schemas into one.
Limits: manual conflict resolution, complex config, performance issues, no explicit contract.
1.3 Enter Federation
Apollo Federation v1 (2019): evolution of schema stitching. Federation v2 (2022): clearer design, more powerful.
```
            [Apollo Router]
           /       |       \
    [Users]    [Orders]    [Products]    ← each subgraph is independent
```
Key idea: Each team owns its domain's GraphQL service. Composition merges the subgraph schemas into one supergraph, and the Router plans and executes each query across the subgraphs.
2. Federation Core Concepts
2.1 Subgraph
A Subgraph is the unit of federation — its own GraphQL schema.
```graphql
# users-subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

type Query {
  user(id: ID!): User
  users: [User!]!
}
```
@key: makes this type referable from other subgraphs.
2.2 Entity
An Entity is a type shared across subgraphs, marked with @key.
users-subgraph:
```graphql
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}
```
orders-subgraph:
```graphql
type Order @key(fields: "id") {
  id: ID!
  total: Float!
  user: User! # reference to the User entity
}

type User @key(fields: "id") {
  id: ID! @external
  orders: [Order!]! # field contributed by orders-subgraph
}
```
Two subgraphs contribute different fields to the same User type.
2.3 Reference Resolver
How a subgraph resolves its entity when referenced by others:
```javascript
// users-subgraph
const resolvers = {
  User: {
    // reference = { __typename: "User", id: "123" }
    __resolveReference(reference) {
      // getUserById is this subgraph's own data-access function
      return getUserById(reference.id)
    }
  }
}
```
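Under the hood, the Router asks a subgraph for entities through the federation `_entities(representations: ...)` field, and the subgraph library routes each representation to the matching `__resolveReference`. The sketch below imitates that dispatch in plain JavaScript so it runs without a GraphQL server; `usersById`, `getUserById`, and `resolveEntities` are hypothetical stand-ins, not the `@apollo/subgraph` API.

```javascript
// Hypothetical in-memory data standing in for the users database.
const usersById = new Map([
  ["123", { id: "123", name: "Ada", email: "ada@example.com" }],
  ["456", { id: "456", name: "Linus", email: "linus@example.com" }],
]);

// Hypothetical data-access helper used by the reference resolver.
async function getUserById(id) {
  return usersById.get(id) ?? null;
}

const resolvers = {
  User: {
    // Receives a representation such as { __typename: "User", id: "123" }.
    __resolveReference(reference) {
      return getUserById(reference.id);
    },
  },
};

// Simplified stand-in for what a subgraph library does for
// _entities(representations: [...]): dispatch each representation by
// __typename and return results in the same order as the input.
async function resolveEntities(representations) {
  return Promise.all(
    representations.map((rep) => {
      const typeResolvers = resolvers[rep.__typename];
      if (!typeResolvers || !typeResolvers.__resolveReference) {
        throw new Error(`No reference resolver for ${rep.__typename}`);
      }
      return typeResolvers.__resolveReference(rep);
    })
  );
}
```

Calling `resolveEntities([{ __typename: "User", id: "456" }, { __typename: "User", id: "123" }])` yields the two users in request order; preserving that order is what lets the Router merge the results back into the right places in the response.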
2.4 Composition
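Composition merges all subgraph schemas into a single supergraph schema. It runs ahead of time (with Rover or GraphOS, see 4.3 and 4.4), and the Router loads the composed result. Clients only see the single supergraph.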
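To make the idea concrete, here is a toy model that merges per-subgraph field lists into one supergraph view and flags the Federation 2 `@shareable` conflict described in section 5.5. The data shapes (`types`, `fields`, `keys`, `shareable`) are invented for illustration; real composition is done by Rover or GraphOS using Apollo's composition library and checks far more.

```javascript
// Toy model of composition, NOT the real algorithm: each subgraph lists, per type,
// the fields it defines plus which of them are @key fields or marked @shareable.
// @external, type checking, and query-plan satisfiability are not modeled.
function compose(subgraphs) {
  const supergraph = {}; // typeName -> { fieldName -> [names of subgraphs defining it] }
  for (const sg of subgraphs) {
    for (const [typeName, type] of Object.entries(sg.types)) {
      if (!supergraph[typeName]) supergraph[typeName] = {};
      for (const field of type.fields) {
        if (!supergraph[typeName][field]) supergraph[typeName][field] = [];
        supergraph[typeName][field].push(sg.name);
      }
    }
  }

  // Simplified Federation 2 rule: a field defined in several subgraphs must be
  // a key field or marked @shareable in every one of them.
  const errors = [];
  for (const [typeName, fields] of Object.entries(supergraph)) {
    for (const [field, definedIn] of Object.entries(fields)) {
      if (definedIn.length < 2) continue;
      const ok = definedIn.every((name) => {
        const type = subgraphs.find((sg) => sg.name === name).types[typeName];
        return (type.keys || []).includes(field) || (type.shareable || []).includes(field);
      });
      if (!ok) errors.push(`${typeName}.${field} is defined in ${definedIn.join(", ")} without @shareable`);
    }
  }
  return { supergraph, errors };
}
```

Fed the users and orders subgraphs from 2.2, this reports `User.orders` as coming from orders and `User.name` from users, with no errors, because the only shared field (`id`) is a key.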
3. Real Example — E-commerce
3.1 Subgraph Design
Users:
```graphql
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
  address: Address!
}

type Address {
  street: String!
  city: String!
  zipcode: String!
}
```
Products:
```graphql
type Product @key(fields: "id") {
  id: ID!
  name: String!
  description: String!
  price: Float!
  stock: Int!
}
```
Reviews:
```graphql
type Review @key(fields: "id") {
  id: ID!
  rating: Int!
  comment: String!
  product: Product! # entity reference; products-subgraph resolves its other fields
  author: User!     # entity reference; users-subgraph resolves its other fields
}

type Product @key(fields: "id") {
  id: ID! @external
  reviews: [Review!]!
  averageRating: Float!
}

type User @key(fields: "id") {
  id: ID! @external
  reviews: [Review!]!
}
```
3.2 Client Query
```graphql
query {
  user(id: "123") {
    name          # users-subgraph
    email         # users-subgraph
    reviews {     # reviews-subgraph
      rating
      comment
      product {   # products-subgraph
        name
        price
      }
    }
  }
}
```
Router fetches fields from each subgraph and assembles one response.
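For this operation the Router's query plan is a sequence of fetches: the users subgraph first, then the reviews subgraph via `_entities` with the User key, then the products subgraph for every referenced Product. The sketch below hand-codes that plan against in-memory stub subgraphs; the stub data and the `subgraphs`/`executePlan` names are hypothetical, and a real Router derives and runs the plan itself from the supergraph schema.

```javascript
// Hypothetical in-memory subgraphs. Each "entities" function mimics a
// subgraph's _entities field: one result per representation, in order.
const subgraphs = {
  users: {
    user: async (id) => ({ __typename: "User", id, name: "Ada", email: "ada@example.com" }),
  },
  reviews: {
    entities: async (reps) => reps.map((rep) => ({
      reviews: rep.id === "123"
        ? [{ rating: 5, comment: "Great", product: { __typename: "Product", id: "p1" } },
           { rating: 4, comment: "Good", product: { __typename: "Product", id: "p2" } }]
        : [],
    })),
  },
  products: {
    entities: async (reps) => {
      const catalog = { p1: { name: "Keyboard", price: 49.5 }, p2: { name: "Mouse", price: 19.0 } };
      return reps.map((rep) => catalog[rep.id] ?? null);
    },
  },
};

// Hand-written plan; each step needs keys from the previous one, so it runs in sequence.
async function executePlan(userId) {
  // 1. users subgraph: root field plus the @key needed for later fetches
  const user = await subgraphs.users.user(userId);

  // 2. reviews subgraph: one _entities call with the User representation
  const [userExt] = await subgraphs.reviews.entities([{ __typename: "User", id: user.id }]);
  Object.assign(user, userExt);

  // 3. products subgraph: every referenced Product goes into ONE _entities call
  const productRefs = user.reviews.map((r) => r.product);
  const products = await subgraphs.products.entities(productRefs);
  productRefs.forEach((ref, i) => Object.assign(ref, products[i]));

  // Shape the merged data into exactly what the client selected
  return {
    user: {
      name: user.name,
      email: user.email,
      reviews: user.reviews.map(({ rating, comment, product }) => ({
        rating, comment, product: { name: product.name, price: product.price },
      })),
    },
  };
}
```

Step 3 is why batching matters: however many reviews the user has, the products subgraph receives a single request.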
4. Apollo Router — The New Standard
4.1 Gateway to Router Migration
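- Apollo Gateway (Node.js, 2019): the first implementation; runs on Node.js's single-threaded JavaScript runtime and is comparatively memory-heavy.
- Apollo Router (Rust, 2022): a rewrite in Rust; Apollo's benchmarks show it 10x+ faster with a much smaller memory footprint.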
4.2 Install
```shell
curl -sSL https://router.apollo.dev/download/nix/latest | sh
./router --config router.yaml --supergraph supergraph.graphql
```
router.yaml (recent Router versions nest exporters under `telemetry.exporters`):
```yaml
supergraph:
  listen: 0.0.0.0:4000
cors:
  origins:
    - https://example.com
telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
```
4.3 Schema Composition
```shell
rover supergraph compose --config supergraph-config.yaml > supergraph.graphql
```
supergraph-config.yaml:
```yaml
federation_version: 2
subgraphs:
  users:
    routing_url: http://users-service:4001
    schema:
      subgraph_url: http://users-service:4001
  products:
    routing_url: http://products-service:4002
    schema:
      subgraph_url: http://products-service:4002
  reviews:
    routing_url: http://reviews-service:4003
    schema:
      subgraph_url: http://reviews-service:4003
```
4.4 GraphOS — Apollo's Managed Platform
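GraphOS provides a schema registry with automatic composition, schema checks in CI, operation analytics, performance monitoring, and schema diffs that flag breaking changes. For example, to check a proposed users schema against the prod variant:
```shell
rover subgraph check my-graph@prod \
  --schema users.graphql \
  --name users
```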
5. Federation Directives
5.1 @key
```graphql
type User @key(fields: "id") {
  id: ID!
  name: String!
}

# composite key
type Order @key(fields: "id userId") {
  id: ID!
  userId: ID!
  total: Float!
}

# multiple keys
type Product
  @key(fields: "id")
  @key(fields: "sku") {
  id: ID!
  sku: String!
  name: String!
}
```
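With two `@key` directives, another subgraph may send a representation containing either `id` or `sku`, so the owning subgraph's reference resolver has to handle both. A minimal sketch, assuming in-memory lookup maps (`products`, `byId`, and `bySku` are illustrative names):

```javascript
// Hypothetical product catalog, indexed by both keys declared with @key.
const products = [
  { id: "p1", sku: "KB-001", name: "Keyboard" },
  { id: "p2", sku: "MS-002", name: "Mouse" },
];
const byId = new Map(products.map((p) => [p.id, p]));
const bySku = new Map(products.map((p) => [p.sku, p]));

const resolvers = {
  Product: {
    // A representation carries whichever key the calling subgraph knows:
    // { __typename: "Product", id: "p1" } or { __typename: "Product", sku: "KB-001" }
    __resolveReference(ref) {
      if (ref.id !== undefined) return byId.get(ref.id) ?? null;
      if (ref.sku !== undefined) return bySku.get(ref.sku) ?? null;
      return null;
    },
  },
};
```

For the composite key on `Order`, the representation instead carries both fields at once (`{ __typename: "Order", id, userId }`), and the resolver looks up by the pair.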
5.2 @external
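Marks a field that this subgraph references but does not resolve itself; another subgraph owns it. In Federation 2, @external is mainly needed alongside @requires and @provides; key fields such as id below no longer require it, though it is still accepted.
```graphql
type User @key(fields: "id") {
  id: ID! @external
  reviews: [Review!]!
}
```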
5.3 @requires
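Lets a field's resolver use fields owned by another subgraph. The required fields are declared @external; the Router fetches them from the owning subgraph first and passes them in the entity representation.
```graphql
type Product @key(fields: "id") {
  id: ID! @external
  weight: Float! @external
  shippingCost: Float! @requires(fields: "weight")
}
```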
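In the subgraph that declares `@requires`, the required field arrives inside the entity representation, so the resolver can read it directly without calling the owning subgraph. A sketch with a made-up pricing rule (`BASE_FEE` and `RATE_PER_KG` are hypothetical):

```javascript
// Hypothetical pricing rule for illustration only: flat fee plus a per-kg rate.
const BASE_FEE = 5;
const RATE_PER_KG = 1.5;

const resolvers = {
  Product: {
    // Because of @requires(fields: "weight"), the Router has already fetched
    // weight and includes it: { __typename: "Product", id: "p1", weight: 2 }
    __resolveReference(ref) {
      return ref;
    },
    shippingCost(product) {
      return BASE_FEE + RATE_PER_KG * product.weight;
    },
  },
};
```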
5.4 @provides
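Declares that when this subgraph resolves the field, it can also return the listed fields of the referenced entity, so the Router can skip a call to the owning subgraph for them.
```graphql
type Review @key(fields: "id") {
  id: ID!
  product: Product! @provides(fields: "name")
}

type Product @key(fields: "id") {
  id: ID! @external
  name: String! @external
}
```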
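On the resolver side, `@provides` simply means the reviews subgraph returns `name` together with the Product reference. A sketch, assuming the reviews store keeps a denormalized copy of the product name (`reviews` and `productName` are hypothetical):

```javascript
// Hypothetical reviews store holding a denormalized copy of the product name.
const reviews = new Map([
  ["r1", { id: "r1", productId: "p1", productName: "Keyboard" }],
]);

const resolvers = {
  Review: {
    __resolveReference(ref) {
      return reviews.get(ref.id) ?? null;
    },
    // Because Review.product is annotated @provides(fields: "name"), name can be
    // returned here; a query that only needs product { name } skips the products
    // subgraph. Other Product fields still come from the owner via the id key.
    product(review) {
      return { __typename: "Product", id: review.productId, name: review.productName };
    },
  },
};
```

The Router trusts the provided value, so the denormalized copy has to stay consistent with the owning subgraph's data.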
5.5 @shareable (v2)
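Allows a field to be resolved by more than one subgraph. In Federation 2, a non-key field defined in several subgraphs must be marked @shareable (or @external) in every subgraph that defines it, or composition fails.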
5.6 @inaccessible
Hide a field from the supergraph (internal-only).
6. Common Patterns
6.1 Domain Split
Wrong: by technology (DB / Cache / Auth subgraph). Right: by business domain (Users / Orders / Products / Inventory / Shipping).
6.2 Entity Ownership
Every entity needs a single owning subgraph that defines its identity. Other subgraphs only extend with additional fields.
6.3 Avoiding N+1
Use DataLoader to batch entity lookups:
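```javascript
import DataLoader from 'dataloader'

// Create per request (e.g. in the context function) so cached results never leak across requests.
// `db` stands for this subgraph's data-access layer (mysql-style placeholders, plain rows assumed).
const createLoaders = (db) => ({
  ordersByUserId: new DataLoader(async (userIds) => {
    const orders = await db.query('SELECT * FROM orders WHERE user_id IN (?)', [userIds])
    // DataLoader requires one result per key, in key order
    return userIds.map(id => orders.filter(o => o.user_id === id))
  }),
})

const resolvers = {
  User: {
    orders: (user, _args, context) => context.loaders.ordersByUserId.load(user.id),
  },
}
```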
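DataLoader's trick is to collect every `load(key)` made during the same tick and issue one batch call. The class below is a stripped-down illustration of that mechanism (no caching, no per-key errors), meant only to show why N reference lookups become one query; `TinyLoader` is a hypothetical name, not part of the `dataloader` package, which uses its own end-of-tick scheduling rather than `queueMicrotask`.

```javascript
// Minimal illustration of DataLoader-style batching; use the real package in production.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // pending { key, resolve, reject }
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load of this tick schedules a single dispatch, which runs
      // after the currently executing synchronous code has finished.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      const results = await this.batchFn(batch.map((item) => item.key));
      if (results.length !== batch.length) {
        throw new Error("batchFn must return one result per key, in key order");
      }
      batch.forEach((item, i) => item.resolve(results[i]));
    } catch (err) {
      // A failed batch rejects every load that was waiting on it
      batch.forEach((item) => item.reject(err));
    }
  }
}
```

Three `load()` calls issued from sibling resolvers in the same tick produce one `batchFn` call with all three keys.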
6.4 AuthN/AuthZ
Validate JWT at the Router, propagate context headers to subgraphs.
```yaml
authentication:
  router:
    jwt:
      jwks:
        - url: https://auth.example.com/.well-known/jwks.json
headers:
  all:
    request:
      - propagate:
          named: authorization
```
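On the subgraph side, the propagated `authorization` header becomes part of the request context that resolvers check. A sketch in plain Node.js; `buildContext`, `requireRole`, and the `roles` claim are hypothetical, and skipping signature verification is only reasonable under the assumption stated in the comments.

```javascript
// Assumption: the Router has already verified the JWT and subgraphs are reachable
// only through the Router, so the payload is decoded without re-checking the
// signature. If that does not hold, verify the token here instead.
function decodeJwtPayload(token) {
  const parts = token.split(".");
  if (parts.length !== 3) throw new Error("Malformed JWT");
  // base64url -> base64; Node's decoder tolerates missing padding
  const b64 = parts[1].replace(/-/g, "+").replace(/_/g, "/");
  return JSON.parse(Buffer.from(b64, "base64").toString("utf8"));
}

// Node lowercases incoming header names, hence "authorization".
function buildContext(headers) {
  const auth = headers["authorization"] ?? "";
  const match = /^Bearer\s+(.+)$/i.exec(auth);
  if (!match) return { user: null };
  const claims = decodeJwtPayload(match[1]);
  return { user: { id: claims.sub, roles: claims.roles ?? [] } };
}

// Example authorization check a resolver could call.
function requireRole(context, role) {
  if (!context.user || !context.user.roles.includes(role)) {
    throw new Error(`Forbidden: requires role ${role}`);
  }
}
```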
7. Operations & Observability
7.1 Metrics
Once the Prometheus exporter is enabled (see 4.2), Apollo Router exposes request metrics such as apollo_router_http_requests_total and apollo_router_http_request_duration_seconds; exact metric names differ between Router versions.
7.2 Distributed Tracing
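```yaml
telemetry:
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
      propagation:
        trace_context: true
        jaeger: true
```
The Router forwards trace context headers on its subgraph requests, so Router and subgraph spans join the same trace as long as the subgraphs are instrumented too.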
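With `trace_context` enabled, trace IDs travel in the W3C `traceparent` header. The function below sketches what each hop does with it: keep the trace-id and flags, mint a new span-id for the outgoing call. It illustrates the header format only; it is not the Router's code, and OpenTelemetry SDKs do this for you in practice.

```javascript
const crypto = require("crypto");

// W3C Trace Context header: version-traceid-parentid-flags
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Build the traceparent for an outgoing subgraph call from the incoming one.
function childTraceparent(incoming) {
  const m = TRACEPARENT.exec(incoming ?? "");
  // Start a new trace (sampled) if the header is missing or invalid
  const traceId = m ? m[1] : crypto.randomBytes(16).toString("hex");
  const flags = m ? m[3] : "01";
  const spanId = crypto.randomBytes(8).toString("hex"); // id of the new child span
  return `00-${traceId}-${spanId}-${flags}`;
}
```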
7.3 Schema Registry
Detect breaking changes and composition conflicts in CI before they hit prod.
7.4 Persisted Queries
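Clients send a query ID (a SHA-256 hash) instead of the full query text, which shrinks request payloads. Automatic persisted queries (APQ) only cache: an unknown hash is registered the first time the client sends the full query. With a persisted query list (safelisting), the Router runs only pre-registered operations and rejects arbitrary queries.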
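The safelist lookup can be sketched as follows: the client sends a SHA-256 hash in `extensions.persistedQuery.sha256Hash` (the field used by Apollo's persisted query protocol), and the server resolves it against a list of known operations. `buildSafelist` and `resolvePersistedQuery` are hypothetical helpers; in Apollo Router this is configuration rather than code you write.

```javascript
const crypto = require("crypto");

// SHA-256 hex digest of the exact query string.
const sha256 = (text) => crypto.createHash("sha256").update(text, "utf8").digest("hex");

// Hypothetical safelist built at deploy time from the client's known operations.
function buildSafelist(operations) {
  return new Map(operations.map((query) => [sha256(query), query]));
}

// Map an incoming request to a registered query string, or reject it.
function resolvePersistedQuery(safelist, request) {
  const id = request.extensions?.persistedQuery?.sha256Hash;
  if (!id) throw new Error("Only persisted queries are accepted");
  const query = safelist.get(id);
  if (!query) throw new Error("PersistedQueryNotFound");
  return query;
}
```

Free-form requests carrying only a `query` string are rejected outright, which is what blocks arbitrary (for example, expensive introspection) queries.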
8. Migration Strategy
8.1 Monolithic GraphQL to Federation
- Schema analysis — classify by domain.
- Extract the first subgraph (clearest domain, e.g. users). Put Router in front of the monolith.
- Gradually split — one domain at a time.
- Decommission the monolith.
8.2 REST to Federation
- Wrap each REST API as a GraphQL subgraph.
- Federate the wrappers.
- Progressively rewrite to native GraphQL.
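Wrapping a REST API mostly means writing resolvers that call the existing endpoints and rename fields to match the schema. A sketch, assuming a legacy `/products/:id` endpoint that returns `product_id`, `title`, and `price_cents` (all hypothetical); the HTTP client is injected so it can be stubbed, and in production it would wrap `fetch()`.

```javascript
// fetchJson(path) -> Promise<object> is a hypothetical injected REST client.
function createProductsResolvers(fetchJson) {
  // Map REST field names onto the GraphQL schema's names.
  const toProduct = (row) => ({
    id: String(row.product_id),
    name: row.title,
    price: row.price_cents / 100,
  });

  return {
    Query: {
      product: async (_parent, { id }) => toProduct(await fetchJson(`/products/${id}`)),
    },
    Product: {
      // Lets other subgraphs reference Product by id through the wrapper.
      __resolveReference: async (ref) => toProduct(await fetchJson(`/products/${ref.id}`)),
    },
  };
}
```

Because the schema, not the REST shape, is the contract, the wrapper can later be replaced by a native GraphQL implementation without clients noticing.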
8.3 Common Pitfalls
- Splitting too small: a 5-person domain across 10 subgraphs is ops hell. Follow Conway's Law.
- Ambiguous entity ownership: decide a single owner.
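- Synchronous dependencies: long chains of cross-subgraph fetches add latency and let one slow subgraph cascade into failures. Batch with DataLoader, add caching, and keep dependency chains short.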
9. Case Studies
- Netflix: hundreds of subgraphs, Java + DGS framework, schema-first, BFF pattern.
- Airbnb: monolith to Federation. Key lesson — migrate gradually, not all at once.
- Expedia: 50+ subgraphs for travel search, heavy DataLoader usage.
- GitHub: public GraphQL API using Federation internally.
10. Federation vs Alternatives
10.1 Monolith vs Federation
| | Monolith | Federation |
|---|---|---|
| Complexity | Low | High |
| Team autonomy | Low | High |
| Scalability | Scales as one unit | Each subgraph scales independently |
| Dev speed | Fast for a small team | Fast for many teams |
| Operations | Simple | Complex |
Rule of thumb: under 5 engineers, stay monolithic. Otherwise consider Federation.
10.2 Federation vs BFF
BFF = GraphQL server per client (web, mobile, partner). Federation = single supergraph. Combine them: Federation + client-specific views.
10.3 Federation vs gRPC
Typical pattern: Client to Apollo Router (GraphQL), Router to Subgraph (GraphQL), Subgraph to internal services (gRPC).
Quiz
1. Schema Stitching vs Federation?
Answer: Stitching required manual conflict resolution and lacked explicit contracts. Federation introduces explicit contracts (@key, @external) and automatic composition, and registries such as GraphOS add breaking-change detection on top. Federation v2 addressed stitching's shortcomings and became the de facto standard. Apollo deprecated its stitching support in favor of Federation, although GraphQL Tools still maintains a stitching implementation.
2. Role of the @key directive?
Answer: Defines an entity. @key(fields: "id") means "this type is identifiable by id and can be referenced from other subgraphs." Router uses it to call __resolveReference on the owning subgraph. Composite (@key(fields: "id userId")) and multiple keys are supported. The most important Federation directive.
3. Why is Apollo Router faster than Gateway?
Answer: It is a rewrite in Rust. Gateway runs on Node.js, whose JavaScript executes on a single thread, and it uses more memory. Router benefits from Rust's zero-cost abstractions, the multi-threaded Tokio async runtime, and tighter memory use. Apollo's benchmarks report 10x+ speedups and substantially lower memory use, and Apollo recommends Router over Gateway for new deployments.
4. Correct basis for splitting subgraphs?
Answer: By business domain. Wrong: DB, Cache, Auth (technology). Right: Users, Orders, Products, Inventory, Shipping. Follow Conway's Law — one team per subgraph. Avoid over-splitting (5 people, 10 subgraphs) and under-splitting (50 people in one subgraph).
5. How to avoid N+1 in Federation?
Answer: Use DataLoader to batch entity lookups per tick. Router sends entity references in batches; subgraphs resolve them with DataLoader. Also reduce cross-subgraph calls with @provides, add caching (Redis), and analyze operations to optimize frequently co-requested entities. DataLoader is essential.