GraphQL Federation Complete Guide 2025: Apollo Federation, Supergraph, Distributed GraphQL

TL;DR

  • Federation: Unify multiple GraphQL services into a single supergraph. The standard for microservices.
  • Successor to Schema Stitching: Federation is Apollo's replacement for stitching. More powerful and explicit.
  • Subgraph design: Split by domain. Share data across subgraphs via entities.
  • Apollo Router: Written in Rust and commonly cited as around 10x faster than Gateway. Source-available under the Elastic License v2; it is not a CNCF project.
  • Netflix, Airbnb, and Expedia are commonly cited adopters. GitHub is often listed too, though its use of Federation is not publicly documented.

1. Why GraphQL Federation?

1.1 Limits of Monolithic GraphQL

[Client]
[GraphQL Server]
[User Service] [Order Service] [Product Service] [...]

Problems:

  1. Single-team bottleneck: Schema managed in one place.
  2. Huge codebase: All domains in one server.
  3. Deployment coupling: One domain change = full redeploy.
  4. Scalability and resilience: the whole graph scales as one unit and is a single point of failure.

1.2 Schema Stitching (previous attempt)

Combines multiple GraphQL schemas into one.

Limits: manual conflict resolution, complex config, performance issues, no explicit contract.

1.3 Enter Federation

Apollo Federation v1 (2019): evolution of schema stitching. Federation v2 (2022): clearer design, more powerful.

        [Apollo Router]
       /       |        \
[Users]    [Orders]   [Products]  ← each subgraph is independent

Key idea: Each team owns its domain's GraphQL service. The subgraph schemas are composed into one supergraph, and the Router plans and routes each query across them.


2. Federation Core Concepts

2.1 Subgraph

A Subgraph is the unit of federation — its own GraphQL schema.

# users-subgraph
type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

type Query {
  user(id: ID!): User
  users: [User!]!
}

@key: marks the type as an entity, so other subgraphs can reference it.

2.2 Entity

An Entity is a type shared across subgraphs, marked with @key.

users-subgraph:

type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
}

orders-subgraph:

type Order @key(fields: "id") {
  id: ID!
  total: Float!
  user: User!  # reference to User entity
}

type User @key(fields: "id") {
  id: ID!            # key field; Federation 2 needs no @external here
  orders: [Order!]!  # new field contributed by this subgraph
}

Two subgraphs contribute different fields to the same User type.

2.3 Reference Resolver

How a subgraph resolves its entity when referenced by others:

// users-subgraph
const resolvers = {
  User: {
    __resolveReference(reference) {
      // reference = { id: "123" }
      return getUserById(reference.id)
    }
  }
}
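
To make the flow concrete: the Router sends a list of entity representations in an `_entities` query, and the subgraph library hands each one to `__resolveReference`. The sketch below imitates that dispatch with a hypothetical in-memory store (`usersById`) and a simplified `resolveEntities` helper. Both are illustrative stand-ins, not the actual subgraph library internals.

```javascript
// Hypothetical in-memory store standing in for the users database.
const usersById = new Map([
  ['123', { id: '123', name: 'Ada', email: 'ada@example.com' }],
]);

const resolvers = {
  User: {
    // Receives one representation, e.g. { __typename: 'User', id: '123' }.
    __resolveReference(reference) {
      return usersById.get(reference.id) ?? null;
    },
  },
};

// Simplified stand-in for how a subgraph library handles the
// `representations` argument of an `_entities` query.
function resolveEntities(representations) {
  return representations.map((rep) => {
    const typeResolvers = resolvers[rep.__typename];
    if (!typeResolvers || !typeResolvers.__resolveReference) {
      throw new Error(`No reference resolver for ${rep.__typename}`);
    }
    return typeResolvers.__resolveReference(rep);
  });
}

const entities = resolveEntities([
  { __typename: 'User', id: '123' },
  { __typename: 'User', id: '999' }, // unknown id resolves to null
]);
console.log(entities);
```

Results come back in the same order as the representations, which is what lets the Router match each entity to the place in the response where it belongs.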

2.4 Composition

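Composition merges all subgraph schemas into one supergraph schema. It runs ahead of time, through the Rover CLI or the GraphOS registry (section 4.3), and the Router loads the result. Clients see only the single supergraph.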


3. Real Example — E-commerce

3.1 Subgraph Design

Users:

type User @key(fields: "id") {
  id: ID!
  name: String!
  email: String!
  address: Address!
}

type Address {
  street: String!
  city: String!
  zipcode: String!
}

Products:

type Product @key(fields: "id") {
  id: ID!
  name: String!
  description: String!
  price: Float!
  stock: Int!
}

Reviews:

type Review @key(fields: "id") {
  id: ID!
  rating: Int!
  comment: String!
  product: Product!  # entity reference, resolved from the stored product ID
  author: User!      # entity reference, resolved from the stored author ID
}

type Product @key(fields: "id") {
  id: ID!
  reviews: [Review!]!
  averageRating: Float!
}

type User @key(fields: "id") {
  id: ID!
  reviews: [Review!]!
}

3.2 Client Query

query {
  user(id: "123") {
    name              # users-subgraph
    email             # users-subgraph
    reviews {         # reviews-subgraph
      rating
      comment
      product {       # products-subgraph
        name
        price
      }
    }
  }
}

Router fetches fields from each subgraph and assembles one response.
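
As a rough mental model, the query above could be executed as three sequential fetches whose results are merged. The sketch below simulates that with hypothetical in-memory objects in place of the three subgraphs. The real Router builds a query plan and issues GraphQL requests, so treat this only as an illustration of the assembly step.

```javascript
// Hypothetical in-memory data standing in for the three subgraphs.
const usersSubgraph = {
  '123': { id: '123', name: 'Ada', email: 'ada@example.com' },
};
const reviewsSubgraph = {
  '123': [{ rating: 5, comment: 'Great', product: { id: 'p1' } }],
};
const productsSubgraph = {
  p1: { id: 'p1', name: 'Keyboard', price: 49.9 },
};

function executePlan(userId) {
  // Step 1: root fetch from the users subgraph.
  const user = usersSubgraph[userId];
  if (!user) return { user: null };

  // Step 2: entity fetch from the reviews subgraph, keyed by the user's id.
  const reviews = (reviewsSubgraph[user.id] ?? []).map((review) => {
    // Step 3: entity fetch from the products subgraph, keyed by product id.
    const product = productsSubgraph[review.product.id];
    return {
      rating: review.rating,
      comment: review.comment,
      product: { name: product.name, price: product.price },
    };
  });

  // Merge everything into the single response shape the client asked for.
  return { user: { name: user.name, email: user.email, reviews } };
}

const result = executePlan('123');
console.log(JSON.stringify(result, null, 2));
```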


4. Apollo Router — The New Standard

4.1 Gateway to Router Migration

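  • Apollo Gateway (Node.js, 2019): first implementation; runs on a single-threaded event loop and is memory-heavy under load.
  • Apollo Router (Rust, 2022): rewritten in Rust; commonly cited as around 10x faster and more memory efficient, though gains depend on workload.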

4.2 Install

curl -sSL https://router.apollo.dev/download/nix/latest | sh
./router --config router.yaml --supergraph supergraph.graphql

router.yaml:

supergraph:
  listen: 0.0.0.0:4000

cors:
  origins:
    - https://example.com

telemetry:
  exporters:  # Router 1.35+ layout; older releases put metrics/tracing directly under telemetry
    metrics:
      prometheus:
        enabled: true
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317

4.3 Schema Composition

rover supergraph compose --config supergraph-config.yaml > supergraph.graphql

supergraph-config.yaml:

federation_version: 2
subgraphs:
  users:
    routing_url: http://users-service:4001
    schema:
      subgraph_url: http://users-service:4001
  products:
    routing_url: http://products-service:4002
    schema:
      subgraph_url: http://products-service:4002
  reviews:
    routing_url: http://reviews-service:4003
    schema:
      subgraph_url: http://reviews-service:4003

4.4 GraphOS — Apollo's Managed Platform

Schema registry (auto composition), schema validation (CI), operation analytics, performance monitoring, schema diff (breaking change detection).

rover subgraph check my-graph@prod \
  --schema users.graphql \
  --name users

5. Federation Directives

5.1 @key

type User @key(fields: "id") {
  id: ID!
  name: String!
}

# composite key
type Order @key(fields: "id userId") {
  id: ID!
  userId: ID!
  total: Float!
}

# multiple keys
type Product
  @key(fields: "id")
  @key(fields: "sku") {
  id: ID!
  sku: String!
  name: String!
}
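
When a type declares several keys, the representation a subgraph receives may carry any one of them, so the reference resolver has to check which fields are present. A minimal sketch, assuming a hypothetical in-memory `products` list and a standalone `resolveProductReference` function (in a real subgraph this would be the type's `__resolveReference`):

```javascript
// Hypothetical catalogue standing in for the products database.
const products = [
  { id: 'p1', sku: 'KB-001', name: 'Keyboard' },
  { id: 'p2', sku: 'MS-002', name: 'Mouse' },
];

// Resolves a Product by whichever declared key the representation carries.
function resolveProductReference(reference) {
  if (reference.id !== undefined) {
    return products.find((p) => p.id === reference.id) ?? null;
  }
  if (reference.sku !== undefined) {
    return products.find((p) => p.sku === reference.sku) ?? null;
  }
  throw new Error('Representation matches no declared @key');
}

console.log(resolveProductReference({ __typename: 'Product', id: 'p1' }).name);
console.log(resolveProductReference({ __typename: 'Product', sku: 'MS-002' }).name);
```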

5.2 @external

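Marks a field that this subgraph declares but does not resolve itself. In Federation 2 it is needed only for fields named in @requires or @provides (see 5.3 and 5.4); key fields no longer need it.

type User @key(fields: "id") {
  id: ID!  # key field: no @external needed in Federation 2
  reviews: [Review!]!
}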

5.3 @requires

Depend on a field owned by another subgraph.

type Product @key(fields: "id") {
  id: ID!
  weight: Float! @external
  shippingCost: Float! @requires(fields: "weight")
}
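
On the resolver side, @requires means the Router fetches `weight` from the subgraph that owns it and includes it in the representation sent to this subgraph. A minimal sketch of the matching resolver, assuming a hypothetical flat rate (`RATE_PER_KG`) that is not part of the original example:

```javascript
// Hypothetical flat rate per kilogram, chosen only for illustration.
const RATE_PER_KG = 2.5;

const resolvers = {
  Product: {
    // With @requires(fields: "weight"), the incoming representation is
    // expected to include `weight` alongside the key field.
    shippingCost(product) {
      if (typeof product.weight !== 'number') {
        throw new Error('weight missing from representation');
      }
      return product.weight * RATE_PER_KG;
    },
  },
};

const cost = resolvers.Product.shippingCost({
  __typename: 'Product',
  id: 'p1',
  weight: 4,
});
console.log(cost); // 10
```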

5.4 @provides

Declare fields this subgraph can return directly — avoids extra calls.

type Review @key(fields: "id") {
  id: ID!
  product: Product! @provides(fields: "name")
}

type Product @key(fields: "id") {
  id: ID!
  name: String! @external
}

5.5 @shareable (v2)

Multiple subgraphs may define the same field.
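
The rule behind @shareable can be pictured as a composition-time check: a field resolved by more than one subgraph must be marked shareable in each of them. The toy checker below illustrates only that one rule. The `subgraphs` structure and `findSharingErrors` are hypothetical, and real composition applies more rules (key fields, for example, are treated as shareable automatically).

```javascript
// Each hypothetical subgraph lists the fields it resolves per type and
// whether it marks them @shareable.
const subgraphs = {
  products: {
    Product: { name: { shareable: true }, price: { shareable: false } },
  },
  inventory: {
    Product: { name: { shareable: true }, stock: { shareable: false } },
  },
};

function findSharingErrors(graphs) {
  const seen = new Map(); // "Type.field" -> [{ subgraph, shareable }]
  for (const [subgraph, types] of Object.entries(graphs)) {
    for (const [typeName, fields] of Object.entries(types)) {
      for (const [fieldName, meta] of Object.entries(fields)) {
        const key = `${typeName}.${fieldName}`;
        if (!seen.has(key)) seen.set(key, []);
        seen.get(key).push({ subgraph, shareable: meta.shareable });
      }
    }
  }
  const errors = [];
  for (const [key, defs] of seen) {
    // A field defined in several subgraphs must be shareable in all of them.
    if (defs.length > 1 && defs.some((d) => !d.shareable)) {
      const names = defs.map((d) => d.subgraph).join(', ');
      errors.push(`${key} is defined in ${names} but is not @shareable everywhere`);
    }
  }
  return errors;
}

console.log(findSharingErrors(subgraphs)); // no conflicts in this setup
```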

5.6 @inaccessible

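Hide a field or type from the client-facing API schema. It stays in the supergraph, so the Router can still use it internally (for example, as a key).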


6. Common Patterns

6.1 Domain Split

Wrong: by technology (DB / Cache / Auth subgraph). Right: by business domain (Users / Orders / Products / Inventory / Shipping).

6.2 Entity Ownership

Give every entity a single owning subgraph that defines its identity and core fields; other subgraphs only contribute additional fields. Federation 2 does not enforce this, so treat it as a team convention.

6.3 Avoiding N+1

Use DataLoader to batch entity lookups:

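import DataLoader from 'dataloader'

// Create one loader per request so its cache is never shared between users.
// `db` is assumed to be a client that expands an array bound to IN (?).
const createOrderLoader = (db) => new DataLoader(async (userIds) => {
  const orders = await db.query('SELECT * FROM orders WHERE user_id IN (?)', [userIds])
  // Group rows by user_id, then return one array per key, in key order.
  const byUser = new Map(userIds.map(id => [id, []]))
  for (const order of orders) byUser.get(order.user_id)?.push(order)
  return userIds.map(id => byUser.get(id))
})

const resolvers = {
  User: {
    // `context.orderLoader` is assumed to be set per request via createOrderLoader(db).
    orders: (user, _args, context) => context.orderLoader.load(user.id)
  }
}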
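
The part of a batch function that is easiest to get wrong is the contract that results come back one per key, in the same order as the keys, with an empty result for keys that matched nothing. The sketch below isolates that grouping step as a plain synchronous function (`groupOrdersByUser`, a hypothetical helper) so it can be checked without a database.

```javascript
// Returns one array of orders per requested user id, in the same order as
// `userIds`. Users with no orders get an empty array.
function groupOrdersByUser(userIds, rows) {
  const byUser = new Map(userIds.map((id) => [id, []]));
  for (const row of rows) {
    const bucket = byUser.get(row.user_id);
    if (bucket) bucket.push(row); // ignore rows for ids that were not requested
  }
  return userIds.map((id) => byUser.get(id));
}

// Hypothetical rows, as a single IN (...) query might return them.
const rows = [
  { id: 'o1', user_id: 'u1', total: 10 },
  { id: 'o2', user_id: 'u2', total: 20 },
  { id: 'o3', user_id: 'u1', total: 30 },
];

const grouped = groupOrdersByUser(['u2', 'u1', 'u3'], rows);
console.log(grouped.map((orders) => orders.map((o) => o.id)));
```

A single pass over the rows keeps the grouping linear in the number of rows, whereas filtering the full row list once per key grows with keys times rows.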

6.4 AuthN/AuthZ

Validate JWT at the Router, propagate context headers to subgraphs.

authentication:
  router:
    jwt:
      jwks:
        - url: https://auth.example.com/.well-known/jwks.json

headers:
  all:
    request:
      - propagate:
          named: authorization
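
On the subgraph side, the propagated header can be turned into a request context. The sketch below decodes the JWT payload without re-verifying the signature, which assumes the subgraph is reachable only through the Router that already validated the token. The function name `contextFromHeaders` and the `roles` claim are hypothetical.

```javascript
const ANONYMOUS = { userId: null, roles: [] };

// Builds a per-request context from the headers the Router propagated.
// The signature is NOT checked here: this assumes the Router already
// validated the token and that the subgraph is not exposed publicly.
function contextFromHeaders(headers) {
  const header = headers['authorization'] ?? '';
  const match = /^Bearer\s+(.+)$/i.exec(header);
  if (!match) return ANONYMOUS;

  const parts = match[1].split('.');
  if (parts.length !== 3) return ANONYMOUS;

  try {
    const json = Buffer.from(parts[1], 'base64url').toString('utf8');
    const claims = JSON.parse(json);
    return { userId: claims.sub ?? null, roles: claims.roles ?? [] };
  } catch {
    return ANONYMOUS; // payload was not valid JSON
  }
}

// Build a throwaway, unsigned token purely to exercise the function.
const payload = Buffer.from(JSON.stringify({ sub: '123', roles: ['admin'] }))
  .toString('base64url');
const context = contextFromHeaders({ authorization: `Bearer header.${payload}.sig` });
console.log(context);
```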

7. Operations & Observability

7.1 Metrics

With the Prometheus exporter enabled, Apollo Router serves metrics such as apollo_router_http_requests_total and apollo_router_http_request_duration_seconds. Exact metric names vary by Router version.

7.2 Distributed Tracing

telemetry:
  exporters:  # Router 1.35+ layout; older releases put tracing directly under telemetry
    tracing:
      otlp:
        enabled: true
        endpoint: http://otel-collector:4317
      propagation:
        trace_context: true
        jaeger: true

Router-to-subgraph calls share the same trace.
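
With `trace_context` propagation, the shared trace travels in the W3C `traceparent` header: the trace ID stays fixed while each hop contributes its own span ID. The sketch below shows how a service might parse an incoming header and derive the value for an outgoing call. The helper names are hypothetical, and in practice an OpenTelemetry SDK handles this.

```javascript
const { randomBytes } = require('node:crypto');

// Parses "version-traceId-parentId-flags", e.g.
// 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
function parseTraceparent(value) {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(value);
  if (!m) return null;
  return { version: m[1], traceId: m[2], parentId: m[3], flags: m[4] };
}

// Keeps the trace ID, replaces the parent ID with a fresh span ID.
function childTraceparent(incoming) {
  const parent = parseTraceparent(incoming);
  if (!parent) return null;
  const spanId = randomBytes(8).toString('hex'); // 16 hex characters
  return `${parent.version}-${parent.traceId}-${spanId}-${parent.flags}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
const outgoing = childTraceparent(incoming);
console.log(outgoing);
```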

7.3 Schema Registry

Detect breaking changes and composition conflicts in CI before they hit prod.

7.4 Persisted Queries

Clients send a query ID (hash). Router runs only pre-registered queries — blocks arbitrary queries, shrinks payload.
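
The mechanism can be sketched as a hash-keyed allowlist: queries are registered ahead of time under their SHA-256 hash, and at runtime only a known ID resolves to a query. The `register` and `lookup` helpers and the in-memory `Map` are hypothetical; the Router's own persisted-query support is configured rather than hand-written.

```javascript
const { createHash } = require('node:crypto');

const sha256 = (text) => createHash('sha256').update(text).digest('hex');

// Allowlist of pre-registered operations: id (hash) -> query text.
const registered = new Map();

// Build step: store the query and return the ID the client will send.
function register(query) {
  const id = sha256(query);
  registered.set(id, query);
  return id;
}

// Request step: resolve an ID to its query, or null for unknown IDs.
function lookup(id) {
  return registered.get(id) ?? null;
}

const query = 'query { user(id: "123") { name } }';
const id = register(query);
console.log(id.length);        // a SHA-256 hex digest has 64 characters
console.log(lookup(id) === query);
console.log(lookup(sha256('query { arbitrary }')));
```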


8. Migration Strategy

8.1 Monolithic GraphQL to Federation

  1. Schema analysis — classify by domain.
  2. Extract the first subgraph (clearest domain, e.g. users). Put Router in front of the monolith.
  3. Gradually split — one domain at a time.
  4. Decommission the monolith.

8.2 REST to Federation

  1. Wrap each REST API as a GraphQL subgraph.
  2. Federate the wrappers.
  3. Progressively rewrite to native GraphQL.

8.3 Common Pitfalls

  • Splitting too small: a 5-person domain across 10 subgraphs is ops hell. Follow Conway's Law.
  • Ambiguous entity ownership: decide a single owner.
  • Synchronous dependencies: one slow subgraph can stall every query that touches it. Set timeouts, cache where possible, and batch with DataLoader to limit cascading failures.

9. Case Studies

  • Netflix: hundreds of subgraphs, Java + DGS framework, schema-first, BFF pattern.
  • Airbnb: monolith to Federation. Key lesson — migrate gradually, not all at once.
  • Expedia: 50+ subgraphs for travel search, heavy DataLoader usage.
  • GitHub: runs one of the largest public GraphQL APIs; its internal use of Federation is not publicly documented.

10. Federation vs Alternatives

10.1 Monolith vs Federation

                 Monolith           Federation
Complexity       Low                High
Team autonomy    Low                High
Scalability      Scales as a unit   Scales per subgraph
Dev speed        Small team fast    Large team fast
Operations       Simple             Complex

Rule of thumb: under 5 engineers, stay monolithic. Otherwise consider Federation.

10.2 Federation vs BFF

BFF = GraphQL server per client (web, mobile, partner). Federation = single supergraph. Combine them: Federation + client-specific views.

10.3 Federation vs gRPC

Typical pattern: Client to Apollo Router (GraphQL), Router to Subgraph (GraphQL), Subgraph to internal services (gRPC).


Quiz

1. Schema Stitching vs Federation?

Answer: Stitching required manual conflict resolution and lacked explicit contracts. Federation introduces explicit contracts (@key, @external), automatic composition, and breaking-change detection. Federation v2 addressed stitching's shortcomings and became the de facto standard in the Apollo ecosystem. Apollo no longer recommends stitching, though it is still maintained in The Guild's GraphQL Tools.

2. Role of the @key directive?

Answer: Defines an entity. @key(fields: "id") means "this type is identifiable by id and can be referenced from other subgraphs." The Router uses the key to send entity representations to a subgraph through the _entities query, where the subgraph library calls __resolveReference. Composite (@key(fields: "id userId")) and multiple keys are supported. The most important Federation directive.

3. Why is Apollo Router faster than Gateway?

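Answer: Rewritten in Rust. Gateway runs on Node.js, where query planning and execution share one event-loop thread and memory overhead is higher. Router benefits from Rust's zero-cost abstractions, the multi-threaded Tokio async runtime, and the absence of a garbage collector. Reported gains of 10x or more in throughput and roughly half the memory depend on workload. Apollo recommends Router over Gateway for new deployments.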

4. Correct basis for splitting subgraphs?

Answer: By business domain. Wrong: DB, Cache, Auth (technology). Right: Users, Orders, Products, Inventory, Shipping. Follow Conway's Law — one team per subgraph. Avoid over-splitting (5 people, 10 subgraphs) and under-splitting (50 people in one subgraph).

5. How to avoid N+1 in Federation?

Answer: Use DataLoader to batch entity lookups per tick. Router sends entity references in batches; subgraphs resolve them with DataLoader. Also reduce cross-subgraph calls with @provides, add caching (Redis), and analyze operations to optimize frequently co-requested entities. DataLoader is essential.

