MongoDB Sharding Complete Guide: From Shard Key Design to Operational Automation

Introduction

When a single MongoDB instance can no longer handle the data volume or throughput, sharding becomes necessary. Sharding horizontally partitions data across multiple servers (shards) to scale read/write throughput and storage capacity.

This article covers building and operating a Sharded Cluster based on MongoDB 7.0+.

Sharded Cluster Architecture

A MongoDB Sharded Cluster consists of three components:

  • Shard: Replica Set that stores the actual data
  • Config Server: Replica Set that stores metadata and routing information
  • mongos: Router that routes client requests to the appropriate Shard

Building a Sharded Cluster with Docker Compose

# docker-compose.yml
services:
  # Config Server Replica Set
  config-svr-1:
    image: mongo:7.0
    command: mongod --configsvr --replSet configRS --port 27019
    volumes:
      - config1-data:/data/db

  config-svr-2:
    image: mongo:7.0
    command: mongod --configsvr --replSet configRS --port 27019
    volumes:
      - config2-data:/data/db

  config-svr-3:
    image: mongo:7.0
    command: mongod --configsvr --replSet configRS --port 27019
    volumes:
      - config3-data:/data/db

  # Shard 1 Replica Set
  shard1-1:
    image: mongo:7.0
    command: mongod --shardsvr --replSet shard1RS --port 27018
    volumes:
      - shard1-1-data:/data/db

  shard1-2:
    image: mongo:7.0
    command: mongod --shardsvr --replSet shard1RS --port 27018
    volumes:
      - shard1-2-data:/data/db

  # Shard 2 Replica Set
  shard2-1:
    image: mongo:7.0
    command: mongod --shardsvr --replSet shard2RS --port 27018
    volumes:
      - shard2-1-data:/data/db

  shard2-2:
    image: mongo:7.0
    command: mongod --shardsvr --replSet shard2RS --port 27018
    volumes:
      - shard2-2-data:/data/db

  # Mongos Router
  mongos:
    image: mongo:7.0
    command: mongos --configdb configRS/config-svr-1:27019,config-svr-2:27019,config-svr-3:27019 --port 27017
    ports:
      - '27017:27017'
    depends_on:
      - config-svr-1
      - config-svr-2
      - config-svr-3

volumes:
  config1-data:
  config2-data:
  config3-data:
  shard1-1-data:
  shard1-2-data:
  shard2-1-data:
  shard2-2-data:

Initialization Script

#!/bin/bash
# init-sharding.sh

# 1. Initialize Config Server Replica Set
docker exec -it config-svr-1 mongosh --port 27019 --eval '
rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "config-svr-1:27019" },
    { _id: 1, host: "config-svr-2:27019" },
    { _id: 2, host: "config-svr-3:27019" }
  ]
})'

# 2. Initialize Shard 1 Replica Set
docker exec -it shard1-1 mongosh --port 27018 --eval '
rs.initiate({
  _id: "shard1RS",
  members: [
    { _id: 0, host: "shard1-1:27018" },
    { _id: 1, host: "shard1-2:27018" }
  ]
})'

# 3. Initialize Shard 2 Replica Set
docker exec -it shard2-1 mongosh --port 27018 --eval '
rs.initiate({
  _id: "shard2RS",
  members: [
    { _id: 0, host: "shard2-1:27018" },
    { _id: 1, host: "shard2-2:27018" }
  ]
})'

# Wait for the replica sets to elect primaries
sleep 10

# 4. Add Shards to the Cluster
docker exec -it mongos mongosh --eval '
sh.addShard("shard1RS/shard1-1:27018,shard1-2:27018");
sh.addShard("shard2RS/shard2-1:27018,shard2-2:27018");
'

Shard Key Strategies

Shard key selection is the single most important factor in sharding performance: it determines how evenly data and load are distributed, and which queries can be routed to a single shard.

1. Range Sharding

// Range-based sharding - advantageous for continuous data queries
use mydb;
sh.enableSharding("mydb");

// Using created_at as shard key
db.orders.createIndex({ created_at: 1 });
sh.shardCollection("mydb.orders", { created_at: 1 });

// Advantage: Date range queries are processed on a single shard
db.orders.find({
  created_at: {
    $gte: ISODate("2026-03-01"),
    $lt: ISODate("2026-04-01")
  }
});

// Disadvantage: Latest data concentrates on a single shard (hotspot)

2. Hashed Sharding

// Hash-based sharding - advantageous for even distribution
db.users.createIndex({ user_id: 'hashed' })
sh.shardCollection('mydb.users', { user_id: 'hashed' })

// Advantage: Writes are evenly distributed across all shards
// Disadvantage: Range queries scan all shards (scatter-gather)

// Pre-split chunks to prevent initial imbalance
sh.shardCollection('mydb.events', { event_id: 'hashed' }, false, {
  numInitialChunks: 64,
})

3. Compound Shard Key

// Compound shard key - the most recommended pattern
db.orders.createIndex({ customer_id: 1, order_date: 1 })
sh.shardCollection('mydb.orders', { customer_id: 1, order_date: 1 })

// customer_id ensures cardinality
// order_date optimizes range queries

// This query is targeted (single shard)
db.orders.find({
  customer_id: 'C12345',
  order_date: { $gte: ISODate('2026-01-01') },
})

4. Zone Sharding (Region-Based)

// Zone Sharding - ensures data locality
sh.addShardToZone('shard1RS', 'APAC')
sh.addShardToZone('shard2RS', 'EU')

// Set zone ranges
sh.updateZoneKeyRange(
  'mydb.users',
  { region: 'KR' },
  { region: 'KS' }, // Next string after KR
  'APAC'
)

sh.updateZoneKeyRange('mydb.users', { region: 'DE' }, { region: 'DF' }, 'EU')

// Korean user data is stored only on the APAC shard
// German user data is stored only on the EU shard
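
The [min, max) semantics of updateZoneKeyRange can be mimicked in a few lines of standalone JavaScript. This is a toy lookup mirroring the two calls above, not how the config servers actually store zone metadata (that lives in config.tags):

```javascript
// Toy zone lookup: each range covers [min, max) on the shard key,
// matching the semantics of sh.updateZoneKeyRange above.
const zones = [
  { min: 'KR', max: 'KS', zone: 'APAC' },
  { min: 'DE', max: 'DF', zone: 'EU' },
];

function zoneFor(region) {
  const match = zones.find(z => region >= z.min && region < z.max);
  return match ? match.zone : null; // null: no zone, any shard may hold it
}

console.log(zoneFor('KR')); // APAC
console.log(zoneFor('DE')); // EU
console.log(zoneFor('US')); // null
```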

Shard Key Selection Guide

| Criteria | Good Shard Key | Bad Shard Key |
| --- | --- | --- |
| Cardinality | High (user_id) | Low (status: active/inactive) |
| Distribution | Even | Skewed (concentrated on specific values) |
| Query pattern | Frequently in query conditions | Rarely used in queries |
| Monotonicity | Avoided, or hashed | created_at (hotspot) |
| Changeability | Changeable in MongoDB 5.0+ (resharding) | Not changeable in earlier versions |

Shard Key Change (MongoDB 5.0+)

// resharding - changing the shard key
db.adminCommand({
  reshardCollection: 'mydb.orders',
  key: { customer_id: 1, order_date: 1 },
})

// Check progress
db.getSiblingDB('admin').aggregate([
  { $currentOp: { allUsers: true, localOps: false } },
  { $match: { type: 'op', 'originatingCommand.reshardCollection': { $exists: true } } },
])

Chunk Management

Chunk Splitting and Migration

// Check chunk status (since MongoDB 5.0, config.chunks is keyed by
// collection UUID rather than namespace)
use config;
const orders = db.collections.findOne({ _id: "mydb.orders" });
db.chunks.find({ uuid: orders.uuid }).sort({ min: 1 });

// Check chunk count
db.chunks.countDocuments({ uuid: orders.uuid });

// Chunk distribution per shard
db.chunks.aggregate([
  { $match: { uuid: orders.uuid } },
  { $group: { _id: "$shard", count: { $sum: 1 } } }
]);

// Manual chunk split
sh.splitAt("mydb.orders", { customer_id: "C50000", order_date: ISODate("2026-01-01") });

// Manual chunk move
sh.moveChunk("mydb.orders",
  { customer_id: "C50000", order_date: ISODate("2026-01-01") },
  "shard2RS"
);
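
What the balancer does with an uneven chunk distribution can be sketched as a toy planner. `rebalancePlan` is a hypothetical helper; the real balancer also weighs zones, jumbo chunks, and a migration threshold before moving anything:

```javascript
// Toy rebalance planner: repeatedly move one chunk from the most loaded
// shard to the least loaded until counts differ by at most 1.
function rebalancePlan(chunkCounts) {
  const counts = { ...chunkCounts };
  const moves = [];
  for (;;) {
    const shards = Object.keys(counts).sort((a, b) => counts[a] - counts[b]);
    const least = shards[0];
    const most = shards[shards.length - 1];
    if (counts[most] - counts[least] <= 1) return moves;
    counts[most]--;
    counts[least]++;
    moves.push({ from: most, to: least });
  }
}

// 40 vs 20 chunks -> 10 single-chunk migrations to reach 30/30
console.log(rebalancePlan({ shard1RS: 40, shard2RS: 20 }).length); // 10
```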

Balancer Management

// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()

// Limit balancer to run only outside business hours
// (the balancer settings live in the config database)
use config;
db.settings.updateOne(
  { _id: 'balancer' },
  {
    $set: {
      activeWindow: { start: '02:00', stop: '06:00' },
    },
  },
  { upsert: true }
)

// Disable balancing for a specific collection (during migration)
sh.disableBalancing('mydb.orders')
// Re-enable
sh.enableBalancing('mydb.orders')

Understanding Query Routing

// Targeted Query - processed on a single shard (fast)
// Shard key is included in query conditions
db.orders.find({ customer_id: 'C12345' }).explain('executionStats')
// "winningPlan": { "stage": "SINGLE_SHARD" }

// Scatter-Gather Query - processed on all shards (slow)
// Shard key is NOT included in query conditions
db.orders.find({ product: 'iPhone' }).explain('executionStats')
// "winningPlan": { "stage": "SHARD_MERGE" }

// Broadcast Query - sent to all shards
db.orders.aggregate([{ $group: { _id: '$status', total: { $sum: '$amount' } } }])
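
The routing decision mongos makes can be mimicked with a toy chunk map in standalone JavaScript. The real routing table lives in the config servers and is cached by mongos; this sketch only captures the targeted-vs-broadcast distinction:

```javascript
// Toy router: chunks partition the shard-key space into [min, max) ranges.
const chunks = [
  { min: '', max: 'K', shard: 'shard1RS' },
  { min: 'K', max: '\uffff', shard: 'shard2RS' },
];

function shardsFor(shardKeyValue) {
  if (shardKeyValue !== undefined) {
    // Targeted: the shard-key value pins exactly one chunk
    const chunk = chunks.find(c => shardKeyValue >= c.min && shardKeyValue < c.max);
    return [chunk.shard];
  }
  // Scatter-gather: without the shard key, every shard may hold matches
  return [...new Set(chunks.map(c => c.shard))];
}

console.log(shardsFor('C12345')); // single shard: shard1RS
console.log(shardsFor());         // all shards: shard1RS, shard2RS
```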

Monitoring

// Full sharding status check
sh.status();
sh.status(true);  // Including detailed information

// Chunk migration history
use config;
db.changelog.find({ what: "moveChunk.commit" }).sort({ time: -1 }).limit(10);

// Currently ongoing migrations (config.locks is no longer authoritative
// in recent versions; ask the balancer directly)
db.adminCommand({ balancerStatus: 1 });

// Data size per shard
db.adminCommand({ listDatabases: 1, nameOnly: false });

mongosh Utilities

// Collection statistics
db.orders.stats()
db.orders.stats().sharded // true means sharded

// Document count per shard
db.orders.getShardDistribution()
// Shard shard1RS: data 2.5GB, docs 5,000,000, chunks 32
// Shard shard2RS: data 2.3GB, docs 4,800,000, chunks 30

Operational Considerations

1. Adding a Shard

// Add a new shard (no service interruption)
sh.addShard('shard3RS/shard3-1:27018,shard3-2:27018')

// The balancer automatically migrates chunks to the new shard
// The migration takes time to complete

2. Removing a Shard

// Remove a shard (draining)
db.adminCommand({ removeShard: 'shard2RS' })

// Check progress (run repeatedly)
db.adminCommand({ removeShard: 'shard2RS' })
// "state": "ongoing", "remaining": { "chunks": 15, "dbs": 0 }

// Check repeatedly until complete
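
The repeated check lends itself to a small polling loop. A sketch follows; the `checkOnce` callback is a hypothetical stand-in for issuing `db.adminCommand({ removeShard: ... })` through a driver or mongosh:

```javascript
// Drain watcher sketch: poll until removeShard reports "completed".
// `checkOnce` is a placeholder for calling
// db.adminCommand({ removeShard: 'shard2RS' }) against the cluster.
async function waitForDrain(checkOnce, intervalMs = 5000) {
  for (;;) {
    const res = await checkOnce();
    if (res.state === 'completed') return res;
    console.log(`draining: ${res.remaining.chunks} chunks left`);
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

With a fake `checkOnce` that counts chunks down, the loop logs the remaining count each round and resolves once the state flips to completed.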

3. Backup Strategy

# Full cluster dump with mongodump through mongos
# (not a point-in-time-consistent snapshot of a sharded cluster;
# stop the balancer first, or prefer filesystem/cloud snapshots)
mongodump --host mongos:27017 --out /backup/$(date +%Y%m%d)

# Or backup each shard's Replica Set individually
mongodump --host shard1-1:27018 --out /backup/shard1/$(date +%Y%m%d)
mongodump --host shard2-1:27018 --out /backup/shard2/$(date +%Y%m%d)

Summary

| Strategy | Suitable Cases | Considerations |
| --- | --- | --- |
| Range | Range-query-centric workloads | Possible hotspots |
| Hashed | Even distribution needed | Inefficient range queries |
| Compound | Varied query patterns | Key design requires care |
| Zone | Data locality needed | Possible uneven distribution |

Shard key selection is a decision that is difficult to reverse, so make sure to thoroughly analyze your actual data and query patterns before deciding.


Quiz: MongoDB Sharding Comprehension Check (7 Questions)

Q1. What are the three components of a Sharded Cluster?

Shard (data storage), Config Server (metadata), mongos (router).

Q2. What is the hotspot problem in Range Sharding?

When using monotonically increasing keys (like created_at), the latest data always concentrates on the last shard.

Q3. What is the difference between Targeted Query and Scatter-Gather Query?

Targeted queries include the shard key in the query conditions and are processed on a single shard, while Scatter-Gather queries scan all shards.

Q4. Why set numInitialChunks?

In Hashed Sharding, pre-splitting initial chunks prevents imbalance during data loading.

Q5. What is the main use case for Zone Sharding?

Used when data residency regulations (GDPR, etc.) require storing data from specific regions only on specific shards.

Q6. Why is the balancer's activeWindow setting needed?

Chunk migration uses heavy I/O, so limiting it to non-business hours minimizes service impact.

Q7. What is the significance of the resharding feature added in MongoDB 5.0+?

It enables changing shard keys online, which was previously impossible.