Split View: MongoDB Sharding과 Replica Set 운영 가이드: 클러스터 설계부터 트러블슈팅까지

MongoDB Sharding과 Replica Set 운영 가이드: 클러스터 설계부터 트러블슈팅까지

들어가며
Sharding 아키텍처 개요
- Sharded Cluster 초기 구성
Shard Key 선택 전략
Replica Set 구성과 페일오버
- Replica Set 초기화
- 페일오버 메커니즘
Chunk 마이그레이션과 밸런서
- 밸런서 동작 원리
- 청크 관리
읽기/쓰기 관심사 설정
- Read Preference 모드 비교
- Write Concern 레벨 비교
모니터링과 성능 튜닝
- 핵심 모니터링 쿼리
- 성능 튜닝 체크리스트
백업과 복구
- mongodump/mongorestore를 이용한 백업
- 시점 복구(PITR)와 Oplog
트러블슈팅 사례
운영 체크리스트
참고자료

들어가며

MongoDB는 대규모 데이터 처리와 고가용성을 위해 Sharding과 Replica Set이라는 두 가지 핵심 분산 메커니즘을 제공한다. Sharding은 데이터를 여러 서버에 수평으로 분산하여 저장 용량과 처리량을 확장하고, Replica Set은 데이터를 복제하여 장애 발생 시 자동 페일오버를 보장한다. 프로덕션 환경에서 이 두 메커니즘을 올바르게 조합하는 것이 안정적인 MongoDB 운영의 핵심이다.

이 글에서는 Sharding 아키텍처의 구성 요소부터 Shard Key 선택 전략, Replica Set 페일오버 메커니즘, Chunk 마이그레이션과 밸런서 관리, 읽기/쓰기 관심사 설정, 모니터링과 성능 튜닝, 백업/복구, 그리고 실전 트러블슈팅 사례까지 프로덕션 클러스터 운영에 필요한 전체 지식을 다룬다.

Sharding 아키텍처 개요

MongoDB Sharded Cluster는 세 가지 구성 요소로 이루어진다.

mongos (Query Router): 클라이언트 요청을 적절한 샤드로 라우팅하는 프록시 역할을 한다. 상태를 가지지 않으므로(stateless) 여러 대를 배포하여 부하를 분산할 수 있다.

Config Server: 클러스터의 메타데이터(샤드 목록, 청크 범위, 데이터 분포 정보)를 저장하는 Replica Set이다. MongoDB 3.4 이후 반드시 Replica Set으로 구성해야 한다.

Shard: 실제 데이터를 저장하는 Replica Set이다. 각 샤드는 전체 데이터의 일부(청크)를 담당한다.

# Sharded Cluster 기본 구조 (최소 구성)
#
# Client
#   |
#   v
# mongos (Router) x 2  -- 로드 밸런서 뒤에 배치
#   |
#   v
# Config Server Replica Set (3 nodes)
#   |
#   +--- Shard 1 Replica Set (Primary + 2 Secondary)
#   +--- Shard 2 Replica Set (Primary + 2 Secondary)
#   +--- Shard 3 Replica Set (Primary + 2 Secondary)
#
# 총 노드 수: 2 (mongos) + 3 (config) + 9 (shard) = 14

Sharded Cluster 초기 구성

// mongos에 접속하여 Shard 추가
sh.addShard('shard1/mongo-shard1-a:27018,mongo-shard1-b:27018,mongo-shard1-c:27018')
sh.addShard('shard2/mongo-shard2-a:27018,mongo-shard2-b:27018,mongo-shard2-c:27018')
sh.addShard('shard3/mongo-shard3-a:27018,mongo-shard3-b:27018,mongo-shard3-c:27018')

// 데이터베이스에 Sharding 활성화
sh.enableSharding('myapp')

// 컬렉션에 Shard Key 지정 (Hashed)
sh.shardCollection('myapp.orders', { customerId: 'hashed' })

// 컬렉션에 Shard Key 지정 (Ranged)
sh.shardCollection('myapp.logs', { timestamp: 1 })

// 클러스터 상태 확인
sh.status()

Shard Key 선택 전략

Shard Key는 한 번 설정하면 변경이 매우 어렵기 때문에(MongoDB 5.0부터 resharding 지원) 설계 단계에서 신중하게 선택해야 한다. 좋은 Shard Key는 높은 카디널리티(cardinality), 낮은 빈도(frequency), 비단조(non-monotonic) 특성을 가져야 한다.

Shard Key 전략 비교표

전략	설명	장점	단점	적합한 사례
Hashed Sharding	필드 값의 해시로 분산	데이터 균등 분배, 핫스팟 방지	범위 쿼리 비효율, 모든 샤드 스캔 필요	고빈도 쓰기, 범위 쿼리 불필요
Ranged Sharding	필드 값의 범위로 분산	범위 쿼리 효율적, 특정 샤드만 접근	핫스팟 가능, 단조 증가 키 문제	시간 기반 로그, 범위 쿼리 빈번
Zone Sharding	지역/조건별 데이터 배치	데이터 로컬리티, 규정 준수	설정 복잡, 불균형 가능	멀티 리전, 데이터 주권 요구
Compound Key	복합 필드 조합	카디널리티 향상, 쿼리 최적화	설계 복잡도 증가	단일 필드로 부족할 때

Shard Key 설정 예제

// 1. Hashed Shard Key - 균등 분배가 최우선일 때
sh.shardCollection('myapp.users', { _id: 'hashed' })

// 2. Compound Shard Key - 카디널리티와 쿼리 패턴 동시 충족
sh.shardCollection('myapp.events', { tenantId: 1, createdAt: 1 })

// 3. Zone Sharding - 지역별 데이터 분리
sh.addShardToZone('shard1', 'KR')
sh.addShardToZone('shard2', 'US')
sh.addShardToZone('shard3', 'EU')

sh.updateZoneKeyRange(
  'myapp.users',
  { region: 'KR', _id: MinKey },
  { region: 'KR', _id: MaxKey },
  'KR'
)
sh.updateZoneKeyRange(
  'myapp.users',
  { region: 'US', _id: MinKey },
  { region: 'US', _id: MaxKey },
  'US'
)

// 4. MongoDB 8.0+ resharding - Shard Key 변경이 필요할 때
sh.reshardCollection('myapp.orders', { customerId: 1, orderId: 1 })

Shard Key 안티패턴

단조 증가하는 필드(예: ObjectId, timestamp)를 Ranged Shard Key로 사용하면 새 데이터가 항상 마지막 청크에 집중되어 핫 샤드(Hot Shard)가 발생한다. 이 경우 Hashed Sharding을 사용하거나 복합 키를 조합해야 한다. 또한 카디널리티가 너무 낮은 필드(예: boolean, status 같은 열거형)는 데이터를 고르게 분산할 수 없어 점보 청크(Jumbo Chunk)를 유발할 수 있다.

Replica Set 구성과 페일오버

Replica Set 초기화

// Primary에서 Replica Set 초기화
rs.initiate({
  _id: 'shard1',
  members: [
    { _id: 0, host: 'mongo-shard1-a:27018', priority: 10 },
    { _id: 1, host: 'mongo-shard1-b:27018', priority: 5 },
    { _id: 2, host: 'mongo-shard1-c:27018', priority: 1 },
  ],
  settings: {
    electionTimeoutMillis: 10000, // 기본 10초
    heartbeatTimeoutSecs: 10, // 하트비트 타임아웃
    chainingAllowed: true, // 세컨더리 간 체이닝 허용
  },
})

// Replica Set 상태 확인
rs.status()

// 복제 지연 확인
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()

페일오버 메커니즘

MongoDB Replica Set의 페일오버는 Raft 기반 합의 알고리즘으로 동작한다. Primary가 응답하지 않으면 나머지 멤버가 선거를 시작하고, 과반수 이상의 투표를 얻은 멤버가 새 Primary로 선출된다.

선거 조건:

과반수(majority) 멤버가 서로 통신 가능해야 한다
priority가 0인 멤버는 Primary가 될 수 없다
hidden 멤버나 delayed 멤버는 투표만 가능하고 Primary 후보에서 제외된다
electionTimeoutMillis(기본 10초) 이내에 Primary와 통신이 안 되면 선거 시작

// 수동 페일오버 실행 (유지보수 시)
rs.stepDown(60) // 60초 동안 Primary 역할 포기

// 특정 멤버를 Primary로 승격시키려면 priority 조정
cfg = rs.conf()
cfg.members[1].priority = 100
rs.reconfig(cfg)

// Hidden 멤버 설정 (백업 전용)
cfg = rs.conf()
cfg.members[2].hidden = true
cfg.members[2].priority = 0
rs.reconfig(cfg)

// Delayed 멤버 설정 (데이터 복구용, 1시간 지연)
cfg = rs.conf()
cfg.members[2].secondaryDelaySecs = 3600
cfg.members[2].hidden = true
cfg.members[2].priority = 0
rs.reconfig(cfg)

Chunk 마이그레이션과 밸런서

밸런서 동작 원리

밸런서는 Config Server의 Primary에서 실행되며, 샤드 간 청크 수 차이가 임계값(migration threshold)을 초과하면 청크를 이동시킨다. n개의 샤드가 있는 클러스터에서 최대 n/2개의 동시 마이그레이션이 가능하다.

// 밸런서 상태 확인
sh.getBalancerState()
sh.isBalancerRunning()

// 밸런서 중지/시작
sh.stopBalancer()
sh.startBalancer()

// 밸런서 시간 윈도우 설정 (새벽 2-6시에만 동작)
db.adminCommand({
  balancerStart: 1,
  condition: {
    activeWindow: { start: '02:00', stop: '06:00' },
  },
})

// 특정 컬렉션 밸런싱 비활성화
sh.disableBalancing('myapp.orders')

// 청크 수동 이동
sh.moveChunk('myapp.orders', { customerId: 'user123' }, 'shard3')

// 현재 마이그레이션 상태 확인
db.adminCommand({ currentOp: true, desc: /migrate/ })

청크 관리

// 청크 크기 변경 (기본 128MB, MongoDB 7.0+)
use config
db.settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 64 } },
  { upsert: true }
)

// 점보 청크 확인
db.chunks.find({ jumbo: true }).forEach(function(chunk) {
  print("Jumbo chunk: " + chunk.ns + " on " + chunk.shard)
})

// 점보 청크 강제 분할 시도
sh.splitAt("myapp.orders", { customerId: "splitPoint" })

// 청크 분포 확인
db.chunks.aggregate([
  { $group: { _id: "$shard", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

읽기/쓰기 관심사 설정

Read Preference 모드 비교

모드	읽기 대상	일관성	지연 시간	적합한 사례
primary	Primary만	강한 일관성	기준값	실시간 데이터 필요
primaryPreferred	Primary 우선, 불가 시 Secondary	대부분 강한 일관성	기준값	Primary 장애 대비
secondary	Secondary만	최종 일관성	낮음 (가까운 노드)	보고서, 분석 쿼리
secondaryPreferred	Secondary 우선, 불가 시 Primary	대부분 최종 일관성	낮음	읽기 부하 분산
nearest	네트워크 지연 최소 노드	최종 일관성	최소	지역 분산 배포

Write Concern 레벨 비교

Write Concern	설명	내구성	성능	적합한 사례
w: 0	응답 대기 없음	매우 낮음	최고	로그, 메트릭 (유실 가능)
w: 1	Primary 확인만	보통	높음	일반 쓰기
w: "majority"	과반수 확인	높음	보통	중요 데이터 (권장 기본값)
w: N	N개 노드 확인	매우 높음	낮음	금융 등 엄격한 요구
j: true	저널 기록 확인	최고	낮음	내구성 최우선

// 읽기/쓰기 관심사 설정 예제

// 1. 컬렉션 수준에서 Read Preference 설정
db.orders.find({ status: 'pending' }).readPref('secondaryPreferred')

// 2. 연결 문자열에서 설정
// mongodb://mongos1:27017,mongos2:27017/myapp?readPreference=secondaryPreferred&w=majority

// 3. Write Concern 지정
db.orders.insertOne(
  { customerId: 'user123', amount: 50000 },
  { writeConcern: { w: 'majority', j: true, wtimeout: 5000 } }
)

// 4. 전역 기본 Write Concern 설정
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: 'majority', j: true },
  defaultReadConcern: { level: 'majority' },
})

모니터링과 성능 튜닝

핵심 모니터링 쿼리

// 1. 샤드별 데이터 분포 확인
db.orders.getShardDistribution()

// 2. 실행 중인 오퍼레이션 확인
db.currentOp({ secs_running: { $gt: 10 } })

// 3. 슬로우 쿼리 프로파일링 활성화
db.setProfilingLevel(1, { slowms: 100 })

// 4. 슬로우 쿼리 분석
db.system.profile
  .find({
    millis: { $gt: 100 },
    ns: /myapp\.orders/,
  })
  .sort({ ts: -1 })
  .limit(10)

// 5. 서버 상태 확인
db.serverStatus().opcounters // 연산 카운터
db.serverStatus().connections // 연결 수
db.serverStatus().wiredTiger.cache // 캐시 상태

// 6. Replica Set 복제 지연 확인
db.adminCommand({ replSetGetStatus: 1 }).members.forEach(function (m) {
  if (m.stateStr === 'SECONDARY') {
    var lag = (new Date() - m.optimeDate) / 1000
    print(m.name + ' lag: ' + lag + 's')
  }
})

// 7. 인덱스 사용 통계
db.orders.aggregate([{ $indexStats: {} }])

// 8. Config Server 메타데이터 일관성 검증 (MongoDB 7.0+)
db.adminCommand({ checkMetadataConsistency: 1 })

성능 튜닝 체크리스트

인덱스 커버링: Shard Key를 포함하는 복합 인덱스로 쿼리가 단일 샤드에서 처리되도록 한다
Scatter-Gather 최소화: Shard Key를 쿼리 필터에 포함하여 특정 샤드만 대상으로 쿼리한다
Connection Pooling: mongos 연결 풀 크기를 적절히 설정한다 (기본 maxPoolSize=100)
Read Preference 활용: 분석 쿼리는 Secondary로 분산한다
청크 크기 조정: 쓰기가 많은 워크로드는 청크 크기를 줄여 마이그레이션 영향을 최소화한다
WiredTiger 캐시: cacheSizeGB를 시스템 RAM의 50-60%로 설정한다

백업과 복구

mongodump/mongorestore를 이용한 백업

# 1. 백업 전 밸런서 중지 (샤드 클러스터)
mongosh --host mongos1:27017 --eval "sh.stopBalancer()"

# 2. 개별 샤드 백업 (--oplog로 시점 일관성 확보)
mongodump --host shard1/mongo-shard1-a:27018 \
  --oplog --out /backup/shard1-$(date +%Y%m%d)

mongodump --host shard2/mongo-shard2-a:27018 \
  --oplog --out /backup/shard2-$(date +%Y%m%d)

# 3. Config Server 백업
mongodump --host configRS/config1:27019 \
  --oplog --out /backup/config-$(date +%Y%m%d)

# 4. 밸런서 재시작
mongosh --host mongos1:27017 --eval "sh.startBalancer()"

# 5. 복원 (--oplogReplay로 시점 복구)
mongorestore --host shard1/mongo-shard1-a:27018 \
  --oplogReplay /backup/shard1-20260313

시점 복구(PITR)와 Oplog

// Oplog 크기 및 보관 시간 확인
db.getReplicationInfo()
// 출력 예시:
// configured oplog size:   2048MB
// log length start to end: 172800secs (48hrs)
// oplog first event time:  ...
// oplog last event time:   ...

// Oplog 크기 변경 (최소 48시간 이상 보관 권장)
db.adminCommand({ replSetResizeOplog: 1, size: 4096 }) // 4GB

프로덕션 백업 권장사항:

소규모 클러스터는 mongodump로 충분하지만, 대규모 환경에서는 파일시스템 스냅샷(LVM/EBS) 또는 MongoDB Atlas Backup 사용을 권장한다
Oplog 크기는 최소 48시간 이상의 쓰기를 보관할 수 있도록 설정한다
Delayed Secondary 멤버를 활용하면 논리적 오류(실수로 데이터 삭제 등)로부터 빠르게 복구할 수 있다
백업 복원 테스트를 정기적으로 수행한다 (최소 월 1회)

트러블슈팅 사례

1. Shard Key 핫스팟

증상: 특정 샤드의 CPU/디스크 I/O만 높고, 나머지 샤드는 유휴 상태

원인: 단조 증가 필드(ObjectId, timestamp)를 Ranged Shard Key로 사용하여 모든 새 쓰기가 마지막 청크에 집중

해결:

Hashed Shard Key로 resharding 수행 (MongoDB 5.0+)
복합 Shard Key를 사용하여 카디널리티를 높임
MongoDB 8.0의 sh.shardAndDistributeCollection()으로 빠른 재분배

2. Split-brain (네트워크 파티션)

증상: 두 개 이상의 노드가 동시에 Primary로 동작하여 데이터 불일치 발생 가능

원인: 네트워크 파티션으로 Replica Set이 두 그룹으로 분리되고, 각 그룹이 별도로 Primary를 선출

해결:

Replica Set 멤버를 홀수(3, 5, 7)로 유지하여 항상 과반수 그룹이 하나만 존재하도록 구성
네트워크 파티션 시 과반수에 속하지 못한 Primary는 자동으로 Secondary로 강등됨
w: "majority" Write Concern을 사용하여 과반수에 복제되지 않은 쓰기를 방지

3. Chunk 마이그레이션 실패

증상: 밸런서 로그에 "Data transfer error" 또는 "WriteConcernFailed" 오류

원인: 복제 지연이 심하거나 네트워크 대역폭 부족, 대상 샤드의 디스크 공간 부족

해결:

// 마이그레이션 속도 제한으로 복제 부하 완화
db.adminCommand({
  configureFailPoint: "moveChunkHangAtStep4",
  mode: "off"
})

// _secondaryThrottle 활성화로 복제 동기화 보장
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { "_secondaryThrottle": { w: 2 } } },
  { upsert: true }
)

// rangeDeleter 배치 크기 줄이기
db.adminCommand({
  setParameter: 1,
  rangeDeleterBatchSize: 32,
  rangeDeleterBatchDelayMS: 100
})

4. Replica Lag (복제 지연)

증상: Secondary의 optimeDate가 Primary보다 수십 초 이상 뒤처짐

원인: 대량 쓰기 부하, 느린 디스크 I/O, 인덱스 빌드, 네트워크 병목

해결:

WiredTiger 캐시 크기를 확인하고 필요 시 증가
Secondary에서 실행 중인 무거운 쿼리를 확인하고 중단
네트워크 대역폭을 확인하고 Oplog 전송 경로 최적화
인덱스 빌드는 Rolling 방식(한 노드씩 순차적)으로 수행

운영 체크리스트

배포 전:

Shard Key를 쿼리 패턴, 카디널리티, 쓰기 분포를 기반으로 선택했는가
각 Replica Set이 최소 3멤버(홀수)로 구성되어 있는가
Config Server가 별도 Replica Set(3노드)으로 구성되어 있는가
mongos가 2대 이상이고 로드 밸런서 뒤에 있는가
인증(keyFile 또는 x.509)과 네트워크 암호화(TLS)가 설정되어 있는가

일상 운영:

밸런서가 정상 동작하고 청크가 균등 분포되어 있는가
Replica Lag이 임계값(예: 10초) 이내인가
커넥션 수가 maxIncomingConnections 한도 대비 적절한가
슬로우 쿼리 로그를 정기적으로 분석하고 있는가
디스크 사용률이 70% 이하인가

백업/복구:

자동 백업이 일일 단위로 실행되고 있는가
Oplog 크기가 최소 48시간 이상의 데이터를 보관하는가
백업 복원 테스트를 최근 30일 이내에 수행했는가
Delayed Secondary가 구성되어 논리적 오류에 대비하고 있는가

장애 대비:

자동 페일오버 테스트를 최근 분기 내에 수행했는가
모니터링 알림이 Replica Lag, 디스크 사용률, 연결 수에 대해 설정되어 있는가
Split-brain 시나리오에 대한 복구 절차가 문서화되어 있는가

참고자료

MongoDB Sharding and Replica Set Operations Guide: From Cluster Design to Troubleshooting

Introduction
Sharding Architecture Overview
- Initial Sharded Cluster Setup
Shard Key Selection Strategies
Replica Set Configuration and Failover
- Replica Set Initialization
- Failover Mechanism
Chunk Migration and Balancer
- How the Balancer Works
- Chunk Management
Read/Write Concern Configuration
- Read Preference Mode Comparison
- Write Concern Level Comparison
Monitoring and Performance Tuning
- Essential Monitoring Queries
- Performance Tuning Checklist
Backup and Recovery
- Backup with mongodump/mongorestore
- Point-in-Time Recovery (PITR) and Oplog
Troubleshooting Scenarios
Operations Checklist
References

Introduction

MongoDB provides two core distributed mechanisms for large-scale data processing and high availability: Sharding and Replica Sets. Sharding horizontally distributes data across multiple servers to scale storage capacity and throughput, while Replica Sets replicate data to ensure automatic failover upon failure. Properly combining these two mechanisms in production is the foundation of reliable MongoDB operations.

This guide covers the full spectrum of production cluster operations: from Sharding architecture components and Shard Key selection strategies, through Replica Set failover mechanisms, chunk migration and balancer management, read/write concern configuration, monitoring and performance tuning, backup/recovery, and real-world troubleshooting scenarios.

Sharding Architecture Overview

A MongoDB Sharded Cluster consists of three components.

mongos (Query Router): Acts as a proxy that routes client requests to the appropriate shard. Being stateless, multiple mongos instances can be deployed behind a load balancer.

Config Server: A Replica Set that stores cluster metadata (shard list, chunk ranges, data distribution information). Since MongoDB 3.4, Config Servers must be deployed as a Replica Set.

Shard: A Replica Set that stores actual data. Each shard is responsible for a subset (chunks) of the total data.

# Sharded Cluster Architecture (Minimum Configuration)
#
# Client
#   |
#   v
# mongos (Router) x 2  -- behind a load balancer
#   |
#   v
# Config Server Replica Set (3 nodes)
#   |
#   +--- Shard 1 Replica Set (Primary + 2 Secondary)
#   +--- Shard 2 Replica Set (Primary + 2 Secondary)
#   +--- Shard 3 Replica Set (Primary + 2 Secondary)
#
# Total nodes: 2 (mongos) + 3 (config) + 9 (shard) = 14

Initial Sharded Cluster Setup

// Connect to mongos and add shards
sh.addShard('shard1/mongo-shard1-a:27018,mongo-shard1-b:27018,mongo-shard1-c:27018')
sh.addShard('shard2/mongo-shard2-a:27018,mongo-shard2-b:27018,mongo-shard2-c:27018')
sh.addShard('shard3/mongo-shard3-a:27018,mongo-shard3-b:27018,mongo-shard3-c:27018')

// Enable sharding for the database
sh.enableSharding('myapp')

// Shard a collection with a Hashed key
sh.shardCollection('myapp.orders', { customerId: 'hashed' })

// Shard a collection with a Ranged key
sh.shardCollection('myapp.logs', { timestamp: 1 })

// Check cluster status
sh.status()

Shard Key Selection Strategies

The Shard Key is extremely difficult to change once set (resharding is supported from MongoDB 5.0 onward), so it must be chosen carefully during the design phase. A good Shard Key should have high cardinality, low frequency, and non-monotonic characteristics.

Shard Key Strategy Comparison

Strategy	Description	Pros	Cons	Ideal Use Case
Hashed Sharding	Distributes by hash of field value	Even distribution, prevents hotspots	Inefficient range queries, requires all-shard scans	High-frequency writes, no range queries needed
Ranged Sharding	Distributes by value range	Efficient range queries, targets specific shards	Possible hotspots, monotonic key issues	Time-based logs, frequent range queries
Zone Sharding	Places data by region/condition	Data locality, regulatory compliance	Complex setup, potential imbalance	Multi-region, data sovereignty requirements
Compound Key	Combination of multiple fields	Higher cardinality, query optimization	Increased design complexity	When a single field is insufficient

Shard Key Configuration Examples

// 1. Hashed Shard Key - when even distribution is the top priority
sh.shardCollection('myapp.users', { _id: 'hashed' })

// 2. Compound Shard Key - satisfying cardinality and query patterns
sh.shardCollection('myapp.events', { tenantId: 1, createdAt: 1 })

// 3. Zone Sharding - data separation by region
sh.addShardToZone('shard1', 'KR')
sh.addShardToZone('shard2', 'US')
sh.addShardToZone('shard3', 'EU')

sh.updateZoneKeyRange(
  'myapp.users',
  { region: 'KR', _id: MinKey },
  { region: 'KR', _id: MaxKey },
  'KR'
)
sh.updateZoneKeyRange(
  'myapp.users',
  { region: 'US', _id: MinKey },
  { region: 'US', _id: MaxKey },
  'US'
)

// 4. MongoDB 8.0+ resharding - when shard key change is needed
sh.reshardCollection('myapp.orders', { customerId: 1, orderId: 1 })

Shard Key Anti-Patterns

Using a monotonically increasing field (e.g., ObjectId, timestamp) as a Ranged Shard Key causes all new writes to concentrate on the last chunk, creating a Hot Shard. In this case, use Hashed Sharding or combine with a compound key. Fields with very low cardinality (e.g., boolean, status enums) cannot distribute data evenly and may cause Jumbo Chunks.

Replica Set Configuration and Failover

Replica Set Initialization

// Initialize Replica Set on the Primary
rs.initiate({
  _id: 'shard1',
  members: [
    { _id: 0, host: 'mongo-shard1-a:27018', priority: 10 },
    { _id: 1, host: 'mongo-shard1-b:27018', priority: 5 },
    { _id: 2, host: 'mongo-shard1-c:27018', priority: 1 },
  ],
  settings: {
    electionTimeoutMillis: 10000, // default 10 seconds
    heartbeatTimeoutSecs: 10, // heartbeat timeout
    chainingAllowed: true, // allow secondary-to-secondary chaining
  },
})

// Check Replica Set status
rs.status()

// Check replication lag
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()

Failover Mechanism

MongoDB Replica Set failover operates on a Raft-based consensus algorithm. When the Primary becomes unresponsive, the remaining members initiate an election, and the member that receives a majority of votes becomes the new Primary.

Election conditions:

A majority of members must be able to communicate with each other
Members with priority 0 cannot become Primary
Hidden and delayed members can only vote but are excluded from Primary candidacy
An election starts if the Primary is unreachable within electionTimeoutMillis (default 10 seconds)

// Manual failover (for maintenance)
rs.stepDown(60) // relinquish Primary role for 60 seconds

// Promote a specific member by adjusting priority
cfg = rs.conf()
cfg.members[1].priority = 100
rs.reconfig(cfg)

// Hidden member setup (for backup)
cfg = rs.conf()
cfg.members[2].hidden = true
cfg.members[2].priority = 0
rs.reconfig(cfg)

// Delayed member setup (for data recovery, 1-hour delay)
cfg = rs.conf()
cfg.members[2].secondaryDelaySecs = 3600
cfg.members[2].hidden = true
cfg.members[2].priority = 0
rs.reconfig(cfg)

Chunk Migration and Balancer

How the Balancer Works

The balancer runs on the Config Server Primary and moves chunks when the chunk count difference between shards exceeds the migration threshold. In a cluster with n shards, a maximum of n/2 concurrent migrations are possible.

// Check balancer status
sh.getBalancerState()
sh.isBalancerRunning()

// Stop/start balancer
sh.stopBalancer()
sh.startBalancer()

// Set balancer time window (run only between 2-6 AM)
db.adminCommand({
  balancerStart: 1,
  condition: {
    activeWindow: { start: '02:00', stop: '06:00' },
  },
})

// Disable balancing for a specific collection
sh.disableBalancing('myapp.orders')

// Manual chunk move
sh.moveChunk('myapp.orders', { customerId: 'user123' }, 'shard3')

// Check current migration status
db.adminCommand({ currentOp: true, desc: /migrate/ })

Chunk Management

// Change chunk size (default 128MB in MongoDB 7.0+)
use config
db.settings.updateOne(
  { _id: "chunksize" },
  { $set: { value: 64 } },
  { upsert: true }
)

// Find jumbo chunks
db.chunks.find({ jumbo: true }).forEach(function(chunk) {
  print("Jumbo chunk: " + chunk.ns + " on " + chunk.shard)
})

// Force split a jumbo chunk
sh.splitAt("myapp.orders", { customerId: "splitPoint" })

// Check chunk distribution
db.chunks.aggregate([
  { $group: { _id: "$shard", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
])

Read/Write Concern Configuration

Read Preference Mode Comparison

Mode	Read Target	Consistency	Latency	Ideal Use Case
primary	Primary only	Strong consistency	Baseline	Real-time data required
primaryPreferred	Primary first, fallback to Secondary	Mostly strong	Baseline	Primary failure fallback
secondary	Secondary only	Eventual consistency	Low (nearest node)	Reports, analytics queries
secondaryPreferred	Secondary first, fallback to Primary	Mostly eventual	Low	Read load distribution
nearest	Lowest network latency node	Eventual consistency	Minimum	Geo-distributed deployments

Write Concern Level Comparison

Write Concern	Description	Durability	Performance	Ideal Use Case
w: 0	No acknowledgment	Very low	Highest	Logs, metrics (loss acceptable)
w: 1	Primary acknowledgment only	Moderate	High	General writes
w: "majority"	Majority acknowledgment	High	Moderate	Critical data (recommended default)
w: N	N node acknowledgment	Very high	Low	Finance, strict requirements
j: true	Journal write confirmation	Highest	Low	Durability first

// Read/Write Concern configuration examples

// 1. Collection-level Read Preference
db.orders.find({ status: 'pending' }).readPref('secondaryPreferred')

// 2. Connection string configuration
// mongodb://mongos1:27017,mongos2:27017/myapp?readPreference=secondaryPreferred&w=majority

// 3. Write Concern specification
db.orders.insertOne(
  { customerId: 'user123', amount: 50000 },
  { writeConcern: { w: 'majority', j: true, wtimeout: 5000 } }
)

// 4. Set global default Read/Write Concern
db.adminCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: 'majority', j: true },
  defaultReadConcern: { level: 'majority' },
})

Monitoring and Performance Tuning

Essential Monitoring Queries

// 1. Check shard data distribution
db.orders.getShardDistribution()

// 2. Find long-running operations
db.currentOp({ secs_running: { $gt: 10 } })

// 3. Enable slow query profiling
db.setProfilingLevel(1, { slowms: 100 })

// 4. Analyze slow queries
db.system.profile
  .find({
    millis: { $gt: 100 },
    ns: /myapp\.orders/,
  })
  .sort({ ts: -1 })
  .limit(10)

// 5. Server status overview
db.serverStatus().opcounters // operation counters
db.serverStatus().connections // connection count
db.serverStatus().wiredTiger.cache // cache status

// 6. Check Replica Set replication lag
db.adminCommand({ replSetGetStatus: 1 }).members.forEach(function (m) {
  if (m.stateStr === 'SECONDARY') {
    var lag = (new Date() - m.optimeDate) / 1000
    print(m.name + ' lag: ' + lag + 's')
  }
})

// 7. Index usage statistics
db.orders.aggregate([{ $indexStats: {} }])

// 8. Config Server metadata consistency check (MongoDB 7.0+)
db.adminCommand({ checkMetadataConsistency: 1 })

Performance Tuning Checklist

Index Covering: Use compound indexes that include the Shard Key so queries are processed on a single shard
Minimize Scatter-Gather: Include the Shard Key in query filters to target specific shards
Connection Pooling: Set appropriate mongos connection pool sizes (default maxPoolSize=100)
Read Preference: Route analytics queries to Secondary nodes
Chunk Size Adjustment: For write-heavy workloads, reduce chunk size to minimize migration impact
WiredTiger Cache: Set cacheSizeGB to 50-60% of system RAM

Backup and Recovery

Backup with mongodump/mongorestore

# 1. Stop balancer before backup (sharded cluster)
mongosh --host mongos1:27017 --eval "sh.stopBalancer()"

# 2. Back up individual shards (--oplog for point-in-time consistency)
mongodump --host shard1/mongo-shard1-a:27018 \
  --oplog --out /backup/shard1-$(date +%Y%m%d)

mongodump --host shard2/mongo-shard2-a:27018 \
  --oplog --out /backup/shard2-$(date +%Y%m%d)

# 3. Back up Config Server
mongodump --host configRS/config1:27019 \
  --oplog --out /backup/config-$(date +%Y%m%d)

# 4. Restart balancer
mongosh --host mongos1:27017 --eval "sh.startBalancer()"

# 5. Restore (--oplogReplay for point-in-time recovery)
mongorestore --host shard1/mongo-shard1-a:27018 \
  --oplogReplay /backup/shard1-20260313

Point-in-Time Recovery (PITR) and Oplog

// Check oplog size and retention
db.getReplicationInfo()
// Example output:
// configured oplog size:   2048MB
// log length start to end: 172800secs (48hrs)
// oplog first event time:  ...
// oplog last event time:   ...

// Resize oplog (recommend minimum 48 hours of retention)
db.adminCommand({ replSetResizeOplog: 1, size: 4096 }) // 4GB

Production backup recommendations:

For small clusters, mongodump is sufficient; for large-scale environments, use filesystem snapshots (LVM/EBS) or MongoDB Atlas Backup
Configure oplog size to retain at least 48 hours of writes
Use Delayed Secondary members for quick recovery from logical errors (accidental data deletion)
Perform backup restoration tests regularly (at least monthly)

Troubleshooting Scenarios

1. Shard Key Hotspot

Symptoms: Only one shard shows high CPU/disk I/O while the rest remain idle

Cause: Using a monotonically increasing field (ObjectId, timestamp) as a Ranged Shard Key, concentrating all new writes on the last chunk

Resolution:

Perform resharding with a Hashed Shard Key (MongoDB 5.0+)
Use a Compound Shard Key to increase cardinality
Use sh.shardAndDistributeCollection() in MongoDB 8.0 for fast redistribution

2. Split-brain (Network Partition)

Symptoms: Two or more nodes acting as Primary simultaneously, risking data inconsistency

Cause: Network partition splits the Replica Set into two groups, each electing its own Primary

Resolution:

Maintain an odd number (3, 5, 7) of Replica Set members so only one majority group can exist
During a network partition, the Primary outside the majority group is automatically demoted to Secondary
Use w: "majority" Write Concern to prevent writes not replicated to the majority

3. Chunk Migration Failure

Symptoms: Balancer logs show "Data transfer error" or "WriteConcernFailed" errors

Cause: Severe replication lag, insufficient network bandwidth, or lack of disk space on the target shard

Resolution:

// Enable _secondaryThrottle to ensure replication sync
use config
db.settings.updateOne(
  { _id: "balancer" },
  { $set: { "_secondaryThrottle": { w: 2 } } },
  { upsert: true }
)

// Reduce rangeDeleter batch size
db.adminCommand({
  setParameter: 1,
  rangeDeleterBatchSize: 32,
  rangeDeleterBatchDelayMS: 100
})

4. Replica Lag

Symptoms: Secondary optimeDate falls tens of seconds or more behind the Primary

Cause: Heavy write load, slow disk I/O, index builds, network bottlenecks

Resolution:

Check and increase WiredTiger cache size if needed
Identify and terminate heavy queries running on Secondaries
Verify network bandwidth and optimize oplog transfer paths
Perform index builds in a rolling fashion (one node at a time)

Operations Checklist

Pre-deployment:

Was the Shard Key chosen based on query patterns, cardinality, and write distribution?
Is each Replica Set configured with at least 3 members (odd number)?
Is the Config Server deployed as a separate Replica Set (3 nodes)?
Are there 2+ mongos instances behind a load balancer?
Are authentication (keyFile or x.509) and network encryption (TLS) configured?

Daily operations:

Is the balancer running normally with even chunk distribution?
Is Replica Lag within threshold (e.g., 10 seconds)?
Is the connection count appropriate relative to maxIncomingConnections?
Are slow query logs analyzed regularly?
Is disk usage below 70%?

Backup/Recovery:

Is automated backup running daily?
Does the oplog size retain at least 48 hours of data?
Was a backup restoration test performed within the last 30 days?
Is a Delayed Secondary configured for logical error protection?

Disaster preparedness:

Was an automatic failover test performed within the last quarter?
Are monitoring alerts set for Replica Lag, disk usage, and connection count?
Is a recovery procedure for split-brain scenarios documented?

MongoDB Sharding과 Replica Set 운영 가이드: 클러스터 설계부터 트러블슈팅까지

들어가며

Sharding 아키텍처 개요

Sharded Cluster 초기 구성

Shard Key 선택 전략

Shard Key 전략 비교표

Shard Key 설정 예제

Shard Key 안티패턴

Replica Set 구성과 페일오버

Replica Set 초기화

페일오버 메커니즘

Chunk 마이그레이션과 밸런서

밸런서 동작 원리

청크 관리

읽기/쓰기 관심사 설정

Read Preference 모드 비교

Write Concern 레벨 비교

모니터링과 성능 튜닝

핵심 모니터링 쿼리

성능 튜닝 체크리스트

백업과 복구

mongodump/mongorestore를 이용한 백업

시점 복구(PITR)와 Oplog

트러블슈팅 사례

1. Shard Key 핫스팟

2. Split-brain (네트워크 파티션)

3. Chunk 마이그레이션 실패

4. Replica Lag (복제 지연)

운영 체크리스트

참고자료

MongoDB Sharding and Replica Set Operations Guide: From Cluster Design to Troubleshooting

Introduction

Sharding Architecture Overview

Initial Sharded Cluster Setup

Shard Key Selection Strategies

Shard Key Strategy Comparison

Shard Key Configuration Examples

Shard Key Anti-Patterns

Replica Set Configuration and Failover

Replica Set Initialization

Failover Mechanism

Chunk Migration and Balancer

How the Balancer Works

Chunk Management

Read/Write Concern Configuration

Read Preference Mode Comparison

Write Concern Level Comparison

Monitoring and Performance Tuning

Essential Monitoring Queries

Performance Tuning Checklist

Backup and Recovery

Backup with mongodump/mongorestore

Point-in-Time Recovery (PITR) and Oplog

Troubleshooting Scenarios

1. Shard Key Hotspot

2. Split-brain (Network Partition)

3. Chunk Migration Failure

4. Replica Lag

Operations Checklist

References