Split View: MySQL 8.4 InnoDB 클러스터 운영과 쿼리 최적화 핸드북

MySQL 8.4 InnoDB 클러스터 운영과 쿼리 최적화 핸드북

MySQL 8.4 LTS가 의미하는 것
- 주요 변경점 요약
InnoDB 클러스터 아키텍처
- InnoDB Cluster vs NDB Cluster 비교
InnoDB 클러스터 구축 실전
클러스터 운영과 모니터링
장애 시나리오와 복구 절차
쿼리 최적화 전략
InnoDB 버퍼 풀 튜닝
운영 체크리스트
성능 벤치마크 참고값
마무리
References

MySQL 8.4 LTS가 의미하는 것

MySQL 8.4는 Oracle이 공식 지정한 Long-Term Support(LTS) 릴리스다. 2024년 4월 GA 이후 2025~2026년에 걸쳐 8.4.4부터 8.4.8까지 꾸준히 패치가 이어지고 있으며, MySQL 8.0은 2026년 EOL을 앞두고 있다. 운영 환경에서 8.4로의 마이그레이션은 더 이상 선택이 아니라 필수다.

8.4에서 달라진 핵심 사항을 먼저 정리한다.

주요 변경점 요약

mysql_native_password 기본 비활성화: 8.4.0부터 기본적으로 로드되지 않는다. caching_sha2_password가 기본 인증 플러그인이다. 레거시 앱이 있다면 --mysql-native-password=ON 옵션으로 명시적으로 활성화해야 한다.
MASTER/SLAVE 용어 완전 제거: CHANGE MASTER TO, START SLAVE, SHOW SLAVE STATUS 등이 문법 오류를 발생시킨다. CHANGE REPLICATION SOURCE TO, START REPLICA, SHOW REPLICA STATUS로 전환해야 한다.
mysqlpump 제거: mysqldump 또는 MySQL Shell의 dump 유틸리티를 사용한다.
InnoDB Adaptive Hash Index(AHI) 기본 비활성화: 워크로드에 따라 성능 변화가 있을 수 있다. 필요하면 innodb_adaptive_hash_index=ON으로 직접 활성화한다.
InnoDB Change Buffer 기본 비활성화: SSD 기반 스토리지에서는 비활성화가 오히려 유리하다.
자동 히스토그램 업데이트: 옵티마이저 통계의 정확도를 높여 쿼리 실행 계획을 개선한다. 서버 재시작 후에도 히스토그램 메타데이터가 유지된다.
restrict_fk_on_non_standard_key=ON 기본값: 비고유 또는 부분 키를 FK로 사용하는 것이 기본적으로 차단된다.

InnoDB 클러스터 아키텍처

InnoDB 클러스터는 세 가지 핵심 컴포넌트로 구성된다.

MySQL Group Replication: Paxos 기반 합의 프로토콜을 사용하는 동기 복제 레이어. 데이터 일관성과 자동 페일오버를 보장한다.
MySQL Shell (AdminAPI): 클러스터의 생성, 노드 추가/제거, 상태 모니터링을 위한 관리 인터페이스.
MySQL Router: 애플리케이션 트래픽을 적절한 노드로 라우팅하는 미들웨어. Primary로 쓰기를, Secondary로 읽기를 분산한다.

┌─────────────────────────────────────────────────┐
│                  Application                     │
└──────────────────────┬──────────────────────────┘
                       │
              ┌────────▼────────┐
              │  MySQL Router   │
              │  (R/W split)    │
              └───┬────────┬────┘
         Write    │        │   Read
         ┌────────▼──┐  ┌──▼────────┐
         │  Primary   │  │ Secondary │
         │  (node-1)  │  │ (node-2)  │
         └────────────┘  └───────────┘
                  │           │
              ┌───▼───────────▼───┐
              │    Secondary      │
              │    (node-3)       │
              └───────────────────┘
         ← Group Replication (Paxos) →

최소 3노드 구성이 필수이며, 최대 9노드까지 확장 가능하다. 홀수 노드(3, 5, 7)를 사용해야 쿼럼(quorum) 계산에서 split-brain을 방지할 수 있다.

InnoDB Cluster vs NDB Cluster 비교

운영 환경에서 어떤 클러스터링 솔루션을 선택할지 결정할 때 참고할 수 있는 비교표다.

항목	InnoDB Cluster	NDB Cluster
스토리지 엔진	InnoDB (디스크 기반)	NDB (인메모리 + 디스크)
복제 방식	Group Replication (Paxos)	동기 2-phase commit
샤딩	미지원 (수동 파티셔닝 필요)	자동 샤딩 (Node Group)
최대 노드 수	9	255 (데이터 노드 144)
쿼리 실행	단일 스레드	데이터 노드로 push-down 가능
장애 허용	N노드 중 과반수 생존 필요	Node Group당 1 replica로 운영 가능
적합 워크로드	범용 OLTP, 읽기 스케일링	초저지연, 쓰기 집중, 통신/금융
운영 복잡도	낮음 (MySQL Shell로 관리)	높음 (Management Node 별도 필요)
외래 키 지원	완전 지원	제한적
트랜잭션 격리	모든 레벨 지원	READ COMMITTED만 지원

대부분의 웹 서비스, SaaS 백엔드, 일반적인 OLTP 워크로드에서는 InnoDB Cluster가 운영 복잡도와 기능 면에서 합리적인 선택이다. NDB Cluster는 수십만 TPS 이상의 초저지연이 필요한 통신, 금융 실시간 시스템에 적합하다.

InnoDB 클러스터 구축 실전

사전 준비

모든 노드에서 동일한 MySQL 8.4 버전을 설치하고, 아래 조건을 확인한다.

모든 테이블이 InnoDB 엔진을 사용하는지 확인 (MyISAM 등은 사전에 변환)
모든 테이블에 PRIMARY KEY가 존재하는지 확인 (Group Replication 필수 조건)
GTID 활성화 (gtid_mode=ON, enforce_gtid_consistency=ON)
각 노드에 고유한 server_id 설정
/etc/hosts 파일에 모든 노드의 호스트명 등록 (DNS 해석 일관성 보장)

# 각 노드에서 MySQL 8.4 설치 (Ubuntu/Debian 예시)
sudo apt-get update
sudo apt-get install mysql-server-8.4 mysql-shell mysql-router

# InnoDB 엔진이 아닌 테이블 확인
mysql -u root -p -e "
  SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE
  FROM information_schema.TABLES
  WHERE ENGINE != 'InnoDB'
    AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys');
"

# PRIMARY KEY 없는 테이블 확인
mysql -u root -p -e "
  SELECT t.TABLE_SCHEMA, t.TABLE_NAME
  FROM information_schema.TABLES t
  LEFT JOIN information_schema.TABLE_CONSTRAINTS c
    ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
    AND t.TABLE_NAME = c.TABLE_NAME
    AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
  WHERE c.TABLE_NAME IS NULL
    AND t.TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys')
    AND t.TABLE_TYPE = 'BASE TABLE';
"

my.cnf 설정 (각 노드 공통)

[mysqld]
# 기본 설정
server_id=1                          # 노드마다 고유값 (1, 2, 3)
bind-address=0.0.0.0
port=3306
datadir=/var/lib/mysql

# GTID 활성화
gtid_mode=ON
enforce_gtid_consistency=ON

# Binary Log
log_bin=mysql-bin
binlog_format=ROW
binlog_transaction_dependency_tracking=WRITESET
log_replica_updates=ON

# Group Replication 기본 설정
plugin_load_add=group_replication.so
group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_start_on_boot=OFF
group_replication_local_address="node1:33061"
group_replication_group_seeds="node1:33061,node2:33061,node3:33061"
group_replication_bootstrap_group=OFF

# InnoDB 튜닝
innodb_buffer_pool_size=4G           # 물리 메모리의 70~80%
innodb_buffer_pool_instances=4       # buffer_pool_size / 1G
innodb_log_file_size=1G
innodb_flush_log_at_trx_commit=1     # 데이터 안정성 최우선
innodb_flush_method=O_DIRECT

# 비활성화된 기본값 중 워크로드에 따라 활성화 검토
# innodb_adaptive_hash_index=ON      # OLTP 핫스팟 쿼리가 많으면 활성화
# innodb_change_buffering=all        # HDD 사용 시 활성화

# 복제 관련
replica_parallel_type=LOGICAL_CLOCK
replica_parallel_workers=4           # 4~8 범위 권장
replica_preserve_commit_order=ON

# 스토리지 엔진 제한 (InnoDB 외 사용 방지)
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

# 트랜잭션 격리 수준 (Multi-Primary 사용 시)
# transaction_isolation=READ-COMMITTED

MySQL Shell로 클러스터 생성

// MySQL Shell (mysqlsh) 접속
// mysqlsh root@node1:3306

// 1. 각 인스턴스 설정 확인 및 자동 구성
dba.configureInstance('root@node1:3306', {
  clusterAdmin: 'clusteradmin',
  clusterAdminPassword: 'SecureP@ss!2026',
})
dba.configureInstance('root@node2:3306', {
  clusterAdmin: 'clusteradmin',
  clusterAdminPassword: 'SecureP@ss!2026',
})
dba.configureInstance('root@node3:3306', {
  clusterAdmin: 'clusteradmin',
  clusterAdminPassword: 'SecureP@ss!2026',
})

// 2. 첫 번째 노드에서 클러스터 생성 (Bootstrap)
shell.connect('clusteradmin@node1:3306')
var cluster = dba.createCluster('prodCluster', {
  multiPrimary: false, // Single-Primary 모드 권장
  memberWeight: 50,
  expelTimeout: 5,
  autoRejoinTries: 3,
  consistency: 'BEFORE_ON_PRIMARY_FAILOVER',
})

// 3. 나머지 노드 추가 (Clone 방식 권장)
cluster.addInstance('clusteradmin@node2:3306', {
  recoveryMethod: 'clone',
})
cluster.addInstance('clusteradmin@node3:3306', {
  recoveryMethod: 'clone',
})

// 4. 클러스터 상태 확인
cluster.status()

recoveryMethod: 'clone'을 지정하면 기존 클러스터 멤버의 물리적 스냅샷을 사용하여 새 노드의 데이터를 완전히 동기화한다. 데이터 양이 많거나 바이너리 로그 보존 기간이 짧은 경우 Clone 방식이 안전하다.

MySQL Router 설정

# Router 부트스트랩 (클러스터 메타데이터 자동 설정)
mysqlrouter --bootstrap clusteradmin@node1:3306 \
  --directory /opt/mysqlrouter \
  --user=mysqlrouter \
  --conf-use-sockets \
  --conf-bind-address=0.0.0.0

# Router 시작
/opt/mysqlrouter/start.sh

# 또는 systemd로 관리
sudo systemctl enable mysqlrouter
sudo systemctl start mysqlrouter

Router가 부트스트랩되면 기본적으로 다음 포트를 사용한다.

6446: R/W 포트 (Primary로 라우팅)
6447: R/O 포트 (Secondary로 라우팅, round-robin)
6448: R/W X Protocol 포트
6449: R/O X Protocol 포트

애플리케이션의 DB 연결 문자열을 Router가 동작하는 호스트의 6446(쓰기) 또는 6447(읽기) 포트로 변경하면 된다.

클러스터 운영과 모니터링

일상 운영 명령어

// MySQL Shell에서 클러스터 객체 가져오기
shell.connect('clusteradmin@node1:3306')
var cluster = dba.getCluster()

// 클러스터 전체 상태 확인
cluster.status({ extended: 1 })

// 특정 노드 상태 상세 확인
cluster.status({ extended: 2 })

// 노드 일시 제거 (유지보수)
cluster.removeInstance('clusteradmin@node2:3306', { force: false })

// 노드 재추가
cluster.addInstance('clusteradmin@node2:3306', { recoveryMethod: 'clone' })

// 클러스터 옵션 조회
cluster.options()

// Primary 수동 전환 (graceful switchover)
cluster.setPrimaryInstance('clusteradmin@node2:3306')

// Router 연결 상태 확인
cluster.listRouters()

핵심 모니터링 쿼리

운영 중 반드시 주기적으로 확인해야 할 메트릭이다.

-- Group Replication 멤버 상태 확인
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- 트랜잭션 적용 지연 확인 (각 Secondary에서 실행)
SELECT
  CHANNEL_NAME,
  COUNT_TRANSACTIONS_IN_QUEUE AS trx_in_queue,
  COUNT_TRANSACTIONS_CHECKED AS trx_checked,
  COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE AS remote_queue,
  LAST_CONFLICT_FREE_TRANSACTION
FROM performance_schema.replication_group_member_stats
WHERE CHANNEL_NAME = 'group_replication_applier';

-- Applier 워커 상태 확인
SELECT
  WORKER_ID, LAST_SEEN_TRANSACTION,
  APPLYING_TRANSACTION, LAST_APPLIED_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'group_replication_applier';

-- InnoDB 클러스터 메타데이터 확인
SELECT * FROM mysql_innodb_cluster_metadata.clusters;
SELECT * FROM mysql_innodb_cluster_metadata.instances;

경고: 운영 시 반드시 주의할 사항

DDL 실행 주의: 대규모 테이블의 ALTER TABLE은 Group Replication의 인증(certification) 단계에서 전체 클러스터를 블로킹할 수 있다. pt-online-schema-change 또는 gh-ost를 사용한다.
대량 트랜잭션 제한: group_replication_transaction_size_limit(기본 150MB)을 초과하는 트랜잭션은 롤백된다. 대량 INSERT/UPDATE는 배치 단위로 분할한다.
네트워크 지연: 노드 간 네트워크 왕복 지연이 쓰기 성능에 직접 영향을 미친다. 같은 데이터센터 내 배치를 권장하며, 크로스 리전 배치 시 ClusterSet을 검토한다.
바이너리 로그 관리: binlog_expire_logs_seconds를 적절히 설정하여 디스크 공간을 관리한다. 기본값은 2592000초(30일)이다.
백업은 Secondary에서: Primary의 부하를 줄이기 위해 물리 백업(xtrabackup, mysqlbackup)은 Secondary 노드에서 수행한다.

장애 시나리오와 복구 절차

시나리오 1: Secondary 노드 1대 장애

가장 흔한 상황이다. 3노드 클러스터에서 Secondary 1대가 다운되면 클러스터는 2/3 쿼럼으로 정상 운영을 계속한다.

// 장애 노드 복구 후 자동 재합류 확인
cluster.status()
// MEMBER_STATE가 RECOVERING -> ONLINE으로 변경되는지 확인

// 자동 재합류가 실패한 경우
cluster.rejoinInstance('clusteradmin@node3:3306')

// 데이터 격차가 크면 Clone으로 재합류
cluster.removeInstance('clusteradmin@node3:3306', { force: true })
cluster.addInstance('clusteradmin@node3:3306', { recoveryMethod: 'clone' })

시나리오 2: Primary 노드 장애

Primary가 다운되면 Group Replication이 자동으로 새 Primary를 선출한다. member_weight가 높은 노드가 우선 선출된다.

// 새 Primary 확인
cluster.status()

// 기존 Primary 복구 후 Secondary로 재합류
cluster.rejoinInstance('clusteradmin@node1:3306')

MySQL Router는 Primary 변경을 자동 감지하고 쓰기 트래픽을 새 Primary로 라우팅한다. 애플리케이션 레벨에서 재연결 로직(connection retry)이 필요하다.

시나리오 3: 쿼럼 상실 (과반수 노드 다운)

3노드 중 2대가 동시에 다운되면 남은 1대가 쿼럼을 잃어 읽기만 가능하고 쓰기가 불가능해진다.

// 살아 있는 노드에서 강제 쿼럼 복구
// 주의: 데이터 손실 가능성 존재
cluster.forceQuorumUsingPartitionOf('clusteradmin@node1:3306')

// 이후 다운된 노드 복구 후 재합류
cluster.rejoinInstance('clusteradmin@node2:3306')
cluster.rejoinInstance('clusteradmin@node3:3306')

forceQuorumUsingPartitionOf는 마지막 수단이다. 이 명령은 지정한 노드의 데이터를 기준으로 클러스터를 재구성하므로, 다른 노드에서 커밋되었으나 아직 전파되지 않은 트랜잭션이 유실될 수 있다.

시나리오 4: 전체 클러스터 재시작

계획된 전체 정지(인프라 점검 등) 후 재시작 시 순서가 중요하다.

// 1. 가장 최신 GTID를 가진 노드를 먼저 시작
// 각 노드에서 확인:
// SELECT @@gtid_executed;

// 2. 해당 노드에서 클러스터 재부팅
shell.connect('clusteradmin@node1:3306')
var cluster = dba.rebootClusterFromCompleteOutage('prodCluster')

// 3. 나머지 노드는 자동 재합류 또는 수동 합류
cluster.rejoinInstance('clusteradmin@node2:3306')
cluster.rejoinInstance('clusteradmin@node3:3306')

쿼리 최적화 전략

InnoDB 클러스터 환경에서의 쿼리 최적화는 단일 인스턴스와 동일한 원칙을 따르되, Group Replication의 특성을 추가로 고려해야 한다.

슬로우 쿼리 분석 파이프라인

-- 1. 슬로우 쿼리 로그 활성화
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;        -- 1초 이상 쿼리 기록
SET GLOBAL log_queries_not_using_indexes = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- 2. performance_schema로 상위 슬로우 쿼리 식별
SELECT
  SCHEMA_NAME,
  DIGEST_TEXT,
  COUNT_STAR AS exec_count,
  ROUND(SUM_TIMER_WAIT / 1000000000000, 3) AS total_sec,
  ROUND(AVG_TIMER_WAIT / 1000000000000, 3) AS avg_sec,
  SUM_ROWS_EXAMINED AS rows_examined,
  SUM_ROWS_SENT AS rows_sent,
  ROUND(SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0), 1) AS exam_to_sent_ratio,
  FIRST_SEEN,
  LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME NOT IN ('mysql', 'performance_schema', 'information_schema', 'sys')
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;

exam_to_sent_ratio(조회 행 대비 반환 행 비율)가 100 이상이면 인덱스 추가 또는 쿼리 재작성이 필요하다는 강력한 신호다.

# 3. pt-query-digest로 슬로우 로그 분석 (Percona Toolkit)
pt-query-digest /var/log/mysql/slow.log \
  --limit=20 \
  --order-by=Query_time:sum \
  --output report > /tmp/slow_report.txt

EXPLAIN ANALYZE 활용

MySQL 8.4에서 EXPLAIN ANALYZE는 실제 실행 통계를 포함한 실행 계획을 제공한다.

EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, c.customer_name, SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date BETWEEN '2026-01-01' AND '2026-03-01'
  AND c.region = 'APAC'
GROUP BY o.order_id, o.order_date, c.customer_name
ORDER BY total DESC
LIMIT 50;

출력에서 주목해야 할 항목은 다음과 같다.

actual time: 실제 소요 시간 (첫 행 반환 ~ 마지막 행 반환)
rows: 실제 처리된 행 수 (예상치와 큰 차이가 있으면 통계가 부정확하다는 의미)
loops: 해당 단계가 반복 실행된 횟수
Table scan / Index scan: 풀 테이블 스캔 여부

인덱스 전략

Covering Index (커버링 인덱스)

쿼리에 필요한 모든 컬럼을 인덱스에 포함시켜 테이블 데이터 페이지 접근을 제거하는 전략이다.

-- 자주 실행되는 쿼리 패턴
SELECT customer_id, order_date, status
FROM orders
WHERE customer_id = 12345
  AND status = 'SHIPPED'
ORDER BY order_date DESC;

-- 커버링 인덱스 생성
CREATE INDEX idx_orders_covering
  ON orders (customer_id, status, order_date DESC, order_id);

-- EXPLAIN에서 Extra: Using index 확인
EXPLAIN SELECT customer_id, order_date, status
FROM orders
WHERE customer_id = 12345
  AND status = 'SHIPPED'
ORDER BY order_date DESC;

Invisible Index로 안전한 인덱스 관리

MySQL 8.4에서 인덱스를 삭제하기 전에 invisible로 전환하여 영향을 테스트할 수 있다.

-- 인덱스를 invisible로 변경 (옵티마이저가 무시)
ALTER TABLE orders ALTER INDEX idx_old_index INVISIBLE;

-- 일정 기간 모니터링 후 문제가 없으면 삭제
DROP INDEX idx_old_index ON orders;

-- 문제가 발생하면 즉시 복원
ALTER TABLE orders ALTER INDEX idx_old_index VISIBLE;

히스토그램 활용

MySQL 8.4의 자동 히스토그램 업데이트를 활용하되, 필요한 경우 수동으로도 생성할 수 있다.

-- 컬럼에 히스토그램 생성
ANALYZE TABLE orders UPDATE HISTOGRAM ON status, region WITH 100 BUCKETS;

-- 히스토그램 정보 확인
SELECT
  SCHEMA_NAME, TABLE_NAME, COLUMN_NAME,
  JSON_EXTRACT(HISTOGRAM, '$.\"number-of-buckets-specified\"') AS buckets,
  JSON_EXTRACT(HISTOGRAM, '$.\"sampling-rate\"') AS sampling_rate,
  JSON_EXTRACT(HISTOGRAM, '$.\"histogram-type\"') AS hist_type
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 'orders';

히스토그램은 인덱스가 없는 컬럼의 선택도(selectivity)를 옵티마이저에 알려줘서, 조인 순서와 실행 계획 결정에 도움을 준다. 카디널리티가 낮은 컬럼(status, region, type 등)에 특히 효과적이다.

Group Replication 환경 특화 최적화

InnoDB 클러스터에서 추가로 고려해야 할 최적화 포인트가 있다.

쓰기 최적화: Group Replication은 모든 쓰기 트랜잭션을 그룹 전체에서 인증(certification)한다. 트랜잭션이 작을수록 인증 충돌(conflict) 가능성이 낮다. 대량 INSERT는 1000~5000행 단위의 배치로 분할한다.
읽기 분산: MySQL Router의 R/O 포트(6447)를 활용하여 읽기 쿼리를 Secondary로 분산한다. 단, consistency: 'BEFORE_ON_PRIMARY_FAILOVER' 설정 시 페일오버 직후 잠시 읽기 지연이 발생할 수 있다.
Multi-Primary 모드의 격리 수준: Multi-Primary 모드를 사용한다면 transaction_isolation=READ-COMMITTED로 설정한다. REPEATABLE READ는 Multi-Primary에서 인증 충돌을 증가시킬 수 있다.
핫스팟 테이블 회피: 같은 행에 대한 동시 업데이트는 인증 단계에서 충돌을 발생시킨다. 카운터 테이블 등은 애플리케이션 레벨에서 분산 처리(Redis 등)를 고려한다.

InnoDB 버퍼 풀 튜닝

쿼리 성능에 가장 큰 영향을 미치는 단일 설정은 innodb_buffer_pool_size다.

-- 현재 버퍼 풀 히트율 확인
SELECT
  (1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
  )) * 100 AS buffer_pool_hit_rate;

-- 99% 이하이면 buffer_pool_size 증가 검토

-- 버퍼 풀 상세 상태
SELECT
  POOL_ID,
  POOL_SIZE,
  FREE_BUFFERS,
  DATABASE_PAGES,
  MODIFIED_DB_PAGES,
  PAGES_MADE_YOUNG,
  PAGES_NOT_MADE_YOUNG
FROM information_schema.INNODB_BUFFER_POOL_STATS;

-- 온라인으로 버퍼 풀 크기 변경 (재시작 불필요)
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- 8GB

버퍼 풀 히트율이 99% 이상을 유지하는 것이 목표다. 히트율이 95% 이하로 떨어지면 디스크 I/O가 급증하여 전체 쿼리 성능이 급격히 저하된다.

운영 체크리스트

일상 운영과 장애 대응을 위한 체크리스트다.

일일 점검

cluster.status()로 전체 노드 ONLINE 상태 확인
replication_group_member_stats에서 COUNT_TRANSACTIONS_IN_QUEUE가 비정상적으로 높지 않은지 확인 (100 이상이면 경고)
슬로우 쿼리 로그에 새로운 패턴이 있는지 확인
디스크 사용량 확인 (바이너리 로그, 릴레이 로그, 데이터 디렉토리)
버퍼 풀 히트율 99% 이상 유지 확인

주간 점검

pt-query-digest로 슬로우 쿼리 트렌드 분석
사용되지 않는 인덱스 확인 (sys.schema_unused_indexes)
중복 인덱스 확인 (sys.schema_redundant_indexes)
테이블 통계 업데이트 (ANALYZE TABLE)
바이너리 로그 정리 상태 확인

장애 대응 전 준비

forceQuorumUsingPartitionOf 사용 절차를 팀 전체가 숙지
rebootClusterFromCompleteOutage 실행 순서(최신 GTID 노드 우선) 문서화
MySQL Router 재부트스트랩 절차 숙지
물리 백업(xtrabackup)이 최소 일 1회 Secondary에서 수행되는지 확인
복구 시간 목표(RTO)에 맞는 복구 절차 리허설

업그레이드 전 확인

util.checkForServerUpgrade()로 호환성 검증
mysql_native_password 의존성 제거 확인
MASTER/SLAVE 용어 사용 코드 검색 및 교체
mysqlpump 사용 스크립트를 mysqldump 또는 MySQL Shell dump로 교체
비고유 키 FK 사용 여부 확인 (restrict_fk_on_non_standard_key 대비)
Rolling Upgrade 순서: Secondary -> Secondary -> Primary (switchover 후)

성능 벤치마크 참고값

8.4 LTS 환경에서의 대략적인 성능 기대치를 정리한다. 하드웨어와 워크로드에 따라 큰 차이가 있으므로 참고 수준으로만 활용한다.

구성	단일 인스턴스	InnoDB Cluster 3노드 (Single-Primary)
쓰기 TPS (sysbench oltp_write_only)	~15,000	~~10,000~~12,000 (인증 오버헤드)
읽기 QPS (sysbench oltp_read_only)	~50,000	~120,000 (3노드 합산, R/O 분산)
페일오버 시간	N/A	5~30초 (expelTimeout + 선출 시간)
Clone 복구 시간 (100GB 데이터)	N/A	10~30분 (네트워크 대역폭 의존)

Group Replication의 인증 오버헤드로 단일 노드 대비 쓰기 TPS가 20~30% 감소하는 것은 정상이다. 읽기 성능은 노드 추가로 선형에 가깝게 확장할 수 있다.

마무리

MySQL 8.4 InnoDB 클러스터는 운영 복잡도와 가용성 사이의 균형이 잘 잡힌 솔루션이다. 핵심은 세 가지다.

첫째, 클러스터 구성 전에 모든 사전 조건(InnoDB 엔진, PK, GTID)을 완벽히 충족시킨다. 둘째, Group Replication의 특성(인증 기반 합의)을 이해하고 트랜잭션 크기를 제어한다. 셋째, performance_schema와 슬로우 쿼리 분석을 통해 80/20 법칙에 따라 상위 20%의 문제 쿼리를 집중 최적화한다.

8.0에서 8.4로의 업그레이드를 고려하고 있다면, mysql_native_password 의존성 제거와 MASTER/SLAVE 용어 교체를 먼저 진행한 뒤 Rolling Upgrade 방식으로 진행할 것을 권장한다.

References

MySQL 8.4 Release Notes - MySQL 8.4 LTS 전체 릴리스 노트
MySQL 8.4 Reference Manual - What Is New in MySQL 8.4 - 8.0 대비 변경 사항 공식 문서
MySQL Shell 9.5 - InnoDB Cluster - InnoDB Cluster 공식 가이드
MySQL 8.4 Reference Manual - Group Replication - Group Replication 공식 레퍼런스
MySQL 8.4 Reference Manual - Optimizing SELECT Statements - 쿼리 최적화 공식 가이드
MySQL 8.4 - InnoDB vs NDB Cluster Comparison - InnoDB와 NDB 스토리지 엔진 비교
Percona - InnoDB Cluster Setup: Building a 3-Node HA Architecture - 실전 구축 가이드
MySQL Replication Best Practices (Percona, 2025) - 복제 운영 베스트 프랙티스

MySQL 8.4 InnoDB Cluster Operations and Query Optimization Handbook

What MySQL 8.4 LTS Means
- Summary of Major Changes
InnoDB Cluster Architecture
- InnoDB Cluster vs NDB Cluster Comparison
InnoDB Cluster Setup in Practice
Cluster Operations and Monitoring
Failure Scenarios and Recovery Procedures
Query Optimization Strategies
InnoDB Buffer Pool Tuning
Operations Checklist
Performance Benchmark Reference Values
Conclusion
References
Quiz

What MySQL 8.4 LTS Means

MySQL 8.4 is Oracle's officially designated Long-Term Support (LTS) release. Since its GA in April 2024, patches from 8.4.4 through 8.4.8 have been steadily released throughout 2025-2026, and MySQL 8.0 is approaching its EOL in 2026. Migrating to 8.4 in production environments is no longer optional -- it is essential.

Let's first summarize the key changes in 8.4.

Summary of Major Changes

mysql_native_password disabled by default: Starting from 8.4.0, it is no longer loaded by default. caching_sha2_password is the default authentication plugin. If you have legacy apps, you must explicitly enable it with --mysql-native-password=ON.
Complete removal of MASTER/SLAVE terminology: CHANGE MASTER TO, START SLAVE, SHOW SLAVE STATUS, etc. now produce syntax errors. You must switch to CHANGE REPLICATION SOURCE TO, START REPLICA, SHOW REPLICA STATUS.
mysqlpump removed: Use mysqldump or MySQL Shell's dump utilities instead.
InnoDB Adaptive Hash Index (AHI) disabled by default: Performance may vary depending on workload. Enable it manually with innodb_adaptive_hash_index=ON if needed.
InnoDB Change Buffer disabled by default: Disabling it is actually beneficial for SSD-based storage.
Automatic histogram updates: Improves optimizer statistics accuracy for better query execution plans. Histogram metadata persists even after server restarts.
restrict_fk_on_non_standard_key=ON by default: Using non-unique or partial keys as foreign keys is now blocked by default.

InnoDB Cluster Architecture

An InnoDB Cluster consists of three core components:

MySQL Group Replication: A synchronous replication layer using a Paxos-based consensus protocol. It ensures data consistency and automatic failover.
MySQL Shell (AdminAPI): A management interface for creating clusters, adding/removing nodes, and monitoring status.
MySQL Router: Middleware that routes application traffic to the appropriate nodes. It distributes writes to the Primary and reads to Secondaries.

┌─────────────────────────────────────────────────┐
│                  Application                     │
└──────────────────────┬──────────────────────────┘
                       │
              ┌────────▼────────┐
              │  MySQL Router   │
              │  (R/W split)    │
              └───┬────────┬────┘
         Write    │        │   Read
         ┌────────▼──┐  ┌──▼────────┐
         │  Primary   │  │ Secondary │
         │  (node-1)  │  │ (node-2)  │
         └────────────┘  └───────────┘
                  │           │
              ┌───▼───────────▼───┐
              │    Secondary      │
              │    (node-3)       │
              └───────────────────┘
         ← Group Replication (Paxos) →

A minimum of 3 nodes is required, with support for up to 9 nodes. Using an odd number of nodes (3, 5, 7) is necessary to prevent split-brain during quorum calculations.

InnoDB Cluster vs NDB Cluster Comparison

Here is a comparison table to help decide which clustering solution to choose for your production environment.

Item	InnoDB Cluster	NDB Cluster
Storage Engine	InnoDB (disk-based)	NDB (in-memory + disk)
Replication Method	Group Replication (Paxos)	Synchronous 2-phase commit
Sharding	Not supported (manual partitioning required)	Automatic sharding (Node Group)
Max Nodes	9	255 (144 data nodes)
Query Execution	Single-threaded	Push-down to data nodes possible
Fault Tolerance	Majority of N nodes must survive	Can operate with 1 replica per Node Group
Suitable Workloads	General OLTP, read scaling	Ultra-low latency, write-intensive, telecom/finance
Operational Complexity	Low (managed via MySQL Shell)	High (requires separate Management Node)
Foreign Key Support	Full support	Limited
Transaction Isolation	All levels supported	READ COMMITTED only

For most web services, SaaS backends, and general OLTP workloads, InnoDB Cluster is the practical choice in terms of operational complexity and features. NDB Cluster is suited for telecom and financial real-time systems requiring ultra-low latency with hundreds of thousands of TPS or more.

InnoDB Cluster Setup in Practice

Prerequisites

Install the same MySQL 8.4 version on all nodes and verify the following conditions:

Confirm all tables use the InnoDB engine (convert any MyISAM tables beforehand)
Confirm all tables have a PRIMARY KEY (required for Group Replication)
Enable GTID (gtid_mode=ON, enforce_gtid_consistency=ON)
Set a unique server_id on each node
Register all node hostnames in /etc/hosts (to ensure consistent DNS resolution)

# Install MySQL 8.4 on each node (Ubuntu/Debian example)
sudo apt-get update
sudo apt-get install mysql-server-8.4 mysql-shell mysql-router

# Check for tables not using InnoDB engine
mysql -u root -p -e "
  SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE
  FROM information_schema.TABLES
  WHERE ENGINE != 'InnoDB'
    AND TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys');
"

# Check for tables without PRIMARY KEY
mysql -u root -p -e "
  SELECT t.TABLE_SCHEMA, t.TABLE_NAME
  FROM information_schema.TABLES t
  LEFT JOIN information_schema.TABLE_CONSTRAINTS c
    ON t.TABLE_SCHEMA = c.TABLE_SCHEMA
    AND t.TABLE_NAME = c.TABLE_NAME
    AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
  WHERE c.TABLE_NAME IS NULL
    AND t.TABLE_SCHEMA NOT IN ('mysql','information_schema','performance_schema','sys')
    AND t.TABLE_TYPE = 'BASE TABLE';
"

my.cnf Configuration (common across all nodes)

[mysqld]
# Basic settings
server_id=1                          # Unique value per node (1, 2, 3)
bind-address=0.0.0.0
port=3306
datadir=/var/lib/mysql

# Enable GTID
gtid_mode=ON
enforce_gtid_consistency=ON

# Binary Log
log_bin=mysql-bin
binlog_format=ROW
binlog_transaction_dependency_tracking=WRITESET
log_replica_updates=ON

# Group Replication basic settings
plugin_load_add=group_replication.so
group_replication_group_name="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
group_replication_start_on_boot=OFF
group_replication_local_address="node1:33061"
group_replication_group_seeds="node1:33061,node2:33061,node3:33061"
group_replication_bootstrap_group=OFF

# InnoDB tuning
innodb_buffer_pool_size=4G           # 70~80% of physical memory
innodb_buffer_pool_instances=4       # buffer_pool_size / 1G
innodb_log_file_size=1G
innodb_flush_log_at_trx_commit=1     # Data safety first
innodb_flush_method=O_DIRECT

# Default-disabled options to consider enabling based on workload
# innodb_adaptive_hash_index=ON      # Enable for OLTP hotspot queries
# innodb_change_buffering=all        # Enable for HDD usage

# Replication-related
replica_parallel_type=LOGICAL_CLOCK
replica_parallel_workers=4           # 4~8 recommended
replica_preserve_commit_order=ON

# Storage engine restriction (prevent non-InnoDB usage)
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

# Transaction isolation level (for Multi-Primary mode)
# transaction_isolation=READ-COMMITTED

Creating the Cluster with MySQL Shell

// Connect to MySQL Shell (mysqlsh)
// mysqlsh root@node1:3306

// 1. Verify and auto-configure each instance
dba.configureInstance('root@node1:3306', {
  clusterAdmin: 'clusteradmin',
  clusterAdminPassword: 'SecureP@ss!2026',
})
dba.configureInstance('root@node2:3306', {
  clusterAdmin: 'clusteradmin',
  clusterAdminPassword: 'SecureP@ss!2026',
})
dba.configureInstance('root@node3:3306', {
  clusterAdmin: 'clusteradmin',
  clusterAdminPassword: 'SecureP@ss!2026',
})

// 2. Create the cluster on the first node (Bootstrap)
shell.connect('clusteradmin@node1:3306')
var cluster = dba.createCluster('prodCluster', {
  multiPrimary: false, // Single-Primary mode recommended
  memberWeight: 50,
  expelTimeout: 5,
  autoRejoinTries: 3,
  consistency: 'BEFORE_ON_PRIMARY_FAILOVER',
})

// 3. Add remaining nodes (Clone method recommended)
cluster.addInstance('clusteradmin@node2:3306', {
  recoveryMethod: 'clone',
})
cluster.addInstance('clusteradmin@node3:3306', {
  recoveryMethod: 'clone',
})

// 4. Check cluster status
cluster.status()

Specifying recoveryMethod: 'clone' uses a physical snapshot from an existing cluster member to fully synchronize data on the new node. The Clone method is safer when data volume is large or binary log retention period is short.

MySQL Router Configuration

# Router bootstrap (auto-configures cluster metadata)
mysqlrouter --bootstrap clusteradmin@node1:3306 \
  --directory /opt/mysqlrouter \
  --user=mysqlrouter \
  --conf-use-sockets \
  --conf-bind-address=0.0.0.0

# Start Router
/opt/mysqlrouter/start.sh

# Or manage via systemd
sudo systemctl enable mysqlrouter
sudo systemctl start mysqlrouter

Once the Router is bootstrapped, it uses the following default ports:

6446: R/W port (routes to Primary)
6447: R/O port (routes to Secondaries, round-robin)
6448: R/W X Protocol port
6449: R/O X Protocol port

Simply change your application's DB connection string to point to the Router host's port 6446 (write) or 6447 (read).

Cluster Operations and Monitoring

Daily Operations Commands

// Get the cluster object in MySQL Shell
shell.connect('clusteradmin@node1:3306')
var cluster = dba.getCluster()

// Check overall cluster status
cluster.status({ extended: 1 })

// Detailed status for a specific node
cluster.status({ extended: 2 })

// Temporarily remove a node (for maintenance)
cluster.removeInstance('clusteradmin@node2:3306', { force: false })

// Re-add a node
cluster.addInstance('clusteradmin@node2:3306', { recoveryMethod: 'clone' })

// View cluster options
cluster.options()

// Manual Primary switchover (graceful)
cluster.setPrimaryInstance('clusteradmin@node2:3306')

// Check Router connection status
cluster.listRouters()

Essential Monitoring Queries

These are metrics that must be checked periodically during operations.

-- Check Group Replication member status
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;

-- Check transaction apply lag (run on each Secondary)
SELECT
  CHANNEL_NAME,
  COUNT_TRANSACTIONS_IN_QUEUE AS trx_in_queue,
  COUNT_TRANSACTIONS_CHECKED AS trx_checked,
  COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE AS remote_queue,
  LAST_CONFLICT_FREE_TRANSACTION
FROM performance_schema.replication_group_member_stats
WHERE CHANNEL_NAME = 'group_replication_applier';

-- Check Applier worker status
SELECT
  WORKER_ID, LAST_SEEN_TRANSACTION,
  APPLYING_TRANSACTION, LAST_APPLIED_TRANSACTION
FROM performance_schema.replication_applier_status_by_worker
WHERE CHANNEL_NAME = 'group_replication_applier';

-- Check InnoDB Cluster metadata
SELECT * FROM mysql_innodb_cluster_metadata.clusters;
SELECT * FROM mysql_innodb_cluster_metadata.instances;

Warning: Critical Operational Considerations

Be cautious with DDL: ALTER TABLE on large tables can block the entire cluster during Group Replication's certification stage. Use pt-online-schema-change or gh-ost.
Limit large transactions: Transactions exceeding group_replication_transaction_size_limit (default 150MB) will be rolled back. Split bulk INSERT/UPDATE operations into batches.
Network latency: Network round-trip latency between nodes directly impacts write performance. Co-locating nodes within the same data center is recommended; consider ClusterSet for cross-region deployments.
Binary log management: Set binlog_expire_logs_seconds appropriately to manage disk space. The default is 2592000 seconds (30 days).
Run backups on Secondaries: To reduce load on the Primary, perform physical backups (xtrabackup, mysqlbackup) on Secondary nodes.

Failure Scenarios and Recovery Procedures

Scenario 1: Single Secondary Node Failure

This is the most common situation. When one Secondary goes down in a 3-node cluster, the cluster continues normal operation with 2/3 quorum.

// Verify auto-rejoin after the failed node recovers
cluster.status()
// Confirm MEMBER_STATE changes from RECOVERING to ONLINE

// If auto-rejoin fails
cluster.rejoinInstance('clusteradmin@node3:3306')

// If the data gap is large, rejoin via Clone
cluster.removeInstance('clusteradmin@node3:3306', { force: true })
cluster.addInstance('clusteradmin@node3:3306', { recoveryMethod: 'clone' })

Scenario 2: Primary Node Failure

When the Primary goes down, Group Replication automatically elects a new Primary. The node with the higher member_weight is elected first.

// Check the new Primary
cluster.status()

// Rejoin the former Primary as a Secondary after recovery
cluster.rejoinInstance('clusteradmin@node1:3306')

MySQL Router automatically detects the Primary change and routes write traffic to the new Primary. Connection retry logic at the application level is required.

Scenario 3: Quorum Loss (Majority of Nodes Down)

When 2 out of 3 nodes go down simultaneously, the remaining node loses quorum -- it can only serve reads, not writes.

// Force quorum recovery from the surviving node
// Warning: Data loss is possible
cluster.forceQuorumUsingPartitionOf('clusteradmin@node1:3306')

// After recovering the downed nodes, rejoin them
cluster.rejoinInstance('clusteradmin@node2:3306')
cluster.rejoinInstance('clusteradmin@node3:3306')

forceQuorumUsingPartitionOf is a last resort. This command reconstructs the cluster based on the specified node's data, so transactions that were committed on other nodes but not yet propagated may be lost.

Scenario 4: Full Cluster Restart

The order matters when restarting after a planned full shutdown (e.g., infrastructure maintenance).

// 1. Start the node with the most recent GTID first
// Check on each node:
// SELECT @@gtid_executed;

// 2. Reboot the cluster from that node
shell.connect('clusteradmin@node1:3306')
var cluster = dba.rebootClusterFromCompleteOutage('prodCluster')

// 3. Remaining nodes auto-rejoin or manual rejoin
cluster.rejoinInstance('clusteradmin@node2:3306')
cluster.rejoinInstance('clusteradmin@node3:3306')

Query Optimization Strategies

Query optimization in an InnoDB Cluster environment follows the same principles as a single instance, but must additionally consider the characteristics of Group Replication.

Slow Query Analysis Pipeline

-- 1. Enable slow query log
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;        -- Log queries taking over 1 second
SET GLOBAL log_queries_not_using_indexes = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- 2. Identify top slow queries via performance_schema
SELECT
  SCHEMA_NAME,
  DIGEST_TEXT,
  COUNT_STAR AS exec_count,
  ROUND(SUM_TIMER_WAIT / 1000000000000, 3) AS total_sec,
  ROUND(AVG_TIMER_WAIT / 1000000000000, 3) AS avg_sec,
  SUM_ROWS_EXAMINED AS rows_examined,
  SUM_ROWS_SENT AS rows_sent,
  ROUND(SUM_ROWS_EXAMINED / NULLIF(SUM_ROWS_SENT, 0), 1) AS exam_to_sent_ratio,
  FIRST_SEEN,
  LAST_SEEN
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME NOT IN ('mysql', 'performance_schema', 'information_schema', 'sys')
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 20;

An exam_to_sent_ratio (ratio of rows examined to rows returned) of 100 or more is a strong signal that an index needs to be added or the query needs to be rewritten.

# 3. Analyze slow log with pt-query-digest (Percona Toolkit)
pt-query-digest /var/log/mysql/slow.log \
  --limit=20 \
  --order-by=Query_time:sum \
  --output report > /tmp/slow_report.txt

Using EXPLAIN ANALYZE

In MySQL 8.4, EXPLAIN ANALYZE provides execution plans with actual runtime statistics.

EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, c.customer_name, SUM(oi.quantity * oi.unit_price) AS total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date BETWEEN '2026-01-01' AND '2026-03-01'
  AND c.region = 'APAC'
GROUP BY o.order_id, o.order_date, c.customer_name
ORDER BY total DESC
LIMIT 50;

Key items to focus on in the output:

actual time: Actual elapsed time (from first row returned to last row returned)
rows: Actual rows processed (a large difference from estimates means statistics are inaccurate)
loops: Number of times that stage was executed repeatedly
Table scan / Index scan: Whether a full table scan occurred

Index Strategies

Covering Index

A strategy that includes all columns needed by the query in the index, eliminating access to table data pages.

-- Frequently executed query pattern
SELECT customer_id, order_date, status
FROM orders
WHERE customer_id = 12345
  AND status = 'SHIPPED'
ORDER BY order_date DESC;

-- Create a covering index
CREATE INDEX idx_orders_covering
  ON orders (customer_id, status, order_date DESC, order_id);

-- Verify "Using index" in EXPLAIN output
EXPLAIN SELECT customer_id, order_date, status
FROM orders
WHERE customer_id = 12345
  AND status = 'SHIPPED'
ORDER BY order_date DESC;

Safe Index Management with Invisible Indexes

In MySQL 8.4, you can switch an index to invisible to test the impact before dropping it.

-- Change an index to invisible (optimizer ignores it)
ALTER TABLE orders ALTER INDEX idx_old_index INVISIBLE;

-- Drop it after monitoring for a period with no issues
DROP INDEX idx_old_index ON orders;

-- Restore immediately if problems arise
ALTER TABLE orders ALTER INDEX idx_old_index VISIBLE;

Using Histograms

Leverage MySQL 8.4's automatic histogram updates, with the option to create them manually when needed.

-- Create histograms on columns
ANALYZE TABLE orders UPDATE HISTOGRAM ON status, region WITH 100 BUCKETS;

-- Check histogram information
SELECT
  SCHEMA_NAME, TABLE_NAME, COLUMN_NAME,
  JSON_EXTRACT(HISTOGRAM, '$.\"number-of-buckets-specified\"') AS buckets,
  JSON_EXTRACT(HISTOGRAM, '$.\"sampling-rate\"') AS sampling_rate,
  JSON_EXTRACT(HISTOGRAM, '$.\"histogram-type\"') AS hist_type
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 'orders';

Histograms inform the optimizer about the selectivity of columns without indexes, helping with join order and execution plan decisions. They are particularly effective for low-cardinality columns (status, region, type, etc.).

Group Replication-Specific Optimizations

There are additional optimization points to consider in an InnoDB Cluster environment.

Write optimization: Group Replication certifies all write transactions across the entire group. Smaller transactions mean lower chances of certification conflicts. Split bulk INSERTs into batches of 1,000 to 5,000 rows.
Read distribution: Use MySQL Router's R/O port (6447) to distribute read queries to Secondaries. Note that with consistency: 'BEFORE_ON_PRIMARY_FAILOVER', a brief read delay may occur immediately after failover.
Isolation level in Multi-Primary mode: If using Multi-Primary mode, set transaction_isolation=READ-COMMITTED. REPEATABLE READ can increase certification conflicts in Multi-Primary.
Avoid hotspot tables: Concurrent updates to the same rows cause conflicts during the certification stage. For counter tables, consider distributed processing at the application level (e.g., Redis).

InnoDB Buffer Pool Tuning

The single setting with the greatest impact on query performance is innodb_buffer_pool_size.

-- Check current buffer pool hit rate
SELECT
  (1 - (
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') /
    (SELECT VARIABLE_VALUE FROM performance_schema.global_status
     WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests')
  )) * 100 AS buffer_pool_hit_rate;

-- Consider increasing buffer_pool_size if under 99%

-- Detailed buffer pool status
SELECT
  POOL_ID,
  POOL_SIZE,
  FREE_BUFFERS,
  DATABASE_PAGES,
  MODIFIED_DB_PAGES,
  PAGES_MADE_YOUNG,
  PAGES_NOT_MADE_YOUNG
FROM information_schema.INNODB_BUFFER_POOL_STATS;

-- Change buffer pool size online (no restart required)
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;  -- 8GB

The goal is to maintain a buffer pool hit rate of 99% or higher. If the hit rate drops below 95%, disk I/O spikes dramatically, causing sharp degradation in overall query performance.

Operations Checklist

A checklist for daily operations and incident response.

Daily Checks

Verify all nodes are ONLINE via cluster.status()
Check that COUNT_TRANSACTIONS_IN_QUEUE in replication_group_member_stats is not abnormally high (alert if over 100)
Check for new patterns in the slow query log
Check disk usage (binary logs, relay logs, data directory)
Verify buffer pool hit rate stays at 99% or above

Weekly Checks

Analyze slow query trends with pt-query-digest
Check for unused indexes (sys.schema_unused_indexes)
Check for redundant indexes (sys.schema_redundant_indexes)
Update table statistics (ANALYZE TABLE)
Check binary log purge status

Incident Response Preparation

Ensure the entire team is familiar with the forceQuorumUsingPartitionOf procedure
Document the rebootClusterFromCompleteOutage execution order (most recent GTID node first)
Be familiar with the MySQL Router re-bootstrap procedure
Verify physical backups (xtrabackup) are performed at least daily on Secondaries
Rehearse recovery procedures aligned with Recovery Time Objective (RTO)

Pre-Upgrade Checks

Validate compatibility with util.checkForServerUpgrade()
Confirm removal of mysql_native_password dependencies
Search for and replace code using MASTER/SLAVE terminology
Replace mysqlpump scripts with mysqldump or MySQL Shell dump
Check for non-unique key FK usage (prepare for restrict_fk_on_non_standard_key)
Rolling Upgrade order: Secondary -> Secondary -> Primary (after switchover)

Performance Benchmark Reference Values

Here are approximate performance expectations in an 8.4 LTS environment. These vary significantly based on hardware and workload, so use them only as reference.

Configuration	Single Instance	InnoDB Cluster 3-Node (Single-Primary)
Write TPS (sysbench oltp_write_only)	~15,000	~10,000-12,000 (certification overhead)
Read QPS (sysbench oltp_read_only)	~50,000	~120,000 (sum of 3 nodes, R/O distributed)
Failover Time	N/A	5-30 seconds (expelTimeout + election time)
Clone Recovery Time (100GB data)	N/A	10-30 minutes (depends on network bandwidth)

It is normal for write TPS to decrease by 20-30% compared to a single node due to Group Replication certification overhead. Read performance can scale near-linearly by adding nodes.

Conclusion

MySQL 8.4 InnoDB Cluster is a solution with a well-balanced trade-off between operational complexity and availability. Three things are key.

First, fully satisfy all prerequisites (InnoDB engine, PK, GTID) before cluster setup. Second, understand the characteristics of Group Replication (certification-based consensus) and control transaction sizes. Third, focus on optimizing the top 20% of problem queries following the 80/20 rule through performance_schema and slow query analysis.

If you are considering an upgrade from 8.0 to 8.4, it is recommended to first remove mysql_native_password dependencies and replace MASTER/SLAVE terminology, then proceed with a Rolling Upgrade approach.

References

MySQL 8.4 Release Notes - Complete release notes for MySQL 8.4 LTS
MySQL 8.4 Reference Manual - What Is New in MySQL 8.4 - Official documentation of changes from 8.0
MySQL Shell 9.5 - InnoDB Cluster - Official InnoDB Cluster guide
MySQL 8.4 Reference Manual - Group Replication - Official Group Replication reference
MySQL 8.4 Reference Manual - Optimizing SELECT Statements - Official query optimization guide
MySQL 8.4 - InnoDB vs NDB Cluster Comparison - InnoDB and NDB storage engine comparison
Percona - InnoDB Cluster Setup: Building a 3-Node HA Architecture - Practical setup guide
MySQL Replication Best Practices (Percona, 2025) - Replication operations best practices

Quiz

Q1: What is the main topic covered in "MySQL 8.4 InnoDB Cluster Operations and Query Optimization Handbook"?

A practical operations handbook covering MySQL 8.4 InnoDB Cluster setup, Group Replication operations, query optimization, index strategies, slow query analysis, and disaster recovery.

Q2: What MySQL 8.4 LTS Means?

Q3: Describe the InnoDB Cluster Architecture.

An InnoDB Cluster consists of three core components: MySQL Group Replication: A synchronous replication layer using a Paxos-based consensus protocol. It ensures data consistency and automatic failover.

Q4: What are the key steps for InnoDB Cluster Setup in Practice?

Prerequisites Install the same MySQL 8.4 version on all nodes and verify the following conditions: Confirm all tables use the InnoDB engine (convert any MyISAM tables beforehand) Confirm all tables have a PRIMARY KEY (required for Group Replication) Enable GTID (gtid_mode=ON, enf...

Q5: How does Cluster Operations and Monitoring work?

Daily Operations Commands Essential Monitoring Queries These are metrics that must be checked periodically during operations.