백엔드 성능 엔지니어링 완전 가이드 2025: 프로파일링, 부하 테스트, 병목 분석, 최적화
목차
1. 성능 엔지니어링 마인드셋
1.1 측정 먼저, 최적화는 나중에
성능 엔지니어링의 황금률은 "추측하지 말고, 측정하라"입니다. 직감에 의한 최적화는 대부분 잘못된 곳에 시간을 낭비합니다.
성능 최적화의 3단계:
- 측정(Measure): 현재 성능을 정량적으로 측정
- 분석(Analyze): 병목 지점을 정확히 식별
- 최적화(Optimize): 가장 영향력 큰 병목부터 해결
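예를 들어 '측정' 단계는 최적화 전 기준선(baseline)을 수치로 남기는 것에서 시작합니다. 아래는 이를 보여주는 최소한의 스케치이며, measure_latency라는 이름과 runs 파라미터는 설명을 위해 가정한 것입니다.

```python
import time
import statistics

def measure_latency(func, *args, runs=100, **kwargs):
    """최적화 전 기준선 측정: 같은 입력으로 여러 번 실행해 분포를 기록한다."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000)  # ms 단위
    samples.sort()
    return {
        'avg_ms': statistics.mean(samples),
        'p95_ms': samples[int(len(samples) * 0.95) - 1],
        'max_ms': samples[-1],
    }

# 사용 예: 최적화 대상의 현재 성능을 먼저 기록해 둔다
baseline = measure_latency(sorted, list(range(10000, 0, -1)))
```

이렇게 남긴 기준선이 있어야 최적화 후의 개선 폭을 정량적으로 비교할 수 있습니다.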
1.2 암달의 법칙(Amdahl's Law)
시스템 전체 성능 향상은 개선 가능한 부분의 비율에 의해 제한됩니다.
전체 속도 향상 = 1 / ((1 - P) + P / S)
P = 개선 가능한 부분의 비율
S = 해당 부분의 속도 향상 배수
예시: 전체의 20%를 차지하는 코드를 10배 빠르게 만들면
= 1 / ((1 - 0.2) + 0.2 / 10)
= 1 / (0.8 + 0.02)
= 1.22배 (22% 향상)
반면, 전체의 80%를 차지하는 코드를 2배 빠르게 만들면
= 1 / ((1 - 0.8) + 0.8 / 2)
= 1 / (0.2 + 0.4)
= 1.67배 (67% 향상)
핵심: 작은 부분을 극적으로 개선하는 것보다, 큰 부분을 적당히 개선하는 것이 효과적입니다.
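위 계산은 다음과 같은 간단한 함수로 그대로 검증해 볼 수 있습니다.

```python
def amdahl_speedup(p, s):
    """암달의 법칙: p = 개선 가능한 부분의 비율, s = 해당 부분의 속도 향상 배수"""
    return 1 / ((1 - p) + p / s)

# 본문 예시 검증
print(round(amdahl_speedup(0.2, 10), 2))  # 1.22 — 20%를 10배 개선
print(round(amdahl_speedup(0.8, 2), 2))   # 1.67 — 80%를 2배 개선
```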
1.3 성능 예산(Performance Budget)
# 성능 예산 정의 예시
performance_budget:
  api_endpoints:
    p50_latency_ms: 50
    p95_latency_ms: 200
    p99_latency_ms: 500
    max_latency_ms: 2000
    error_rate_percent: 0.1
    throughput_rps: 1000
  database:
    query_p95_ms: 50
    query_p99_ms: 200
    connection_pool_utilization: 70
    slow_query_threshold_ms: 100
  external_services:
    p95_latency_ms: 300
    timeout_ms: 5000
    retry_count: 3
    circuit_breaker_threshold: 50
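이런 예산은 CI 단계에서 자동으로 검증할 수 있습니다. 아래는 그 방식을 보여주는 최소한의 스케치로, check_budget 함수와 측정값은 설명을 위해 가정한 예시입니다.

```python
def check_budget(measured, budget):
    """측정값이 성능 예산을 초과하면 위반 목록을 반환한다 (비어 있으면 통과)."""
    violations = []
    for metric, limit in budget.items():
        value = measured.get(metric)
        if value is not None and value > limit:
            violations.append(f"{metric}: {value} > budget {limit}")
    return violations

# 예시: 부하 테스트 결과를 예산과 비교
budget = {'p95_latency_ms': 200, 'p99_latency_ms': 500, 'error_rate_percent': 0.1}
measured = {'p95_latency_ms': 180, 'p99_latency_ms': 620, 'error_rate_percent': 0.05}
violations = check_budget(measured, budget)
# p99가 예산(500ms)을 초과하므로 위반 1건이 보고된다
```

위반 목록이 비어 있지 않으면 CI 파이프라인을 실패시키는 식으로 예산을 강제할 수 있습니다.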
2. 프로파일링
2.1 CPU 프로파일링과 Flame Graph
Flame Graph는 CPU 시간이 어디에 소비되는지를 시각적으로 보여주는 강력한 도구입니다.
Node.js CPU 프로파일링:
// Node.js - 내장 프로파일러 사용
// 실행: node --prof app.js
// 분석: node --prof-process isolate-*.log > profile.txt
// 또는 v8-profiler-next 사용
const v8Profiler = require('v8-profiler-next');
function startProfiling(durationMs = 30000) {
const title = `cpu-profile-${Date.now()}`;
v8Profiler.startProfiling(title, true);
setTimeout(() => {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
}, durationMs);
}
// 미들웨어로 특정 요청 프로파일링
function profilingMiddleware(req, res, next) {
if (req.headers['x-profile'] !== 'true') {
return next();
}
const title = `req-${req.method}-${req.path}-${Date.now()}`;
v8Profiler.startProfiling(title, true);
const originalEnd = res.end;
res.end = function (...args) {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
originalEnd.apply(res, args);
};
next();
}
Go CPU 프로파일링:
package main
import (
"net/http"
_ "net/http/pprof"
"runtime"
)
func main() {
// pprof 엔드포인트 활성화
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
// CPU 프로파일 수집: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// Flame Graph 생성: go tool pprof -http=:8080 profile.pb.gz
// 또는 프로그래밍 방식으로
// runtime.SetCPUProfileRate(100)
// pprof.StartCPUProfile(f)
// defer pprof.StopCPUProfile()
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
// 애플리케이션 로직
startServer()
}
Python CPU 프로파일링:
import cProfile
import pstats
from pyinstrument import Profiler

# cProfile 사용
def profile_with_cprofile(func):
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(20)  # 상위 20개 함수
        return result
    return wrapper

# pyinstrument 사용 (더 읽기 쉬운 출력)
def profile_with_pyinstrument(func):
    def wrapper(*args, **kwargs):
        profiler = Profiler()
        profiler.start()
        result = func(*args, **kwargs)
        profiler.stop()
        print(profiler.output_text(unicode=True))
        return result
    return wrapper

# Django 미들웨어
class ProfilingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.META.get('HTTP_X_PROFILE') == 'true':
            profiler = Profiler()
            profiler.start()
            response = self.get_response(request)
            profiler.stop()
            response['X-Profile-Duration'] = str(profiler.last_session.duration)
            # HTML 프로파일 결과를 브라우저에서 열기
            profiler.open_in_browser()
            return response
        return self.get_response(request)
2.2 메모리 프로파일링
// Node.js 힙 스냅샷
const v8 = require('v8');
const fs = require('fs');
function takeHeapSnapshot() {
// v8.writeHeapSnapshot()은 생성된 스냅샷 파일의 경로(문자열)를 반환
const snapshotPath = v8.writeHeapSnapshot();
console.log(`Heap snapshot written to: ${snapshotPath}`);
return snapshotPath;
}
// 메모리 사용량 모니터링
function monitorMemory(intervalMs = 5000) {
setInterval(() => {
const usage = process.memoryUsage();
console.log({
rss_mb: Math.round(usage.rss / 1024 / 1024),
heapTotal_mb: Math.round(usage.heapTotal / 1024 / 1024),
heapUsed_mb: Math.round(usage.heapUsed / 1024 / 1024),
external_mb: Math.round(usage.external / 1024 / 1024),
arrayBuffers_mb: Math.round(usage.arrayBuffers / 1024 / 1024)
});
}, intervalMs);
}
// 메모리 누수 감지 패턴
class MemoryLeakDetector {
constructor(options = {}) {
this.samples = [];
this.maxSamples = options.maxSamples || 60;
this.threshold = options.thresholdMB || 50;
}
sample() {
const usage = process.memoryUsage();
this.samples.push({
timestamp: Date.now(),
heapUsed: usage.heapUsed
});
if (this.samples.length > this.maxSamples) {
this.samples.shift();
}
return this.detectLeak();
}
detectLeak() {
if (this.samples.length < 10) return null;
const first = this.samples[0].heapUsed;
const last = this.samples[this.samples.length - 1].heapUsed;
const diffMB = (last - first) / 1024 / 1024;
// 지속적인 메모리 증가 패턴 감지
let increasing = 0;
for (let i = 1; i < this.samples.length; i++) {
if (this.samples[i].heapUsed > this.samples[i - 1].heapUsed) {
increasing++;
}
}
const increaseRatio = increasing / (this.samples.length - 1);
if (diffMB > this.threshold && increaseRatio > 0.7) {
return {
suspected: true,
growthMB: diffMB.toFixed(2),
increaseRatio: increaseRatio.toFixed(2),
duration: this.samples[this.samples.length - 1].timestamp - this.samples[0].timestamp
};
}
return null;
}
}
2.3 I/O 프로파일링
# Python - I/O 프로파일링
import asyncio
import time
import functools
import logging
from contextlib import contextmanager

logger = logging.getLogger('io_profiler')

class IOProfiler:
    """I/O 작업 시간 측정 데코레이터 및 컨텍스트 매니저"""
    _stats = {}

    @classmethod
    def track(cls, operation_name):
        def decorator(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            @functools.wraps(func)
            def sync_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            if asyncio.iscoroutinefunction(func):
                return async_wrapper
            return sync_wrapper
        return decorator

    @classmethod
    def _record(cls, name, duration, success):
        if name not in cls._stats:
            cls._stats[name] = {
                'count': 0, 'total_time': 0,
                'min_time': float('inf'), 'max_time': 0,
                'errors': 0
            }
        stats = cls._stats[name]
        stats['count'] += 1
        stats['total_time'] += duration
        stats['min_time'] = min(stats['min_time'], duration)
        stats['max_time'] = max(stats['max_time'], duration)
        if not success:
            stats['errors'] += 1

    @classmethod
    def report(cls):
        for name, stats in sorted(cls._stats.items()):
            avg = stats['total_time'] / stats['count'] if stats['count'] else 0
            logger.info(
                f"{name}: count={stats['count']}, "
                f"avg={avg*1000:.1f}ms, "
                f"min={stats['min_time']*1000:.1f}ms, "
                f"max={stats['max_time']*1000:.1f}ms, "
                f"errors={stats['errors']}"
            )

# 사용 예시
class UserRepository:
    @IOProfiler.track('db.users.find_by_id')
    async def find_by_id(self, user_id):
        return await self.db.users.find_one({"_id": user_id})

    @IOProfiler.track('db.users.search')
    async def search(self, query, limit=20):
        return await self.db.users.find(query).limit(limit).to_list(limit)

class ExternalAPIClient:
    @IOProfiler.track('api.payment.charge')
    async def charge(self, amount, token):
        async with self.session.post('/charge', json={"amount": amount, "token": token}) as resp:
            return await resp.json()
3. 부하 테스트(Load Testing)
3.1 부하 테스트 도구 비교
| 도구 | 언어 | 프로토콜 | 강점 | 약점 |
|---|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | 개발자 친화적, CI/CD 통합 | 브라우저 테스트 제한적 |
| Artillery | JavaScript | HTTP, WebSocket, Socket.io | 설정 기반, 확장성 | 복잡한 시나리오 어려움 |
| Locust | Python | HTTP | Python 스크립트, 분산 | 프로토콜 제한적 |
| Gatling | Scala/Java | HTTP, WebSocket | 상세 리포트, JVM 성능 | 학습 곡선 |
| JMeter | Java | 다양함 | GUI, 다양한 프로토콜 | 리소스 소비 큼, 구식 |
3.2 k6 스크립트 예시
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
// 커스텀 메트릭
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration', true);
const requestCount = new Counter('requests');
// 테스트 옵션
export const options = {
scenarios: {
// 시나리오 1: 일반 부하 테스트
normal_load: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 50 }, // 2분간 50 VU까지 증가
{ duration: '5m', target: 50 }, // 5분간 50 VU 유지
{ duration: '2m', target: 100 }, // 2분간 100 VU까지 증가
{ duration: '5m', target: 100 }, // 5분간 100 VU 유지
{ duration: '2m', target: 0 }, // 2분간 0으로 감소
],
},
// 시나리오 2: 스파이크 테스트
spike_test: {
executor: 'ramping-vus',
startVUs: 0,
startTime: '16m',
stages: [
{ duration: '10s', target: 500 }, // 급격한 스파이크
{ duration: '1m', target: 500 }, // 유지
{ duration: '10s', target: 0 }, // 급격한 감소
],
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
errors: ['rate<0.01'],
http_req_failed: ['rate<0.01'],
},
};
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
const authToken = login();
group('API Operations', () => {
group('List Products', () => {
const res = http.get(`${BASE_URL}/api/products?page=1&limit=20`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'response time OK': (r) => r.timings.duration < 200,
'has products': (r) => JSON.parse(r.body).data.length > 0,
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
requestCount.add(1);
});
group('Get Product Detail', () => {
const productId = Math.floor(Math.random() * 1000) + 1;
const res = http.get(`${BASE_URL}/api/products/${productId}`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products/:id' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'has product data': (r) => {
const body = JSON.parse(r.body);
return body.data && body.data.id;
},
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
});
group('Create Order', () => {
const payload = JSON.stringify({
productId: Math.floor(Math.random() * 1000) + 1,
quantity: Math.floor(Math.random() * 5) + 1,
shippingAddress: '123 Test Street',
});
const res = http.post(`${BASE_URL}/api/orders`, payload, {
headers: {
Authorization: `Bearer ${authToken}`,
'Content-Type': 'application/json',
},
tags: { name: 'POST /api/orders' },
});
check(res, {
'order created': (r) => r.status === 201,
'has order id': (r) => JSON.parse(r.body).data.orderId,
});
errorRate.add(res.status !== 201);
apiDuration.add(res.timings.duration);
});
});
sleep(Math.random() * 3 + 1); // 1-4초 사이 대기
}
function login() {
const res = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
email: `user${__VU}@test.com`,
password: 'testpassword',
}), {
headers: { 'Content-Type': 'application/json' },
tags: { name: 'POST /api/auth/login' },
});
return res.status === 200 ? JSON.parse(res.body).token : '';
}
3.3 Artillery 설정 예시
# artillery-config.yml
config:
  target: "http://localhost:3000"
  phases:
    - duration: 120
      arrivalRate: 10
      name: "Warm up"
    - duration: 300
      arrivalRate: 50
      name: "Normal load"
    - duration: 120
      arrivalRate: 100
      name: "Peak load"
  defaults:
    headers:
      Content-Type: "application/json"
  plugins:
    expect: {}
    metrics-by-endpoint: {}
  ensure:
    thresholds:
      - http.response_time.p95: 200
      - http.response_time.p99: 500

scenarios:
  - name: "User browsing flow"
    weight: 70
    flow:
      - post:
          url: "/api/auth/login"
          json:
            email: "user@test.com"
            password: "password123"
          capture:
            - json: "$.token"
              as: "authToken"
          expect:
            - statusCode: 200
      - get:
          url: "/api/products?page=1&limit=20"
          headers:
            Authorization: "Bearer {{ authToken }}"
          expect:
            - statusCode: 200
            - hasProperty: "data"
      - think: 2
      - get:
          url: "/api/products/{{ $randomNumber(1, 1000) }}"
          headers:
            Authorization: "Bearer {{ authToken }}"
          expect:
            - statusCode: 200
  - name: "Order creation flow"
    weight: 30
    flow:
      - post:
          url: "/api/auth/login"
          json:
            email: "buyer@test.com"
            password: "password123"
          capture:
            - json: "$.token"
              as: "authToken"
      - post:
          url: "/api/orders"
          headers:
            Authorization: "Bearer {{ authToken }}"
          json:
            productId: "{{ $randomNumber(1, 100) }}"
            quantity: "{{ $randomNumber(1, 5) }}"
          expect:
            - statusCode: 201
3.4 부하 테스트 유형
| 유형 | 목적 | VU 패턴 | 기간 |
|---|---|---|---|
| Smoke | 기본 동작 확인 | 1-5 | 1-5분 |
| Load | 예상 트래픽 처리 확인 | 예상치 | 15-60분 |
| Stress | 한계점 탐색 | 예상치 초과 | 30-60분 |
| Spike | 급격한 트래픽 대응 | 갑작스런 급증 | 5-10분 |
| Soak | 장시간 안정성 확인 | 일정 수준 유지 | 2-24시간 |
| Breakpoint | 시스템 파괴점 탐색 | 지속적 증가 | 가변적 |
4. 핵심 성능 메트릭
4.1 RED Method
# RED Method 모니터링 구현
from prometheus_client import Counter, Histogram, Gauge
import time

# Rate: 초당 요청 수
request_count = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

# Errors: 에러 비율
error_count = Counter(
    'http_errors_total',
    'Total HTTP errors',
    ['method', 'endpoint', 'error_type']
)

# Duration: 응답 시간 분포
request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

# 미들웨어 구현 (ASGI)
class REDMetricsMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope['type'] != 'http':
            return await self.app(scope, receive, send)
        method = scope.get('method', 'UNKNOWN')
        path = scope.get('path', '/')
        status_code = 500
        start = time.perf_counter()
        try:
            # 응답 상태 코드 캡처
            async def send_wrapper(message):
                nonlocal status_code
                if message['type'] == 'http.response.start':
                    status_code = message['status']
                await send(message)

            await self.app(scope, receive, send_wrapper)
        except Exception as e:
            error_count.labels(method=method, endpoint=path, error_type=type(e).__name__).inc()
            raise
        finally:
            duration = time.perf_counter() - start
            request_count.labels(method=method, endpoint=path, status=str(status_code)).inc()
            request_duration.labels(method=method, endpoint=path).observe(duration)
            if status_code >= 400:
                error_count.labels(method=method, endpoint=path, error_type=f'http_{status_code}').inc()
4.2 지연시간 백분위(Percentiles)
| 지표 | 평균(Mean) | p50 | p95 | p99 | p99.9 | Max |
|---|---|---|---|---|---|---|
| 사용자 영향도 | 낮음 | 중간 | 높음 | 높음 | 매우 높음 | 극단적 |
p50 (중앙값): 50%의 요청이 이 시간 이내에 완료
p95: 95%의 요청이 이 시간 이내에 완료 (20개 중 1개가 이보다 느림)
p99: 99%의 요청이 이 시간 이내에 완료 (100개 중 1개가 이보다 느림)
왜 평균은 위험한가?
- 평균 50ms여도 p99가 5000ms일 수 있음
- 매 100번째 요청마다 사용자가 5초를 대기
- 헤비 유저일수록 높은 백분위에 노출될 확률 증가
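평균이 왜 위험한지는 간단한 시뮬레이션으로 확인할 수 있습니다. 아래는 99%의 요청이 50ms, 1%가 5000ms인 가상의 분포를 가정한 스케치이며, percentile은 설명용으로 단순화한 nearest-rank 구현입니다.

```python
def percentile(sorted_samples, p):
    """정렬된 샘플에서 p(0~100) 백분위 값을 반환하는 단순한 구현."""
    idx = min(len(sorted_samples) - 1, int(len(sorted_samples) * p / 100))
    return sorted_samples[idx]

# 99%의 요청은 50ms, 1%는 5000ms인 가상의 지연시간 분포
latencies = sorted([50] * 990 + [5000] * 10)
mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
# mean은 약 99.5ms로 평범해 보이지만, p99는 5000ms — 100명 중 1명은 5초를 기다린다
```

평균만 보면 "100ms 수준의 서비스"로 보이지만, 백분위를 보면 매 100번째 요청의 5초 대기가 그대로 드러납니다.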
5. 일반적인 병목 지점
5.1 데이터베이스 병목
N+1 쿼리 문제:
# BAD: N+1 쿼리 - 주문 100개면 101번 쿼리 실행
orders = Order.objects.all()[:100]
for order in orders:
    # 각 주문마다 별도 쿼리로 사용자 정보 조회
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Eager loading - select_related는 JOIN으로 1번의 쿼리로 해결
orders = Order.objects.select_related('user').all()[:100]
for order in orders:
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Prefetch (M:N 관계)
orders = Order.objects.prefetch_related('items__product').all()[:100]
for order in orders:
    for item in order.items.all():
        print(f" - {item.product.name}")
// Node.js + Prisma - N+1 해결
// BAD: N+1
const orders = await prisma.order.findMany({ take: 100 });
for (const order of orders) {
const user = await prisma.user.findUnique({
where: { id: order.userId }
});
}
// GOOD: Include (Join)
const orders = await prisma.order.findMany({
take: 100,
include: {
user: true,
items: {
include: { product: true }
}
}
});
// GOOD: DataLoader 패턴
const DataLoader = require('dataloader');
const userLoader = new DataLoader(async (userIds) => {
const users = await prisma.user.findMany({
where: { id: { in: [...userIds] } }
});
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
// 여러 번 호출해도 자동으로 배치 처리
const user1 = await userLoader.load(1);
const user2 = await userLoader.load(2);
5.2 인덱스 부재와 전체 테이블 스캔
-- 느린 쿼리 탐지 (PostgreSQL)
SELECT
query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- 실행 계획 분석
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;
-- 복합 인덱스 생성 (쿼리 패턴에 맞게)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- 인덱스 사용률 확인
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
5.3 커넥션 풀 고갈
// HikariCP 최적 설정 (Java/Spring Boot)
// application.yml
/*
spring:
datasource:
hikari:
maximum-pool-size: 20
minimum-idle: 5
idle-timeout: 300000
max-lifetime: 600000
connection-timeout: 30000
leak-detection-threshold: 60000
pool-name: "MainPool"
*/
// 커넥션 풀 모니터링
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
public class ConnectionPoolMonitor {
private final HikariDataSource dataSource;
public PoolStats getStats() {
HikariPoolMXBean poolBean = dataSource.getHikariPoolMXBean();
return new PoolStats(
poolBean.getTotalConnections(),
poolBean.getActiveConnections(),
poolBean.getIdleConnections(),
poolBean.getThreadsAwaitingConnection()
);
}
public void logWarningIfNeeded() {
PoolStats stats = getStats();
double utilization = (double) stats.active / stats.total;
if (utilization > 0.8) {
log.warn("Connection pool utilization HIGH: {}% ({}/{})",
Math.round(utilization * 100),
stats.active, stats.total);
}
if (stats.waiting > 0) {
log.error("Threads waiting for connection: {}", stats.waiting);
}
}
}
# PgBouncer 설정 (PostgreSQL 커넥션 풀러)
# pgbouncer.ini
"""
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
server_idle_timeout = 600
server_lifetime = 3600
client_idle_timeout = 0
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
"""
5.4 잠금 경합(Lock Contention)
// Go - 잠금 경합 프로파일링
package main
import (
"runtime"
"sync"
"time"
)
// BAD: 글로벌 뮤텍스로 전체 맵 잠금
type BadCache struct {
mu sync.Mutex
items map[string]interface{}
}
func (c *BadCache) Get(key string) interface{} {
c.mu.Lock()
defer c.mu.Unlock()
return c.items[key]
}
// GOOD: 샤딩으로 잠금 경합 분산
type ShardedCache struct {
shards [256]shard
shardMask uint8
}
type shard struct {
mu sync.RWMutex
items map[string]interface{}
}
func NewShardedCache() *ShardedCache {
c := &ShardedCache{shardMask: 255}
for i := range c.shards {
c.shards[i].items = make(map[string]interface{})
}
return c
}
func (c *ShardedCache) getShard(key string) *shard {
hash := fnv32(key)
return &c.shards[hash&uint32(c.shardMask)]
}
func (c *ShardedCache) Get(key string) (interface{}, bool) {
s := c.getShard(key)
s.mu.RLock()
defer s.mu.RUnlock()
val, ok := s.items[key]
return val, ok
}
func (c *ShardedCache) Set(key string, value interface{}) {
s := c.getShard(key)
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
func fnv32(key string) uint32 {
hash := uint32(2166136261)
for i := 0; i < len(key); i++ {
hash *= 16777619
hash ^= uint32(key[i])
}
return hash
}
6. 데이터베이스 최적화
6.1 쿼리 최적화 전략
-- 1. 서브쿼리를 JOIN으로 변환
-- BAD
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE status = 'active');
-- GOOD
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 'active';
-- 2. EXISTS vs IN (대량 데이터)
-- GOOD: EXISTS (서브쿼리 결과가 큰 경우)
SELECT * FROM orders o
WHERE EXISTS (
SELECT 1 FROM users u
WHERE u.id = o.user_id AND u.status = 'active'
);
-- 3. 페이지네이션 최적화
-- BAD: OFFSET 기반 (깊은 페이지에서 느림)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- GOOD: 커서 기반 (일정한 성능)
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;
-- 4. 집계 쿼리 최적화
-- BAD: COUNT(*)를 자주 호출
SELECT COUNT(*) FROM orders WHERE status = 'pending';
-- GOOD: 대략적인 카운트 사용 (PostgreSQL)
SELECT reltuples::bigint AS estimate
FROM pg_class WHERE relname = 'orders';
-- 5. 파티셔닝
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025_q1 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
6.2 읽기 복제본(Read Replica)
# SQLAlchemy - 읽기/쓰기 분리
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class DatabaseRouter:
    def __init__(self):
        self.writer = create_engine(
            'postgresql://writer:pass@primary:5432/mydb',
            pool_size=10,
            max_overflow=20
        )
        self.readers = [
            create_engine(
                f'postgresql://reader:pass@replica{i}:5432/mydb',
                pool_size=10,
                max_overflow=20
            )
            for i in range(1, 4)  # 3개의 읽기 복제본
        ]
        self._reader_index = 0

    def get_writer_session(self):
        Session = sessionmaker(bind=self.writer)
        return Session()

    def get_reader_session(self):
        # 라운드 로빈으로 읽기 복제본 선택
        reader = self.readers[self._reader_index % len(self.readers)]
        self._reader_index += 1
        Session = sessionmaker(bind=reader)
        return Session()

# 사용 예시
db = DatabaseRouter()

# 쓰기 작업
with db.get_writer_session() as session:
    new_order = Order(user_id=1, total=99.99)
    session.add(new_order)
    session.commit()

# 읽기 작업 (복제본 사용)
with db.get_reader_session() as session:
    orders = session.query(Order).filter_by(status='pending').all()
7. 캐싱 전략
7.1 Cache-Aside 패턴
import redis
import json
from functools import wraps

redis_client = redis.Redis(host='localhost', port=6379, db=0)

class CacheAside:
    """Cache-Aside (Lazy Loading) 패턴 구현"""

    @staticmethod
    def cached(key_prefix, ttl_seconds=300):
        def decorator(func):
            @wraps(func)
            async def wrapper(self, *args, **kwargs):
                # 캐시 키 생성 (인스턴스 self는 키에서 제외)
                cache_key = f"{key_prefix}:{':'.join(str(a) for a in args)}"
                # 1. 캐시에서 조회
                cached = redis_client.get(cache_key)
                if cached:
                    return json.loads(cached)
                # 2. 캐시 미스 - DB에서 조회
                result = await func(self, *args, **kwargs)
                # 3. 결과를 캐시에 저장
                if result is not None:
                    redis_client.setex(
                        cache_key,
                        ttl_seconds,
                        json.dumps(result, default=str)
                    )
                return result
            return wrapper
        return decorator

    @staticmethod
    def invalidate(key_pattern):
        """패턴 기반 캐시 무효화 (KEYS는 서버를 블로킹하므로 SCAN 사용)"""
        keys = list(redis_client.scan_iter(match=key_pattern))
        if keys:
            redis_client.delete(*keys)

# 사용 예시
class ProductService:
    @CacheAside.cached('product', ttl_seconds=600)
    async def get_product(self, product_id):
        return await self.db.products.find_one({"_id": product_id})

    @CacheAside.cached('product:list', ttl_seconds=120)
    async def list_products(self, category, page):
        return await self.db.products.find(
            {"category": category}
        ).skip((page - 1) * 20).limit(20).to_list(20)

    async def update_product(self, product_id, data):
        await self.db.products.update_one(
            {"_id": product_id},
            {"$set": data}
        )
        # 관련 캐시 무효화
        CacheAside.invalidate(f'product:{product_id}')
        CacheAside.invalidate('product:list:*')
7.2 Write-Through와 Write-Behind
import asyncio

class WriteThrough:
    """Write-Through: 캐시와 DB를 동시에 업데이트"""

    async def update(self, key, value, ttl=300):
        # 1. DB에 쓰기
        await self.db.update(key, value)
        # 2. 캐시 업데이트 (DB 쓰기 성공 후)
        redis_client.setex(f"wt:{key}", ttl, json.dumps(value, default=str))

    async def get(self, key):
        # 캐시에서 조회 (항상 최신 데이터)
        cached = redis_client.get(f"wt:{key}")
        if cached:
            return json.loads(cached)
        # 캐시 미스 시 DB 조회 후 캐시 저장
        value = await self.db.get(key)
        if value:
            redis_client.setex(f"wt:{key}", 300, json.dumps(value, default=str))
        return value

class WriteBehind:
    """Write-Behind (Write-Back): 캐시에 먼저 쓰고, 비동기로 DB에 반영"""

    def __init__(self):
        self.write_queue = asyncio.Queue()
        self.batch_size = 100
        self.flush_interval = 5  # 초

    async def update(self, key, value, ttl=300):
        # 1. 캐시에 즉시 쓰기 (빠른 응답)
        redis_client.setex(f"wb:{key}", ttl, json.dumps(value, default=str))
        # 2. 큐에 추가 (비동기 DB 쓰기)
        await self.write_queue.put((key, value))

    async def flush_worker(self):
        """백그라운드 워커: 큐에서 꺼내서 DB에 배치 쓰기"""
        while True:
            batch = []
            try:
                while len(batch) < self.batch_size:
                    item = await asyncio.wait_for(
                        self.write_queue.get(),
                        timeout=self.flush_interval
                    )
                    batch.append(item)
            except asyncio.TimeoutError:
                pass
            if batch:
                try:
                    await self.db.bulk_update(batch)
                except Exception:
                    # 실패 시 재시도 큐에 추가
                    for item in batch:
                        await self.write_queue.put(item)
                    await asyncio.sleep(1)
7.3 TTL 전략과 캐시 무효화
# 다층 TTL 전략
class TieredTTLCache:
    TTL_CONFIG = {
        # 자주 변경되는 데이터
        'user:session': 1800,      # 30분
        'cart:items': 900,         # 15분
        # 주기적으로 변경되는 데이터
        'product:detail': 3600,    # 1시간
        'product:list': 600,       # 10분
        'search:results': 300,     # 5분
        # 거의 변경되지 않는 데이터
        'category:list': 86400,    # 24시간
        'config:settings': 86400,  # 24시간
        'static:content': 604800,  # 7일
    }

    @classmethod
    def get_ttl(cls, key_type):
        return cls.TTL_CONFIG.get(key_type, 300)  # 기본 5분

    @staticmethod
    def stale_while_revalidate(key, ttl, stale_ttl):
        """Stale-While-Revalidate 패턴"""
        cached = redis_client.get(key)
        if cached:
            data = json.loads(cached)
            if data['_cached_at'] + ttl > time.time():
                return data['value'], False  # 신선한 데이터
            if data['_cached_at'] + stale_ttl > time.time():
                return data['value'], True   # 부실하지만 사용 가능
        return None, True  # 캐시 미스

# 캐시 워밍(Pre-warming)
class CacheWarmer:
    async def warm_popular_products(self):
        """인기 상품 캐시 사전 로딩"""
        popular = await self.db.products.find(
            {"popular": True}
        ).limit(100).to_list(100)
        pipe = redis_client.pipeline()
        for product in popular:
            key = f"product:{product['_id']}"
            pipe.setex(key, 3600, json.dumps(product, default=str))
        pipe.execute()
8. 비동기 처리
8.1 메시지 큐 기반 비동기 처리
# Celery를 사용한 비동기 태스크 처리
from celery import Celery, chain, group, chord

app = Celery('tasks', broker='redis://localhost:6379/0')

# 설정
app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    result_serializer='json',
    timezone='UTC',
    task_acks_late=True,
    worker_prefetch_multiplier=1,
    task_reject_on_worker_lost=True,
    task_routes={
        'tasks.send_email': {'queue': 'email'},
        'tasks.process_image': {'queue': 'image'},
        'tasks.generate_report': {'queue': 'report'},
    }
)

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_email(self, to, subject, body):
    try:
        email_service.send(to=to, subject=subject, body=body)
    except Exception as exc:
        self.retry(exc=exc)

@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
    """주문 처리 파이프라인"""
    try:
        order = Order.objects.get(id=order_id)
        # 체인으로 순차 처리
        # si()는 이전 태스크의 결과를 인자로 받지 않는 불변(immutable) 시그니처
        workflow = chain(
            validate_inventory.si(order_id),
            process_payment.si(order_id),
            send_confirmation_email.si(order_id),
            update_analytics.si(order_id)
        )
        workflow.apply_async()
    except Exception as exc:
        self.retry(exc=exc, countdown=30)

@app.task
def validate_inventory(order_id):
    # 재고 확인
    order = Order.objects.get(id=order_id)
    for item in order.items.all():
        if item.product.stock < item.quantity:
            raise InsufficientStockError(item.product.name)
    return True

@app.task
def bulk_process_orders(order_ids):
    """병렬 배치 처리"""
    job = group(process_order.s(oid) for oid in order_ids)
    result = job.apply_async()
    return result
8.2 이벤트 드리븐 아키텍처
// Node.js - EventEmitter 기반 비동기 처리
const EventEmitter = require('events');
class OrderEventBus extends EventEmitter {
constructor() {
super();
this.setMaxListeners(20);
}
}
const orderBus = new OrderEventBus();
// 이벤트 핸들러 등록 (관심사 분리)
orderBus.on('order.created', async (order) => {
// 재고 업데이트
await inventoryService.decrementStock(order.items);
});
orderBus.on('order.created', async (order) => {
// 확인 이메일 발송
await emailService.sendOrderConfirmation(order);
});
orderBus.on('order.created', async (order) => {
// 분석 데이터 업데이트
await analyticsService.trackOrder(order);
});
orderBus.on('order.created', async (order) => {
// 추천 시스템 업데이트
await recommendationService.recordPurchase(order.userId, order.items);
});
// 주문 생성 시 이벤트 발행
class OrderService {
async createOrder(orderData) {
const order = await this.orderRepo.create(orderData);
// 동기적으로 필수 작업만 수행
// 나머지는 이벤트로 비동기 처리
orderBus.emit('order.created', order);
return order; // 빠르게 응답
}
}
9. 배치 최적화
9.1 벌크 인서트
# SQLAlchemy 벌크 인서트 비교
import time

# BAD: 하나씩 삽입 (N번의 INSERT)
def insert_one_by_one(session, records):
    start = time.time()
    for record in records:
        session.add(MyModel(**record))
    session.commit()
    print(f"One by one: {time.time() - start:.2f}s")

# GOOD: 벌크 삽입 (1번의 INSERT)
def bulk_insert(session, records):
    start = time.time()
    session.bulk_insert_mappings(MyModel, records)
    session.commit()
    print(f"Bulk insert: {time.time() - start:.2f}s")

# BETTER: execute_values (PostgreSQL, psycopg2)
def execute_values_insert(conn, records):
    start = time.time()
    from psycopg2.extras import execute_values
    cursor = conn.cursor()
    execute_values(
        cursor,
        "INSERT INTO my_table (col1, col2, col3) VALUES %s",
        [(r['col1'], r['col2'], r['col3']) for r in records],
        page_size=1000
    )
    conn.commit()
    print(f"execute_values: {time.time() - start:.2f}s")

# 성능 비교 (10,000건 기준)
# One by one: 12.5s
# Bulk insert: 0.8s
# execute_values: 0.3s
9.2 배치 API 호출
// 외부 API 배치 호출 최적화
class BatchAPIClient {
constructor(options = {}) {
this.batchSize = options.batchSize || 50;
this.concurrency = options.concurrency || 5;
this.retryAttempts = options.retryAttempts || 3;
this.delayBetweenBatches = options.delayMs || 100;
}
async processBatch(items, processFn) {
const results = [];
const errors = [];
// 아이템을 배치로 분할
const batches = [];
for (let i = 0; i < items.length; i += this.batchSize) {
batches.push(items.slice(i, i + this.batchSize));
}
// 동시성 제한하여 배치 처리
for (let i = 0; i < batches.length; i += this.concurrency) {
const concurrentBatches = batches.slice(i, i + this.concurrency);
const batchResults = await Promise.allSettled(
concurrentBatches.map(batch => this.processWithRetry(batch, processFn))
);
for (const result of batchResults) {
if (result.status === 'fulfilled') {
results.push(...result.value);
} else {
errors.push(result.reason);
}
}
// 배치 간 딜레이 (Rate limiting 방지)
if (i + this.concurrency < batches.length) {
await new Promise(r => setTimeout(r, this.delayBetweenBatches));
}
}
return { results, errors, total: items.length, processed: results.length };
}
async processWithRetry(batch, processFn, attempt = 1) {
try {
return await processFn(batch);
} catch (error) {
if (attempt < this.retryAttempts) {
const delay = Math.pow(2, attempt) * 1000; // 지수 백오프
await new Promise(r => setTimeout(r, delay));
return this.processWithRetry(batch, processFn, attempt + 1);
}
throw error;
}
}
}
// 사용 예시
const client = new BatchAPIClient({ batchSize: 100, concurrency: 3 });
const result = await client.processBatch(userIds, async (batch) => {
const response = await fetch('/api/users/batch', {
method: 'POST',
body: JSON.stringify({ ids: batch }),
headers: { 'Content-Type': 'application/json' }
});
return response.json();
});
10. HTTP 최적화
10.1 압축과 프로토콜 최적화
// Express.js 압축 설정
const compression = require('compression');
app.use(compression({
filter: (req, res) => {
if (req.headers['x-no-compression']) return false;
return compression.filter(req, res);
},
level: 6, // 압축 레벨 (1-9, 6이 균형점)
threshold: 1024, // 1KB 이상만 압축
memLevel: 8, // 메모리 사용량 (1-9)
}));
// HTTP/2 서버 설정
const http2 = require('http2');
const fs = require('fs');
const server = http2.createSecureServer({
key: fs.readFileSync('server.key'),
cert: fs.readFileSync('server.crt'),
allowHTTP1: true,
});
server.on('stream', (stream, headers) => {
const path = headers[':path'];
// Server Push
if (path === '/index.html') {
stream.pushStream({ ':path': '/styles.css' }, (err, pushStream) => {
if (!err) {
pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
pushStream.end(fs.readFileSync('styles.css'));
}
});
}
stream.respond({
':status': 200,
'content-type': 'text/html',
});
stream.end(fs.readFileSync(`.${path}`));
});
10.2 Keep-Alive와 연결 재사용
# Python requests - 세션 재사용
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# BAD: 매번 새 연결
def fetch_bad(urls):
results = []
for url in urls:
response = requests.get(url) # 매번 TCP 핸드셰이크
results.append(response.json())
return results
# GOOD: 세션 재사용 (Keep-Alive)
def fetch_good(urls):
session = requests.Session()
# 재시도 설정
retry_strategy = Retry(
total=3,
backoff_factor=0.5,
status_forcelist=[500, 502, 503, 504]
)
adapter = HTTPAdapter(
max_retries=retry_strategy,
pool_connections=10,
pool_maxsize=20,
pool_block=False
)
session.mount("http://", adapter)
session.mount("https://", adapter)
results = []
for url in urls:
response = session.get(url) # 연결 재사용
results.append(response.json())
session.close()
return results
11. 애플리케이션 레벨 최적화
11.1 효율적인 직렬화
// JSON vs MessagePack vs Protobuf 비교
const msgpack = require('msgpack-lite');
// 테스트 데이터
const data = {
users: Array.from({ length: 1000 }, (_, i) => ({
id: i,
name: `User ${i}`,
email: `user${i}@example.com`,
age: 20 + (i % 50),
active: i % 3 !== 0,
tags: ['tag1', 'tag2', 'tag3'],
metadata: { loginCount: i * 10, lastLogin: new Date().toISOString() }
}))
};
// JSON
console.time('json-serialize');
const jsonStr = JSON.stringify(data);
console.timeEnd('json-serialize');
console.log(`JSON size: ${Buffer.byteLength(jsonStr)} bytes`);
// MessagePack
console.time('msgpack-serialize');
const msgpackBuf = msgpack.encode(data);
console.timeEnd('msgpack-serialize');
console.log(`MessagePack size: ${msgpackBuf.length} bytes`);
// 일반적인 결과:
// JSON size: ~120KB, serialize: ~3ms
// MessagePack size: ~85KB, serialize: ~2ms (약 30% 작음)
11.2 Object Pooling
// Apache Commons Pool2 기반 객체 풀
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
public class ExpensiveObjectPool {
private final GenericObjectPool<ExpensiveObject> pool;
public ExpensiveObjectPool() {
GenericObjectPoolConfig<ExpensiveObject> config = new GenericObjectPoolConfig<>();
config.setMaxTotal(50);
config.setMaxIdle(20);
config.setMinIdle(5);
config.setTestOnBorrow(true);
config.setTestWhileIdle(true);
config.setTimeBetweenEvictionRunsMillis(30000);
pool = new GenericObjectPool<>(new ExpensiveObjectFactory(), config);
}
public ExpensiveObject borrow() throws Exception {
return pool.borrowObject();
}
public void returnObject(ExpensiveObject obj) {
pool.returnObject(obj);
}
static class ExpensiveObjectFactory extends BasePooledObjectFactory<ExpensiveObject> {
@Override
public ExpensiveObject create() {
return new ExpensiveObject(); // 비용이 큰 초기화
}
@Override
public PooledObject<ExpensiveObject> wrap(ExpensiveObject obj) {
return new DefaultPooledObject<>(obj);
}
@Override
public void passivateObject(PooledObject<ExpensiveObject> pooledObj) {
pooledObj.getObject().reset(); // 풀에 반환 시 상태 초기화
}
@Override
public boolean validateObject(PooledObject<ExpensiveObject> pooledObj) {
return pooledObj.getObject().isValid();
}
}
}
12. 프로덕션 모니터링
12.1 SLO 기반 알림 설정
# Prometheus 알림 규칙
groups:
- name: slo-alerts
rules:
# p99 지연시간 SLO 위반
- alert: HighP99Latency
expr: |
histogram_quantile(0.99,
rate(http_request_duration_seconds_bucket[5m])
) > 0.5
for: 5m
labels:
severity: warning
annotations:
summary: "p99 latency exceeds 500ms"
description: "p99 latency is at {{ $value }}s for 5 minutes"
# 에러율 SLO 위반
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) > 0.01
for: 3m
labels:
severity: critical
annotations:
summary: "Error rate exceeds 1%"
# 처리량 급감
- alert: ThroughputDrop
expr: |
sum(rate(http_requests_total[5m]))
< 0.5 * sum(rate(http_requests_total[5m] offset 1h))
for: 5m
labels:
severity: warning
annotations:
summary: "Throughput dropped over 50% compared to 1h ago"
# 커넥션 풀 고갈 임박
- alert: ConnectionPoolExhaustion
expr: |
hikaricp_connections_active
/ hikaricp_connections_max > 0.85
for: 2m
labels:
severity: warning
annotations:
summary: "Connection pool utilization above 85%"
# GC 일시정지 시간 증가
- alert: HighGCPauseTime
expr: |
rate(jvm_gc_pause_seconds_sum[5m])
/ rate(jvm_gc_pause_seconds_count[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "Average GC pause time exceeds 100ms"
12.2 대시보드 구성
# Grafana 대시보드 JSON 생성 (Python)
class PerformanceDashboard:
# 예시에서는 _latency_panel만 구현 (나머지 _*_panel 헬퍼도 동일한 구조)
def generate_panels(self):
return {
"dashboard": {
"title": "Backend Performance",
"panels": [
# RED 메트릭
self._throughput_panel(),
self._error_rate_panel(),
self._latency_panel(),
# 리소스 사용량
self._cpu_panel(),
self._memory_panel(),
self._gc_panel(),
# 데이터베이스
self._db_query_panel(),
self._connection_pool_panel(),
self._slow_queries_panel(),
# 캐시
self._cache_hit_rate_panel(),
self._cache_latency_panel(),
# 외부 서비스
self._external_api_panel(),
]
}
}
def _latency_panel(self):
return {
"title": "API Latency Percentiles",
"type": "timeseries",
"targets": [
{
"expr": 'histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))',
"legendFormat": "p50"
},
{
"expr": 'histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))',
"legendFormat": "p95"
},
{
"expr": 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))',
"legendFormat": "p99"
}
]
}
13. 실전 퀴즈
Q1. 암달의 법칙이 성능 최적화에 주는 시사점은 무엇이며, 이를 실무에 어떻게 적용할 수 있나요?
암달의 법칙은 시스템 전체 성능 향상이 개선 가능한 부분의 비율에 의해 제한된다는 것을 보여줍니다.
핵심 시사점:
- 전체 실행 시간의 작은 부분(예: 5%)을 아무리 빠르게 만들어도 전체 성능 향상은 미미합니다
- 전체 실행 시간의 큰 부분(예: 80%)을 2배만 빠르게 해도 상당한 성능 향상을 얻습니다
- 따라서 프로파일링으로 가장 큰 병목을 먼저 식별한 후, 그 부분을 집중적으로 개선해야 합니다
실무 적용:
- 프로파일링(Flame Graph)으로 실행 시간 분포 파악
- 가장 큰 비율을 차지하는 병목부터 순서대로 최적화
- 각 최적화 후 다시 측정하여 새로운 병목 확인
- 성능 예산(Performance Budget) 내에 들어오면 최적화 중단
Q2. N+1 쿼리 문제란 무엇이며, ORM에서 이를 해결하는 3가지 방법을 설명하세요.
N+1 문제: 부모 엔티티 N개를 조회한 후, 각 부모의 자식 엔티티를 개별 쿼리로 조회하여 총 N+1번의 쿼리가 실행되는 문제입니다. 100개의 주문을 조회하면 101번의 DB 쿼리가 발생합니다.
해결 방법:
- Eager Loading (select_related/include): JOIN을 사용하여 부모와 자식을 한 번의 쿼리로 조회. 1:1, N:1 관계에 효과적
- Prefetch (prefetch_related): 별도 쿼리로 자식 엔티티를 일괄 조회 후 메모리에서 매핑. 1:N, M:N 관계에 효과적. IN 절 사용
- DataLoader 패턴: 여러 개별 요청을 자동으로 배치하여 하나의 쿼리로 실행. GraphQL에서 특히 유용. Facebook이 개발한 패턴
Q3. Cache-Aside와 Write-Through 캐싱 패턴의 차이점과 각각의 적합한 사용 사례를 설명하세요.
Cache-Aside (Lazy Loading):
- 읽기 시 캐시 확인, 미스 시 DB 조회 후 캐시 저장
- 애플리케이션이 캐시를 직접 관리
- 첫 번째 요청은 항상 캐시 미스 (Cold Start)
- 적합: 읽기가 많고, 모든 데이터를 캐싱할 필요가 없는 경우
Write-Through:
- 쓰기 시 캐시와 DB를 동시에 업데이트
- 캐시가 항상 최신 상태
- 쓰기 지연시간 증가 (두 곳에 쓰기)
- 적합: 데이터 일관성이 중요하고, 읽기가 쓰기보다 훨씬 많은 경우
Write-Behind는 캐시에 먼저 쓰고 DB에는 비동기로 반영하여 쓰기 성능을 극대화하지만, 데이터 손실 리스크가 있습니다.
Q4. 부하 테스트의 6가지 유형(Smoke, Load, Stress, Spike, Soak, Breakpoint)을 각각 언제 사용하는지 설명하세요.
- Smoke Test: 최소 부하(1-5 VU)로 시스템 기본 동작을 확인. 배포 후 기본 검증
- Load Test: 예상 트래픽 수준에서 성능 확인. SLO 충족 여부 검증
- Stress Test: 예상 트래픽을 초과하여 시스템 한계점 탐색. 용량 계획에 활용
- Spike Test: 갑작스러운 트래픽 급증(예: 이벤트)에 대한 시스템 반응 확인. 오토스케일링 검증
- Soak Test: 장시간(수 시간~하루) 일정 부하를 유지하여 메모리 누수, 커넥션 고갈 등 점진적 문제 발견
- Breakpoint Test: 부하를 지속적으로 증가시켜 시스템이 완전히 실패하는 지점 탐색. 절대적 한계 파악
Q5. 커넥션 풀 튜닝에서 고려해야 할 핵심 파라미터와 적절한 풀 크기를 결정하는 방법을 설명하세요.
핵심 파라미터:
- maximum-pool-size: 최대 커넥션 수 (과하면 DB 부하, 부족하면 대기)
- minimum-idle: 유휴 커넥션 최소 수 (Cold Start 방지)
- connection-timeout: 커넥션 획득 대기 시간
- idle-timeout: 유휴 커넥션 반환 시간
- max-lifetime: 커넥션 최대 수명 (DB 방화벽 타임아웃보다 짧게)
적절한 풀 크기 결정:
- HikariCP 공식: connections = ((core_count * 2) + effective_spindle_count)
- SSD의 경우: connections = core_count * 2 + 1 정도
- 일반적으로 10-20이면 충분한 경우가 많음
- 너무 큰 풀은 오히려 DB의 컨텍스트 스위칭 비용을 증가시킴
- 모니터링 기반으로 조정: 사용률이 80%를 넘으면 증가 검토, 대기 스레드가 발생하면 즉시 증가
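위 공식은 간단한 계산기로 옮겨볼 수 있습니다. 예시용 스케치이며, `pool_size`라는 함수명과 `effective_spindle_count`(활성 디스크 수) 기본값은 가정입니다:

```python
def pool_size(core_count: int, effective_spindle_count: int = 1, ssd: bool = False) -> int:
    """HikariCP 권장 공식 기반의 커넥션 풀 시작값 (모니터링으로 조정 전제)"""
    if ssd:
        return core_count * 2 + 1
    return core_count * 2 + effective_spindle_count

print(pool_size(8, 2))          # 18 (8코어, 스핀들 2개)
print(pool_size(8, ssd=True))   # 17
```

계산값은 출발점일 뿐이며, 실측 사용률이 80%를 넘거나 대기 스레드가 생기면 늘리는 식으로 조정합니다.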
참고 자료
- Google SRE Book - Performance Engineering
- k6 Documentation
- Artillery Documentation
- Brendan Gregg - Systems Performance
- Flame Graphs
- HikariCP - About Pool Sizing
- Redis Best Practices
- PostgreSQL Performance Tips
- Node.js Diagnostics Guide
- Go pprof Documentation
- Python cProfile Documentation
- Prometheus Monitoring
- DataLoader Pattern
Backend Performance Engineering Complete Guide 2025: Profiling, Load Testing, Bottleneck Analysis, Optimization
Table of Contents
1. Performance Engineering Mindset
1.1 Measure First, Optimize Later
The golden rule of performance engineering is "Don't guess, measure." Optimization based on intuition usually wastes time on the wrong areas.
Three Stages of Performance Optimization:
- Measure: Quantitatively measure current performance
- Analyze: Precisely identify bottleneck points
- Optimize: Resolve the most impactful bottlenecks first
1.2 Amdahl's Law
Overall system performance improvement is limited by the proportion of the improvable portion.
Overall Speedup = 1 / ((1 - P) + P / S)
P = Fraction of the program that can be improved
S = Speedup factor of the improved part
Example: Making code that accounts for 20% of total 10x faster
= 1 / ((1 - 0.2) + 0.2 / 10)
= 1 / (0.8 + 0.02)
= 1.22x (22% improvement)
Meanwhile, making code that accounts for 80% of total 2x faster
= 1 / ((1 - 0.8) + 0.8 / 2)
= 1 / (0.2 + 0.4)
= 1.67x (67% improvement)
Key takeaway: Moderately improving a large portion is more effective than dramatically improving a small portion.
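The arithmetic above is easy to script; a small helper (illustrative, the function name is ours) makes the trade-off concrete:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is made s times faster."""
    return 1 / ((1 - p) + p / s)

# 20% of the work made 10x faster vs. 80% made 2x faster
print(round(amdahl_speedup(0.2, 10), 2))  # 1.22
print(round(amdahl_speedup(0.8, 2), 2))   # 1.67
```

Note the asymptote: even with s approaching infinity, the speedup is capped at 1 / (1 - P).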
1.3 Performance Budget
# Performance Budget Definition Example
performance_budget:
api_endpoints:
p50_latency_ms: 50
p95_latency_ms: 200
p99_latency_ms: 500
max_latency_ms: 2000
error_rate_percent: 0.1
throughput_rps: 1000
database:
query_p95_ms: 50
query_p99_ms: 200
connection_pool_utilization: 70
slow_query_threshold_ms: 100
external_services:
p95_latency_ms: 300
timeout_ms: 5000
retry_count: 3
circuit_breaker_threshold: 50
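A budget like this is only useful if something enforces it, e.g. as a CI gate. A minimal checker might compare measured numbers against the budget keys above (the `check_budget` helper is an assumption, not part of the guide):

```python
# Subset of the budget above, as a plain dict
BUDGET = {"p95_latency_ms": 200, "p99_latency_ms": 500, "error_rate_percent": 0.1}

def check_budget(measured: dict, budget: dict = BUDGET) -> list:
    """Return the list of budget keys the measurement violates."""
    return [k for k, limit in budget.items() if measured.get(k, 0) > limit]

violations = check_budget(
    {"p95_latency_ms": 250, "p99_latency_ms": 480, "error_rate_percent": 0.05}
)
print(violations)  # ['p95_latency_ms']
```

A non-empty violation list would then fail the build or block the deploy.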
2. Profiling
2.1 CPU Profiling and Flame Graphs
Flame Graphs are powerful tools that visually show where CPU time is being spent.
Node.js CPU Profiling:
// Node.js - Using built-in profiler
// Run: node --prof app.js
// Analyze: node --prof-process isolate-*.log > profile.txt
// Or using v8-profiler-next
const v8Profiler = require('v8-profiler-next');
function startProfiling(durationMs = 30000) {
const title = `cpu-profile-${Date.now()}`;
v8Profiler.startProfiling(title, true);
setTimeout(() => {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
}, durationMs);
}
// Middleware for profiling specific requests
function profilingMiddleware(req, res, next) {
if (req.headers['x-profile'] !== 'true') {
return next();
}
const title = `req-${req.method}-${req.path}-${Date.now()}`;
v8Profiler.startProfiling(title, true);
const originalEnd = res.end;
res.end = function (...args) {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
originalEnd.apply(res, args);
};
next();
}
Go CPU Profiling:
package main
import (
"net/http"
_ "net/http/pprof"
"runtime"
)
func main() {
// Enable pprof endpoints
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
// Collect CPU profile: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// Generate Flame Graph: go tool pprof -http=:8080 profile.pb.gz
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
// Application logic
startServer()
}
Python CPU Profiling:
import cProfile
import pstats
from pyinstrument import Profiler
# Using cProfile
def profile_with_cprofile(func):
def wrapper(*args, **kwargs):
profiler = cProfile.Profile()
profiler.enable()
result = func(*args, **kwargs)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20) # Top 20 functions
return result
return wrapper
# Using pyinstrument (more readable output)
def profile_with_pyinstrument(func):
def wrapper(*args, **kwargs):
profiler = Profiler()
profiler.start()
result = func(*args, **kwargs)
profiler.stop()
print(profiler.output_text(unicode=True))
return result
return wrapper
# Django middleware
class ProfilingMiddleware:
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
if request.META.get('HTTP_X_PROFILE') == 'true':
profiler = Profiler()
profiler.start()
response = self.get_response(request)
profiler.stop()
response['X-Profile-Duration'] = str(profiler.last_session.duration)
profiler.open_in_browser()
return response
return self.get_response(request)
2.2 Memory Profiling
// Node.js Heap Snapshot
const v8 = require('v8');
function takeHeapSnapshot() {
const snapshotStream = v8.writeHeapSnapshot();
console.log(`Heap snapshot written to: ${snapshotStream}`);
return snapshotStream;
}
// Memory usage monitoring
function monitorMemory(intervalMs = 5000) {
setInterval(() => {
const usage = process.memoryUsage();
console.log({
rss_mb: Math.round(usage.rss / 1024 / 1024),
heapTotal_mb: Math.round(usage.heapTotal / 1024 / 1024),
heapUsed_mb: Math.round(usage.heapUsed / 1024 / 1024),
external_mb: Math.round(usage.external / 1024 / 1024),
arrayBuffers_mb: Math.round(usage.arrayBuffers / 1024 / 1024)
});
}, intervalMs);
}
// Memory leak detection pattern
class MemoryLeakDetector {
constructor(options = {}) {
this.samples = [];
this.maxSamples = options.maxSamples || 60;
this.threshold = options.thresholdMB || 50;
}
sample() {
const usage = process.memoryUsage();
this.samples.push({
timestamp: Date.now(),
heapUsed: usage.heapUsed
});
if (this.samples.length > this.maxSamples) {
this.samples.shift();
}
return this.detectLeak();
}
detectLeak() {
if (this.samples.length < 10) return null;
const first = this.samples[0].heapUsed;
const last = this.samples[this.samples.length - 1].heapUsed;
const diffMB = (last - first) / 1024 / 1024;
let increasing = 0;
for (let i = 1; i < this.samples.length; i++) {
if (this.samples[i].heapUsed > this.samples[i - 1].heapUsed) {
increasing++;
}
}
const increaseRatio = increasing / (this.samples.length - 1);
if (diffMB > this.threshold && increaseRatio > 0.7) {
return {
suspected: true,
growthMB: diffMB.toFixed(2),
increaseRatio: increaseRatio.toFixed(2),
duration: this.samples[this.samples.length - 1].timestamp - this.samples[0].timestamp
};
}
return null;
}
}
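The same sliding-window heuristic ports directly to other runtimes; a Python sketch of the detection logic (thresholds mirror the defaults assumed above):

```python
def detect_leak(samples: list, threshold_mb: float = 50, ratio: float = 0.7):
    """samples: heap-used bytes over time. Flags sustained, mostly-monotonic growth."""
    if len(samples) < 10:
        return None
    growth_mb = (samples[-1] - samples[0]) / 1024 / 1024
    increasing = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    increase_ratio = increasing / (len(samples) - 1)
    if growth_mb > threshold_mb and increase_ratio > ratio:
        return {"growth_mb": round(growth_mb, 2), "increase_ratio": round(increase_ratio, 2)}
    return None

# A heap growing ~6 MB per sample for 20 samples trips the detector
leaking = [i * 6 * 1024 * 1024 for i in range(20)]
print(detect_leak(leaking))
```

A flat heap (`detect_leak([100] * 20)`) returns None; only combined size growth and a high fraction of rising samples raise suspicion, which filters out normal GC sawtooth patterns.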
2.3 I/O Profiling
# Python - I/O Profiling
import time
import functools
import logging
logger = logging.getLogger('io_profiler')
class IOProfiler:
"""I/O operation time measurement decorator"""
_stats = {}
@classmethod
def track(cls, operation_name):
def decorator(func):
@functools.wraps(func)
async def async_wrapper(*args, **kwargs):
start = time.perf_counter()
try:
result = await func(*args, **kwargs)
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=True)
return result
except Exception as e:
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=False)
raise
@functools.wraps(func)
def sync_wrapper(*args, **kwargs):
start = time.perf_counter()
try:
result = func(*args, **kwargs)
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=True)
return result
except Exception as e:
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=False)
raise
import asyncio
if asyncio.iscoroutinefunction(func):
return async_wrapper
return sync_wrapper
return decorator
@classmethod
def _record(cls, name, duration, success):
if name not in cls._stats:
cls._stats[name] = {
'count': 0, 'total_time': 0,
'min_time': float('inf'), 'max_time': 0,
'errors': 0
}
stats = cls._stats[name]
stats['count'] += 1
stats['total_time'] += duration
stats['min_time'] = min(stats['min_time'], duration)
stats['max_time'] = max(stats['max_time'], duration)
if not success:
stats['errors'] += 1
@classmethod
def report(cls):
for name, stats in sorted(cls._stats.items()):
avg = stats['total_time'] / stats['count'] if stats['count'] else 0
logger.info(
f"{name}: count={stats['count']}, "
f"avg={avg*1000:.1f}ms, "
f"min={stats['min_time']*1000:.1f}ms, "
f"max={stats['max_time']*1000:.1f}ms, "
f"errors={stats['errors']}"
)
3. Load Testing
3.1 Load Testing Tools Comparison
| Tool | Language | Protocols | Strengths | Weaknesses |
|---|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration | Limited browser testing |
| Artillery | JavaScript | HTTP, WebSocket, Socket.io | Config-based, extensible | Complex scenarios difficult |
| Locust | Python | HTTP | Python scripting, distributed | Limited protocols |
| Gatling | Scala/Java | HTTP, WebSocket | Detailed reports, JVM performance | Learning curve |
| JMeter | Java | Various | GUI, various protocols | Resource-heavy, dated |
3.2 k6 Script Example
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
// Custom metrics
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration', true);
const requestCount = new Counter('requests');
// Test options
export const options = {
scenarios: {
// Scenario 1: Normal load test
normal_load: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 50 }, // Ramp up to 50 VUs
{ duration: '5m', target: 50 }, // Hold at 50 VUs
{ duration: '2m', target: 100 }, // Ramp up to 100 VUs
{ duration: '5m', target: 100 }, // Hold at 100 VUs
{ duration: '2m', target: 0 }, // Ramp down
],
},
// Scenario 2: Spike test
spike_test: {
executor: 'ramping-vus',
startVUs: 0,
startTime: '16m',
stages: [
{ duration: '10s', target: 500 }, // Sudden spike
{ duration: '1m', target: 500 }, // Hold
{ duration: '10s', target: 0 }, // Rapid decrease
],
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
errors: ['rate<0.01'],
http_req_failed: ['rate<0.01'],
},
};
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
const authToken = login();
group('API Operations', () => {
group('List Products', () => {
const res = http.get(`${BASE_URL}/api/products?page=1&limit=20`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'response time OK': (r) => r.timings.duration < 200,
'has products': (r) => JSON.parse(r.body).data.length > 0,
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
requestCount.add(1);
});
group('Create Order', () => {
const payload = JSON.stringify({
productId: Math.floor(Math.random() * 1000) + 1,
quantity: Math.floor(Math.random() * 5) + 1,
shippingAddress: '123 Test Street',
});
const res = http.post(`${BASE_URL}/api/orders`, payload, {
headers: {
Authorization: `Bearer ${authToken}`,
'Content-Type': 'application/json',
},
tags: { name: 'POST /api/orders' },
});
check(res, {
'order created': (r) => r.status === 201,
'has order id': (r) => JSON.parse(r.body).data.orderId,
});
errorRate.add(res.status !== 201);
apiDuration.add(res.timings.duration);
});
});
sleep(Math.random() * 3 + 1);
}
function login() {
const res = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
email: `user${__VU}@test.com`,
password: 'testpassword',
}), {
headers: { 'Content-Type': 'application/json' },
tags: { name: 'POST /api/auth/login' },
});
return res.status === 200 ? JSON.parse(res.body).token : '';
}
3.3 Load Test Types
| Type | Purpose | VU Pattern | Duration |
|---|---|---|---|
| Smoke | Verify basic operation | 1-5 | 1-5 min |
| Load | Verify expected traffic handling | Expected levels | 15-60 min |
| Stress | Find breaking points | Above expected | 30-60 min |
| Spike | Test sudden traffic surges | Sudden spike | 5-10 min |
| Soak | Verify long-term stability | Steady level | 2-24 hours |
| Breakpoint | Find system failure point | Continuous increase | Variable |
4. Key Performance Metrics
4.1 RED Method
# RED Method Monitoring Implementation
from prometheus_client import Counter, Histogram
# Rate: Requests per second
request_count = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status']
)
# Errors: Error ratio
error_count = Counter(
'http_errors_total',
'Total HTTP errors',
['method', 'endpoint', 'error_type']
)
# Duration: Response time distribution
request_duration = Histogram(
'http_request_duration_seconds',
'HTTP request duration',
['method', 'endpoint'],
buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)
4.2 Latency Percentiles
| | Mean | p50 | p95 | p99 | p99.9 | Max |
|---|---|---|---|---|---|---|
| User Impact | Low | Med | High | High | V.High | Extreme |
p50 (median): 50% of requests complete within this time
p95: 95% of requests complete within this time (1 in 20 is slower)
p99: 99% of requests complete within this time (1 in 100 is slower)
Why averages are dangerous:
- Average can be 50ms while p99 is 5000ms
- Every 100th request forces 5-second wait
- Heavy users are more likely to hit high percentiles
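The mean-vs-tail gap is easy to reproduce. With nearest-rank percentiles, two pathological requests out of 100 barely move the mean but completely dominate p99:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

latencies_ms = [50] * 98 + [5000] * 2   # 2% of requests are pathological
mean = sum(latencies_ms) / len(latencies_ms)
print(mean)                          # 149.0
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 99))  # 5000
```

An SLO stated as "mean under 200ms" would pass here while 2% of users wait 5 seconds, which is why budgets are written against p95/p99.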
5. Common Bottlenecks
5.1 Database Bottlenecks
N+1 Query Problem:
# BAD: N+1 query - 101 queries for 100 orders
orders = Order.objects.all()[:100]
for order in orders:
print(f"Order {order.id} by {order.user.name}")
# GOOD: Eager loading - 2 queries total
orders = Order.objects.select_related('user').all()[:100]
for order in orders:
print(f"Order {order.id} by {order.user.name}")
# GOOD: Prefetch (M:N relationships)
orders = Order.objects.prefetch_related('items__product').all()[:100]
for order in orders:
for item in order.items.all():
print(f" - {item.product.name}")
// Node.js + Prisma - N+1 Resolution
// BAD: N+1
const orders = await prisma.order.findMany({ take: 100 });
for (const order of orders) {
const user = await prisma.user.findUnique({
where: { id: order.userId }
});
}
// GOOD: Include (Join)
const orders = await prisma.order.findMany({
take: 100,
include: {
user: true,
items: { include: { product: true } }
}
});
// GOOD: DataLoader Pattern
const DataLoader = require('dataloader');
const userLoader = new DataLoader(async (userIds) => {
const users = await prisma.user.findMany({
where: { id: { in: [...userIds] } }
});
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
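The core of DataLoader is only a few lines: queue keys during one event-loop tick, then issue a single batch. A toy asyncio port (our own sketch, unlike the real `dataloader` npm package above) shows the mechanism:

```python
import asyncio

class TinyDataLoader:
    """Coalesces load(key) calls made in the same tick into one batch_fn call."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self._pending = {}       # key -> Future (also deduplicates keys)
        self._scheduled = False

    def load(self, key):
        fut = self._pending.get(key)
        if fut is None:
            fut = asyncio.get_running_loop().create_future()
            self._pending[key] = fut
            if not self._scheduled:
                self._scheduled = True
                # Dispatch once the current tick's load() calls have all queued
                asyncio.get_running_loop().call_soon(
                    lambda: asyncio.ensure_future(self._dispatch()))
        return fut

    async def _dispatch(self):
        self._scheduled = False
        pending, self._pending = self._pending, {}
        keys = list(pending)
        for key, value in zip(keys, await self.batch_fn(keys)):
            pending[key].set_result(value)

async def demo():
    batches = []
    async def load_users(ids):
        batches.append(ids)      # one SELECT ... WHERE id IN (...) would go here
        return [f"user-{i}" for i in ids]
    loader = TinyDataLoader(load_users)
    users = await asyncio.gather(loader.load(1), loader.load(2), loader.load(3))
    return users, batches

users, batches = asyncio.run(demo())
print(users)    # ['user-1', 'user-2', 'user-3']
print(batches)  # [[1, 2, 3]]
```

Three separate `load()` calls produce a single batch call, which is exactly how a GraphQL resolver avoids N+1 without restructuring its per-field code.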
5.2 Missing Indexes and Full Table Scans
-- Detect slow queries (PostgreSQL)
SELECT
query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Execution plan analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;
-- Create composite index (matching query patterns)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- Check index utilization
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
5.3 Connection Pool Exhaustion
# PgBouncer Configuration (PostgreSQL connection pooler)
# pgbouncer.ini
"""
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
server_idle_timeout = 600
server_lifetime = 3600
stats_period = 60
"""
5.4 Lock Contention
// Go - Lock contention resolution with sharding
package main
import "sync"
// BAD: Global mutex locking entire map
type BadCache struct {
mu sync.Mutex
items map[string]interface{}
}
// GOOD: Sharding to distribute lock contention
type ShardedCache struct {
shards [256]shard
shardMask uint8
}
type shard struct {
mu sync.RWMutex
items map[string]interface{}
}
func NewShardedCache() *ShardedCache {
c := &ShardedCache{shardMask: 255}
for i := range c.shards {
c.shards[i].items = make(map[string]interface{})
}
return c
}
func (c *ShardedCache) getShard(key string) *shard {
hash := fnv32(key)
return &c.shards[hash&uint32(c.shardMask)]
}
func (c *ShardedCache) Get(key string) (interface{}, bool) {
s := c.getShard(key)
s.mu.RLock()
defer s.mu.RUnlock()
val, ok := s.items[key]
return val, ok
}
func (c *ShardedCache) Set(key string, value interface{}) {
s := c.getShard(key)
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
func fnv32(key string) uint32 {
hash := uint32(2166136261)
for i := 0; i < len(key); i++ {
hash *= 16777619
hash ^= uint32(key[i])
}
return hash
}
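The same lock-striping idea carries over to any runtime; a Python version for illustration (16 shards and the builtin `hash` are arbitrary choices here):

```python
import threading

class ShardedCache:
    """Stripes one lock per shard so writers to different keys rarely contend."""
    def __init__(self, shards: int = 16):
        self._shards = [({}, threading.Lock()) for _ in range(shards)]

    def _shard(self, key: str):
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key: str):
        items, lock = self._shard(key)
        with lock:
            return items.get(key)

    def set(self, key: str, value):
        items, lock = self._shard(key)
        with lock:
            items[key] = value

cache = ShardedCache()
cache.set("user:1", {"name": "kim"})
print(cache.get("user:1"))  # {'name': 'kim'}
```

Contention drops roughly in proportion to the shard count, since two threads only block each other when their keys hash to the same shard.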
6. Database Optimization
6.1 Query Optimization Strategies
-- 1. Convert subqueries to JOINs
-- BAD
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE status = 'active');
-- GOOD
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 'active';
-- 2. Cursor-based pagination (consistent performance)
-- BAD: OFFSET-based (slow on deep pages)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- GOOD: Cursor-based
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;
-- 3. Partitioning
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025_q1 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
6.2 Read Replicas
# SQLAlchemy - Read/Write Separation
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
class DatabaseRouter:
def __init__(self):
self.writer = create_engine(
'postgresql://writer:pass@primary:5432/mydb',
pool_size=10, max_overflow=20
)
self.readers = [
create_engine(
f'postgresql://reader:pass@replica{i}:5432/mydb',
pool_size=10, max_overflow=20
)
for i in range(1, 4) # 3 read replicas
]
self._reader_index = 0
def get_writer_session(self):
Session = sessionmaker(bind=self.writer)
return Session()
def get_reader_session(self):
reader = self.readers[self._reader_index % len(self.readers)]
self._reader_index += 1
Session = sessionmaker(bind=reader)
return Session()
7. Caching Strategies
7.1 Cache-Aside Pattern
import redis
import json
from functools import wraps
redis_client = redis.Redis(host='localhost', port=6379, db=0)
class CacheAside:
"""Cache-Aside (Lazy Loading) pattern implementation"""
@staticmethod
def cached(key_prefix, ttl_seconds=300):
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
cache_key = f"{key_prefix}:{':'.join(str(a) for a in args)}"
# 1. Check cache
cached = redis_client.get(cache_key)
if cached:
return json.loads(cached)
# 2. Cache miss - query DB
result = await func(*args, **kwargs)
# 3. Store result in cache
if result is not None:
redis_client.setex(
cache_key, ttl_seconds,
json.dumps(result, default=str)
)
return result
return wrapper
return decorator
@staticmethod
def invalidate(key_pattern):
"""Pattern-based cache invalidation (SCAN instead of KEYS, which blocks Redis)"""
keys = list(redis_client.scan_iter(match=key_pattern, count=500))
if keys:
redis_client.delete(*keys)
7.2 Write-Through and Write-Behind
import asyncio  # WriteBehind below uses asyncio.Queue

class WriteThrough:
"""Write-Through: Update cache and DB simultaneously"""
async def update(self, key, value, ttl=300):
await self.db.update(key, value)
redis_client.setex(f"wt:{key}", ttl, json.dumps(value, default=str))
async def get(self, key):
cached = redis_client.get(f"wt:{key}")
if cached:
return json.loads(cached)
value = await self.db.get(key)
if value:
redis_client.setex(f"wt:{key}", 300, json.dumps(value, default=str))
return value
class WriteBehind:
"""Write-Behind: Write to cache first, async DB update"""
def __init__(self):
self.write_queue = asyncio.Queue()
self.batch_size = 100
self.flush_interval = 5 # seconds
async def update(self, key, value, ttl=300):
# 1. Write to cache immediately (fast response)
redis_client.setex(f"wb:{key}", ttl, json.dumps(value, default=str))
# 2. Add to queue (async DB write)
await self.write_queue.put((key, value))
async def flush_worker(self):
"""Background worker: batch write from queue to DB"""
while True:
batch = []
try:
while len(batch) < self.batch_size:
item = await asyncio.wait_for(
self.write_queue.get(),
timeout=self.flush_interval
)
batch.append(item)
except asyncio.TimeoutError:
pass
if batch:
try:
await self.db.bulk_update(batch)
except Exception:
for item in batch:
await self.write_queue.put(item)
await asyncio.sleep(1)
7.3 TTL Strategies and Cache Invalidation
# Tiered TTL Strategy
class TieredTTLCache:
TTL_CONFIG = {
# Frequently changing data
'user:session': 1800, # 30 min
'cart:items': 900, # 15 min
# Periodically changing data
'product:detail': 3600, # 1 hour
'product:list': 600, # 10 min
'search:results': 300, # 5 min
# Rarely changing data
'category:list': 86400, # 24 hours
'config:settings': 86400, # 24 hours
'static:content': 604800, # 7 days
}
@classmethod
def get_ttl(cls, key_type):
return cls.TTL_CONFIG.get(key_type, 300) # Default 5 min
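One refinement worth adding to a tiered scheme like this: jitter the TTL so keys written together don't all expire together and stampede the database (the ±10% spread is an assumption, as is the helper name):

```python
import random

def jittered_ttl(base_ttl: int, spread: float = 0.1) -> int:
    """Randomize TTL by +/-spread so co-created keys expire at different times."""
    low = int(base_ttl * (1 - spread))
    high = int(base_ttl * (1 + spread))
    return random.randint(low, high)

# product:detail (3600s) lands somewhere in [3240, 3960]
print(jittered_ttl(3600))
```

Combined with the tiered table above, this would wrap the `get_ttl` lookup rather than replace it.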
8. Async Processing
8.1 Message Queue-Based Async Processing
# Celery async task processing
from celery import Celery, chain, group
app = Celery('tasks', broker='redis://localhost:6379/0')
app.conf.update(
task_serializer='json',
accept_content=['json'],
task_acks_late=True,
worker_prefetch_multiplier=1,
task_routes={
'tasks.send_email': {'queue': 'email'},
'tasks.process_image': {'queue': 'image'},
'tasks.generate_report': {'queue': 'report'},
}
)
@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_email(self, to, subject, body):
try:
email_service.send(to=to, subject=subject, body=body)
except Exception as exc:
self.retry(exc=exc)
@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
"""Order processing pipeline"""
try:
workflow = chain(
validate_inventory.s(order_id),
process_payment.s(order_id),
send_confirmation_email.s(order_id),
update_analytics.s(order_id)
)
workflow.apply_async()
except Exception as exc:
self.retry(exc=exc, countdown=30)
@app.task
def bulk_process_orders(order_ids):
"""Parallel batch processing"""
job = group(process_order.s(oid) for oid in order_ids)
return job.apply_async()
8.2 Event-Driven Architecture
// Node.js - EventEmitter-based async processing
const EventEmitter = require('events');
class OrderEventBus extends EventEmitter {
constructor() {
super();
this.setMaxListeners(20);
}
}
const orderBus = new OrderEventBus();
// Register event handlers (separation of concerns)
orderBus.on('order.created', async (order) => {
await inventoryService.decrementStock(order.items);
});
orderBus.on('order.created', async (order) => {
await emailService.sendOrderConfirmation(order);
});
orderBus.on('order.created', async (order) => {
await analyticsService.trackOrder(order);
});
class OrderService {
async createOrder(orderData) {
const order = await this.orderRepo.create(orderData);
orderBus.emit('order.created', order);
return order; // Fast response
}
}
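A note on this pattern: `emit` does not await the async handlers, and that is precisely what keeps `createOrder` fast. An asyncio analogue (our own sketch, not part of the guide) makes this explicit by scheduling handlers as background tasks:

```python
import asyncio
from collections import defaultdict

class AsyncEventBus:
    """Handlers run as background tasks; emit() returns without awaiting them."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event: str, handler):
        self._handlers[event].append(handler)

    def emit(self, event: str, payload):
        # create_task schedules each handler; the caller is not blocked
        return [asyncio.create_task(h(payload)) for h in self._handlers[event]]

async def demo():
    bus, seen = AsyncEventBus(), []
    async def track(order): seen.append(("analytics", order["id"]))
    async def email(order): seen.append(("email", order["id"]))
    bus.on("order.created", track)
    bus.on("order.created", email)
    tasks = bus.emit("order.created", {"id": 7})  # returns immediately
    await asyncio.gather(*tasks)                  # only so the demo can observe results
    return seen

print(asyncio.run(demo()))  # [('analytics', 7), ('email', 7)]
```

The trade-off is the same as in the Node version: a handler failure no longer fails the request, so handlers need their own error reporting (or a durable queue, as in section 8.1).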
9. Batch Optimization
9.1 Bulk Inserts
# SQLAlchemy Bulk Insert Comparison
import time
# BAD: One by one (N INSERT statements)
def insert_one_by_one(session, records):
start = time.time()
for record in records:
session.add(MyModel(**record))
session.commit()
print(f"One by one: {time.time() - start:.2f}s")
# GOOD: Bulk insert (1 INSERT statement)
def bulk_insert(session, records):
start = time.time()
session.bulk_insert_mappings(MyModel, records)
session.commit()
print(f"Bulk insert: {time.time() - start:.2f}s")
# BETTER: execute_values (PostgreSQL, psycopg2)
def execute_values_insert(conn, records):
start = time.time()
from psycopg2.extras import execute_values
cursor = conn.cursor()
execute_values(
cursor,
"INSERT INTO my_table (col1, col2, col3) VALUES %s",
[(r['col1'], r['col2'], r['col3']) for r in records],
page_size=1000
)
conn.commit()
print(f"execute_values: {time.time() - start:.2f}s")
# Performance comparison (10,000 records)
# One by one: 12.5s
# Bulk insert: 0.8s
# execute_values: 0.3s
9.2 Batch API Calls
// Optimized batch external API calls
class BatchAPIClient {
  constructor(options = {}) {
    this.batchSize = options.batchSize || 50;
    this.concurrency = options.concurrency || 5;
    this.retryAttempts = options.retryAttempts || 3;
    this.delayBetweenBatches = options.delayMs || 100;
  }

  async processBatch(items, processFn) {
    const results = [];
    const errors = [];
    const batches = [];
    for (let i = 0; i < items.length; i += this.batchSize) {
      batches.push(items.slice(i, i + this.batchSize));
    }
    for (let i = 0; i < batches.length; i += this.concurrency) {
      const concurrentBatches = batches.slice(i, i + this.concurrency);
      const batchResults = await Promise.allSettled(
        concurrentBatches.map(batch => this.processWithRetry(batch, processFn))
      );
      for (const result of batchResults) {
        if (result.status === 'fulfilled') {
          results.push(...result.value);
        } else {
          errors.push(result.reason);
        }
      }
      if (i + this.concurrency < batches.length) {
        await new Promise(r => setTimeout(r, this.delayBetweenBatches));
      }
    }
    return { results, errors, total: items.length, processed: results.length };
  }

  async processWithRetry(batch, processFn, attempt = 1) {
    try {
      return await processFn(batch);
    } catch (error) {
      if (attempt < this.retryAttempts) {
        const delay = Math.pow(2, attempt) * 1000; // exponential backoff
        await new Promise(r => setTimeout(r, delay));
        return this.processWithRetry(batch, processFn, attempt + 1);
      }
      throw error;
    }
  }
}
10. HTTP Optimization
10.1 Compression and Protocol Optimization
// Express.js compression configuration
const compression = require('compression');
app.use(compression({
  filter: (req, res) => {
    if (req.headers['x-no-compression']) return false;
    return compression.filter(req, res);
  },
  level: 6,        // Compression level (1-9, 6 is balanced)
  threshold: 1024, // Compress only above 1KB
  memLevel: 8,
}));

// HTTP/2 server setup
const http2 = require('http2');
const fs = require('fs');
const server = http2.createSecureServer({
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
  allowHTTP1: true, // Fall back to HTTP/1.1 for older clients
});
10.2 Keep-Alive and Connection Reuse
# Python requests - Session reuse
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# BAD: New connection every time
def fetch_bad(urls):
    results = []
    for url in urls:
        response = requests.get(url)  # TCP handshake each time
        results.append(response.json())
    return results

# GOOD: Session reuse (Keep-Alive)
def fetch_good(urls):
    session = requests.Session()
    retry_strategy = Retry(
        total=3, backoff_factor=0.5,
        status_forcelist=[500, 502, 503, 504]
    )
    adapter = HTTPAdapter(
        max_retries=retry_strategy,
        pool_connections=10,
        pool_maxsize=20,
    )
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    results = []
    for url in urls:
        response = session.get(url)  # Connection reuse
        results.append(response.json())
    session.close()
    return results
11. Application-Level Optimization
11.1 Efficient Serialization
// JSON vs MessagePack comparison
const msgpack = require('msgpack-lite');
const data = {
  users: Array.from({ length: 1000 }, (_, i) => ({
    id: i,
    name: `User ${i}`,
    email: `user${i}@example.com`,
    age: 20 + (i % 50),
    active: i % 3 !== 0,
    tags: ['tag1', 'tag2', 'tag3'],
  }))
};

// JSON
const jsonStr = JSON.stringify(data);
console.log(`JSON size: ${Buffer.byteLength(jsonStr)} bytes`);

// MessagePack
const msgpackBuf = msgpack.encode(data);
console.log(`MessagePack size: ${msgpackBuf.length} bytes`);

// Typical results:
// JSON size: ~120KB, serialize: ~3ms
// MessagePack size: ~85KB, serialize: ~2ms (about 30% smaller)
12. Production Monitoring
12.1 SLO-Based Alerting
# Prometheus alerting rules
groups:
  - name: slo-alerts
    rules:
      # p99 latency SLO violation
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency exceeds 500ms"
      # Error rate SLO violation
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1%"
      # Throughput drop vs. the same window 1h ago
      - alert: ThroughputDrop
        expr: |
          sum(rate(http_requests_total[5m]))
          < 0.5 * sum(rate(http_requests_total[5m] offset 1h))
        for: 5m
        labels:
          severity: warning
      # Connection pool near exhaustion
      - alert: ConnectionPoolExhaustion
        expr: |
          hikaricp_connections_active
          / hikaricp_connections_max > 0.85
        for: 2m
        labels:
          severity: warning
13. Practice Quiz
Q1. What are the implications of Amdahl's Law for performance optimization, and how can it be applied in practice?
Amdahl's Law shows that overall system performance improvement is limited by the proportion of the improvable portion.
Key implications:
- No matter how much you speed up a small portion (e.g., 5%) of total execution time, overall improvement is minimal
- Making a large portion (e.g., 80%) just 2x faster yields significant overall improvement
- Therefore, identify the biggest bottlenecks first through profiling, then focus improvements there
Practical application:
- Use profiling (Flame Graphs) to understand execution time distribution
- Optimize bottlenecks in order of their proportion
- Re-measure after each optimization to identify new bottlenecks
- Stop optimizing when within Performance Budget
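The two worked examples from section 1.2 can be checked with a tiny helper (an illustrative sketch; the `speedup` function name is mine, not from the original):

```python
def speedup(p: float, s: float) -> float:
    """Amdahl's Law: overall speedup when a fraction p of the
    work is accelerated by a factor of s."""
    return 1 / ((1 - p) + p / s)

# 20% of the code made 10x faster -> ~1.22x overall
print(round(speedup(0.2, 10), 2))  # 1.22
# 80% of the code made 2x faster -> ~1.67x overall
print(round(speedup(0.8, 2), 2))   # 1.67
```

Note that even `speedup(0.2, float("inf"))` caps out at 1.25x: the 80% you did not touch dominates.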
Q2. What is the N+1 query problem, and describe 3 ways to solve it in ORMs.
N+1 Problem: after fetching N parent entities in one query, each parent's child entities are fetched with a separate query, for N+1 queries in total. For example, loading 100 orders and then each order's items triggers 101 DB queries (1 for the orders + 100 for the items).
Solutions:
- Eager Loading (select_related/include): Use JOINs to fetch parent and children in one query. Effective for 1:1, N:1 relationships
- Prefetch (prefetch_related): Batch-fetch child entities in a separate query, then map in memory. Effective for 1:N, M:N relationships using IN clause
- DataLoader Pattern: Automatically batch individual requests into a single query. Especially useful in GraphQL. Pattern developed by Facebook
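The difference between the N+1 pattern and a batched prefetch can be shown with a self-contained sketch (an in-memory "database" and query counter; all names are hypothetical):

```python
ORDERS = [{"id": i} for i in range(1, 4)]
ITEMS = {1: ["a"], 2: ["b", "c"], 3: ["d"]}
query_count = 0

def fetch_items_for(order_id):
    """One query per parent (the N+1 pattern)."""
    global query_count
    query_count += 1
    return ITEMS[order_id]

def fetch_items_bulk(order_ids):
    """One IN-clause style query covering all parents (prefetch)."""
    global query_count
    query_count += 1
    return {oid: ITEMS[oid] for oid in order_ids}

# N+1: 1 query for orders + N queries for items
query_count = 1
for order in ORDERS:
    order["items"] = fetch_items_for(order["id"])
print(query_count)  # 4 queries for 3 orders

# Prefetch: 1 query for orders + 1 batched query for all items
query_count = 1
items_by_order = fetch_items_bulk([o["id"] for o in ORDERS])
for order in ORDERS:
    order["items"] = items_by_order[order["id"]]
print(query_count)  # 2 queries total
```

A DataLoader does the second version automatically by collecting individual `load(id)` calls within a tick and issuing one batched fetch.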
Q3. Explain the differences between Cache-Aside and Write-Through caching patterns and their suitable use cases.
Cache-Aside (Lazy Loading):
- Check cache on read, query DB on miss, then store in cache
- Application manages cache directly
- First request always has cache miss (Cold Start)
- Suitable for: Read-heavy scenarios where not all data needs caching
Write-Through:
- Update cache and DB simultaneously on writes
- Cache is always up-to-date
- Write latency increases (writing to two locations)
- Suitable for: When data consistency matters and reads far exceed writes
Write-Behind writes to cache first and updates DB asynchronously, maximizing write performance but risking data loss.
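The two patterns' read/write paths can be sketched with a dict standing in for Redis and another for the DB (all names are hypothetical):

```python
cache = {}
db = {"user:1": {"name": "Kim"}}
db_reads = 0

def get_cache_aside(key):
    """Cache-Aside read path: cache first, fall back to DB, populate cache."""
    global db_reads
    if key in cache:
        return cache[key]          # hit: DB untouched
    db_reads += 1
    value = db.get(key)            # miss: read from DB
    if value is not None:
        cache[key] = value         # lazily populate the cache
    return value

def put_write_through(key, value):
    """Write-Through write path: update DB and cache together."""
    db[key] = value
    cache[key] = value

get_cache_aside("user:1")  # cold start: miss -> 1 DB read
get_cache_aside("user:1")  # hit -> no additional DB read
print(db_reads)            # 1
```

The cold-start miss on the first read is exactly the Cache-Aside drawback described above; Write-Through trades extra write latency for never serving a stale or missing entry.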
Q4. Explain when to use each of the 6 load test types (Smoke, Load, Stress, Spike, Soak, Breakpoint).
- Smoke Test: Minimal load (1-5 VUs) to verify basic system operation. Post-deployment verification
- Load Test: Verify performance at expected traffic levels. Check SLO compliance
- Stress Test: Exceed expected traffic to find system limits. Used for capacity planning
- Spike Test: Test system reaction to sudden traffic surges (e.g., events). Verify auto-scaling
- Soak Test: Maintain steady load for hours to find gradual issues like memory leaks or connection exhaustion
- Breakpoint Test: Continuously increase load to find absolute system failure point
Q5. What key parameters should be considered for connection pool tuning, and how do you determine the appropriate pool size?
Key parameters:
- maximum-pool-size: Maximum connections (too many overloads DB, too few causes waits)
- minimum-idle: Minimum idle connections (prevents cold start)
- connection-timeout: Wait time for connection acquisition
- idle-timeout: Time before idle connection is returned
- max-lifetime: Maximum connection lifetime (should be a few seconds shorter than any DB- or firewall-imposed connection time limit)
Determining appropriate pool size:
- HikariCP formula: connections = (core_count * 2) + effective_spindle_count
- For SSDs: connections = core_count * 2 + 1
- Generally 10-20 is sufficient for many cases
- Too-large pools actually increase DB context switching costs
- Adjust based on monitoring: increase when utilization exceeds 80%, immediately increase when waiting threads appear
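The HikariCP rule of thumb above can be expressed directly (a sketch; the function and parameter names are mine):

```python
def pool_size(core_count: int, effective_spindle_count: int = 1) -> int:
    """HikariCP sizing rule of thumb:
    connections = (core_count * 2) + effective_spindle_count.
    For SSDs the spindle count is effectively 1."""
    return core_count * 2 + effective_spindle_count

print(pool_size(4))     # 9 connections for a 4-core host with an SSD
print(pool_size(8, 2))  # 18 for 8 cores and 2 spinning disks
```

Treat the result as a starting point, not a target: the monitoring-driven adjustments above (utilization, waiting threads) should override the formula in production.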
References
- Google SRE Book - Performance Engineering
- k6 Documentation
- Artillery Documentation
- Brendan Gregg - Systems Performance
- Flame Graphs
- HikariCP - About Pool Sizing
- Redis Best Practices
- PostgreSQL Performance Tips
- Node.js Diagnostics Guide
- Go pprof Documentation
- Python cProfile Documentation
- Prometheus Monitoring
- DataLoader Pattern