백엔드 성능 엔지니어링 완전 가이드 2025: 프로파일링, 부하 테스트, 병목 분석, 최적화
목차
1. 성능 엔지니어링 마인드셋
1.1 측정 먼저, 최적화는 나중에
성능 엔지니어링의 황금률은 "추측하지 말고, 측정하라"입니다. 직감에 의한 최적화는 대부분 잘못된 곳에 시간을 낭비합니다.
성능 최적화의 3단계:
- 측정(Measure): 현재 성능을 정량적으로 측정
- 분석(Analyze): 병목 지점을 정확히 식별
- 최적화(Optimize): 가장 영향력 큰 병목부터 해결
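예를 들어 '측정' 단계는 최적화 전 기준선(baseline)을 수치로 남기는 것에서 시작합니다. 아래는 이를 보여주는 최소한의 스케치이며, measure_latency라는 이름과 runs 파라미터는 설명을 위해 가정한 것입니다.

```python
import time
import statistics

def measure_latency(func, *args, runs=100, **kwargs):
    """최적화 전 기준선 측정: 같은 입력으로 여러 번 실행해 분포를 기록한다."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args, **kwargs)
        samples.append((time.perf_counter() - start) * 1000)  # ms 단위
    samples.sort()
    return {
        'avg_ms': statistics.mean(samples),
        'p95_ms': samples[int(len(samples) * 0.95) - 1],
        'max_ms': samples[-1],
    }

# 사용 예: 최적화 대상의 현재 성능을 먼저 기록해 둔다
baseline = measure_latency(sorted, list(range(10000, 0, -1)))
```

이렇게 남긴 기준선이 있어야 최적화 후의 개선 폭을 정량적으로 비교할 수 있습니다.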
1.2 암달의 법칙(Amdahl's Law)
시스템 전체 성능 향상은 개선 가능한 부분의 비율에 의해 제한됩니다.
전체 속도 향상 = 1 / ((1 - P) + P / S)
P = 개선 가능한 부분의 비율
S = 해당 부분의 속도 향상 배수
예시: 전체의 20%를 차지하는 코드를 10배 빠르게 만들면
= 1 / ((1 - 0.2) + 0.2 / 10)
= 1 / (0.8 + 0.02)
= 1.22배 (22% 향상)
반면, 전체의 80%를 차지하는 코드를 2배 빠르게 만들면
= 1 / ((1 - 0.8) + 0.8 / 2)
= 1 / (0.2 + 0.4)
= 1.67배 (67% 향상)
핵심: 작은 부분을 극적으로 개선하는 것보다, 큰 부분을 적당히 개선하는 것이 효과적입니다.
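위 계산은 다음과 같은 간단한 함수로 그대로 검증해 볼 수 있습니다.

```python
def amdahl_speedup(p, s):
    """암달의 법칙: p = 개선 가능한 부분의 비율, s = 해당 부분의 속도 향상 배수"""
    return 1 / ((1 - p) + p / s)

# 본문 예시 검증
print(round(amdahl_speedup(0.2, 10), 2))  # 1.22 — 20%를 10배 개선
print(round(amdahl_speedup(0.8, 2), 2))   # 1.67 — 80%를 2배 개선
```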
1.3 성능 예산(Performance Budget)
# 성능 예산 정의 예시
performance_budget:
  api_endpoints:
    p50_latency_ms: 50
    p95_latency_ms: 200
    p99_latency_ms: 500
    max_latency_ms: 2000
    error_rate_percent: 0.1
    throughput_rps: 1000
  database:
    query_p95_ms: 50
    query_p99_ms: 200
    connection_pool_utilization: 70
    slow_query_threshold_ms: 100
  external_services:
    p95_latency_ms: 300
    timeout_ms: 5000
    retry_count: 3
    circuit_breaker_threshold: 50
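이런 예산은 CI 단계에서 자동으로 검증할 수 있습니다. 아래는 그 방식을 보여주는 최소한의 스케치로, check_budget 함수와 측정값은 설명을 위해 가정한 예시입니다.

```python
def check_budget(measured, budget):
    """측정값이 성능 예산을 초과하면 위반 목록을 반환한다 (비어 있으면 통과)."""
    violations = []
    for metric, limit in budget.items():
        value = measured.get(metric)
        if value is not None and value > limit:
            violations.append(f"{metric}: {value} > budget {limit}")
    return violations

# 예시: 부하 테스트 결과를 예산과 비교
budget = {'p95_latency_ms': 200, 'p99_latency_ms': 500, 'error_rate_percent': 0.1}
measured = {'p95_latency_ms': 180, 'p99_latency_ms': 620, 'error_rate_percent': 0.05}
violations = check_budget(measured, budget)
# p99가 예산(500ms)을 초과하므로 위반 1건이 보고된다
```

위반 목록이 비어 있지 않으면 CI 파이프라인을 실패시키는 식으로 예산을 강제할 수 있습니다.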
2. 프로파일링
2.1 CPU 프로파일링과 Flame Graph
Flame Graph는 CPU 시간이 어디에 소비되는지를 시각적으로 보여주는 강력한 도구입니다.
Node.js CPU 프로파일링:
// Node.js - 내장 프로파일러 사용
// 실행: node --prof app.js
// 분석: node --prof-process isolate-*.log > profile.txt
// 또는 v8-profiler-next 사용
const v8Profiler = require('v8-profiler-next');
function startProfiling(durationMs = 30000) {
const title = `cpu-profile-${Date.now()}`;
v8Profiler.startProfiling(title, true);
setTimeout(() => {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
}, durationMs);
}
// 미들웨어로 특정 요청 프로파일링
function profilingMiddleware(req, res, next) {
if (req.headers['x-profile'] !== 'true') {
return next();
}
const title = `req-${req.method}-${req.path}-${Date.now()}`;
v8Profiler.startProfiling(title, true);
const originalEnd = res.end;
res.end = function (...args) {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
originalEnd.apply(res, args);
};
next();
}
Go CPU 프로파일링:
package main
import (
"net/http"
_ "net/http/pprof"
"runtime"
)
func main() {
// pprof 엔드포인트 활성화
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
// CPU 프로파일 수집: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// Flame Graph 생성: go tool pprof -http=:8080 profile.pb.gz
// 또는 프로그래밍 방식으로
// runtime.SetCPUProfileRate(100)
// pprof.StartCPUProfile(f)
// defer pprof.StopCPUProfile()
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
// 애플리케이션 로직
startServer()
}
Python CPU 프로파일링:
import cProfile
import pstats
from pyinstrument import Profiler

# cProfile 사용
def profile_with_cprofile(func):
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(20)  # 상위 20개 함수
        return result
    return wrapper

# pyinstrument 사용 (더 읽기 쉬운 출력)
def profile_with_pyinstrument(func):
    def wrapper(*args, **kwargs):
        profiler = Profiler()
        profiler.start()
        result = func(*args, **kwargs)
        profiler.stop()
        print(profiler.output_text(unicode=True))
        return result
    return wrapper

# Django 미들웨어
class ProfilingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.META.get('HTTP_X_PROFILE') == 'true':
            profiler = Profiler()
            profiler.start()
            response = self.get_response(request)
            profiler.stop()
            response['X-Profile-Duration'] = str(profiler.last_session.duration)
            # HTML 프로파일 결과를 브라우저에서 열기
            profiler.open_in_browser()
            return response
        return self.get_response(request)
2.2 메모리 프로파일링
// Node.js 힙 스냅샷
const v8 = require('v8');
const fs = require('fs');
function takeHeapSnapshot() {
// v8.writeHeapSnapshot()은 생성된 스냅샷 파일의 경로(문자열)를 반환
const snapshotPath = v8.writeHeapSnapshot();
console.log(`Heap snapshot written to: ${snapshotPath}`);
return snapshotPath;
}
// 메모리 사용량 모니터링
function monitorMemory(intervalMs = 5000) {
setInterval(() => {
const usage = process.memoryUsage();
console.log({
rss_mb: Math.round(usage.rss / 1024 / 1024),
heapTotal_mb: Math.round(usage.heapTotal / 1024 / 1024),
heapUsed_mb: Math.round(usage.heapUsed / 1024 / 1024),
external_mb: Math.round(usage.external / 1024 / 1024),
arrayBuffers_mb: Math.round(usage.arrayBuffers / 1024 / 1024)
});
}, intervalMs);
}
// 메모리 누수 감지 패턴
class MemoryLeakDetector {
constructor(options = {}) {
this.samples = [];
this.maxSamples = options.maxSamples || 60;
this.threshold = options.thresholdMB || 50;
}
sample() {
const usage = process.memoryUsage();
this.samples.push({
timestamp: Date.now(),
heapUsed: usage.heapUsed
});
if (this.samples.length > this.maxSamples) {
this.samples.shift();
}
return this.detectLeak();
}
detectLeak() {
if (this.samples.length < 10) return null;
const first = this.samples[0].heapUsed;
const last = this.samples[this.samples.length - 1].heapUsed;
const diffMB = (last - first) / 1024 / 1024;
// 지속적인 메모리 증가 패턴 감지
let increasing = 0;
for (let i = 1; i < this.samples.length; i++) {
if (this.samples[i].heapUsed > this.samples[i - 1].heapUsed) {
increasing++;
}
}
const increaseRatio = increasing / (this.samples.length - 1);
if (diffMB > this.threshold && increaseRatio > 0.7) {
return {
suspected: true,
growthMB: diffMB.toFixed(2),
increaseRatio: increaseRatio.toFixed(2),
duration: this.samples[this.samples.length - 1].timestamp - this.samples[0].timestamp
};
}
return null;
}
}
2.3 I/O 프로파일링
# Python - I/O 프로파일링
import asyncio
import time
import functools
import logging
from contextlib import contextmanager

logger = logging.getLogger('io_profiler')

class IOProfiler:
    """I/O 작업 시간 측정 데코레이터 및 컨텍스트 매니저"""
    _stats = {}

    @classmethod
    def track(cls, operation_name):
        def decorator(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            @functools.wraps(func)
            def sync_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            if asyncio.iscoroutinefunction(func):
                return async_wrapper
            return sync_wrapper
        return decorator

    @classmethod
    def _record(cls, name, duration, success):
        if name not in cls._stats:
            cls._stats[name] = {
                'count': 0, 'total_time': 0,
                'min_time': float('inf'), 'max_time': 0,
                'errors': 0
            }
        stats = cls._stats[name]
        stats['count'] += 1
        stats['total_time'] += duration
        stats['min_time'] = min(stats['min_time'], duration)
        stats['max_time'] = max(stats['max_time'], duration)
        if not success:
            stats['errors'] += 1

    @classmethod
    def report(cls):
        for name, stats in sorted(cls._stats.items()):
            avg = stats['total_time'] / stats['count'] if stats['count'] else 0
            logger.info(
                f"{name}: count={stats['count']}, "
                f"avg={avg*1000:.1f}ms, "
                f"min={stats['min_time']*1000:.1f}ms, "
                f"max={stats['max_time']*1000:.1f}ms, "
                f"errors={stats['errors']}"
            )

# 사용 예시
class UserRepository:
    @IOProfiler.track('db.users.find_by_id')
    async def find_by_id(self, user_id):
        return await self.db.users.find_one({"_id": user_id})

    @IOProfiler.track('db.users.search')
    async def search(self, query, limit=20):
        return await self.db.users.find(query).limit(limit).to_list(limit)

class ExternalAPIClient:
    @IOProfiler.track('api.payment.charge')
    async def charge(self, amount, token):
        async with self.session.post('/charge', json={"amount": amount, "token": token}) as resp:
            return await resp.json()
3. 부하 테스트(Load Testing)
3.1 부하 테스트 도구 비교
| 도구 | 언어 | 프로토콜 | 강점 | 약점 |
|---|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | 개발자 친화적, CI/CD 통합 | 브라우저 테스트 제한적 |
| Artillery | JavaScript | HTTP, WebSocket, Socket.io | 설정 기반, 확장성 | 복잡한 시나리오 어려움 |
| Locust | Python | HTTP | Python 스크립트, 분산 | 프로토콜 제한적 |
| Gatling | Scala/Java | HTTP, WebSocket | 상세 리포트, JVM 성능 | 학습 곡선 |
| JMeter | Java | 다양함 | GUI, 다양한 프로토콜 | 리소스 소비 큼, 구식 |
3.2 k6 스크립트 예시
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
// 커스텀 메트릭
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration', true);
const requestCount = new Counter('requests');
// 테스트 옵션
export const options = {
scenarios: {
// 시나리오 1: 일반 부하 테스트
normal_load: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 50 }, // 2분간 50 VU까지 증가
{ duration: '5m', target: 50 }, // 5분간 50 VU 유지
{ duration: '2m', target: 100 }, // 2분간 100 VU까지 증가
{ duration: '5m', target: 100 }, // 5분간 100 VU 유지
{ duration: '2m', target: 0 }, // 2분간 0으로 감소
],
},
// 시나리오 2: 스파이크 테스트
spike_test: {
executor: 'ramping-vus',
startVUs: 0,
startTime: '16m',
stages: [
{ duration: '10s', target: 500 }, // 급격한 스파이크
{ duration: '1m', target: 500 }, // 유지
{ duration: '10s', target: 0 }, // 급격한 감소
],
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
errors: ['rate<0.01'],
http_req_failed: ['rate<0.01'],
},
};
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
const authToken = login();
group('API Operations', () => {
group('List Products', () => {
const res = http.get(`${BASE_URL}/api/products?page=1&limit=20`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'response time OK': (r) => r.timings.duration < 200,
'has products': (r) => JSON.parse(r.body).data.length > 0,
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
requestCount.add(1);
});
group('Get Product Detail', () => {
const productId = Math.floor(Math.random() * 1000) + 1;
const res = http.get(`${BASE_URL}/api/products/${productId}`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products/:id' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'has product data': (r) => {
const body = JSON.parse(r.body);
return body.data && body.data.id;
},
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
});
group('Create Order', () => {
const payload = JSON.stringify({
productId: Math.floor(Math.random() * 1000) + 1,
quantity: Math.floor(Math.random() * 5) + 1,
shippingAddress: '123 Test Street',
});
const res = http.post(`${BASE_URL}/api/orders`, payload, {
headers: {
Authorization: `Bearer ${authToken}`,
'Content-Type': 'application/json',
},
tags: { name: 'POST /api/orders' },
});
check(res, {
'order created': (r) => r.status === 201,
'has order id': (r) => JSON.parse(r.body).data.orderId,
});
errorRate.add(res.status !== 201);
apiDuration.add(res.timings.duration);
});
});
sleep(Math.random() * 3 + 1); // 1-4초 사이 대기
}
function login() {
const res = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
email: `user${__VU}@test.com`,
password: 'testpassword',
}), {
headers: { 'Content-Type': 'application/json' },
tags: { name: 'POST /api/auth/login' },
});
return res.status === 200 ? JSON.parse(res.body).token : '';
}
3.3 Artillery 설정 예시
# artillery-config.yml
config:
  target: "http://localhost:3000"
  phases:
    - duration: 120
      arrivalRate: 10
      name: "Warm up"
    - duration: 300
      arrivalRate: 50
      name: "Normal load"
    - duration: 120
      arrivalRate: 100
      name: "Peak load"
  defaults:
    headers:
      Content-Type: "application/json"
  plugins:
    expect: {}
    metrics-by-endpoint: {}
  ensure:
    thresholds:
      - http.response_time.p95: 200
      - http.response_time.p99: 500

scenarios:
  - name: "User browsing flow"
    weight: 70
    flow:
      - post:
          url: "/api/auth/login"
          json:
            email: "user@test.com"
            password: "password123"
          capture:
            - json: "$.token"
              as: "authToken"
          expect:
            - statusCode: 200
      - get:
          url: "/api/products?page=1&limit=20"
          headers:
            Authorization: "Bearer {{ authToken }}"
          expect:
            - statusCode: 200
            - hasProperty: "data"
      - think: 2
      - get:
          url: "/api/products/{{ $randomNumber(1, 1000) }}"
          headers:
            Authorization: "Bearer {{ authToken }}"
          expect:
            - statusCode: 200
  - name: "Order creation flow"
    weight: 30
    flow:
      - post:
          url: "/api/auth/login"
          json:
            email: "buyer@test.com"
            password: "password123"
          capture:
            - json: "$.token"
              as: "authToken"
      - post:
          url: "/api/orders"
          headers:
            Authorization: "Bearer {{ authToken }}"
          json:
            productId: "{{ $randomNumber(1, 100) }}"
            quantity: "{{ $randomNumber(1, 5) }}"
          expect:
            - statusCode: 201
3.4 부하 테스트 유형
| 유형 | 목적 | VU 패턴 | 기간 |
|---|---|---|---|
| Smoke | 기본 동작 확인 | 1-5 | 1-5분 |
| Load | 예상 트래픽 처리 확인 | 예상치 | 15-60분 |
| Stress | 한계점 탐색 | 예상치 초과 | 30-60분 |
| Spike | 급격한 트래픽 대응 | 갑작스런 급증 | 5-10분 |
| Soak | 장시간 안정성 확인 | 일정 수준 유지 | 2-24시간 |
| Breakpoint | 시스템 파괴점 탐색 | 지속적 증가 | 가변적 |
4. 핵심 성능 메트릭
4.1 RED Method
# RED Method 모니터링 구현
from prometheus_client import Counter, Histogram, Gauge
import time

# Rate: 초당 요청 수
request_count = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

# Errors: 에러 비율
error_count = Counter(
    'http_errors_total',
    'Total HTTP errors',
    ['method', 'endpoint', 'error_type']
)

# Duration: 응답 시간 분포
request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

# 미들웨어 구현 (ASGI)
class REDMetricsMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope['type'] != 'http':
            return await self.app(scope, receive, send)
        method = scope.get('method', 'UNKNOWN')
        path = scope.get('path', '/')
        status_code = 500
        start = time.perf_counter()
        try:
            # 응답 상태 코드 캡처
            async def send_wrapper(message):
                nonlocal status_code
                if message['type'] == 'http.response.start':
                    status_code = message['status']
                await send(message)

            await self.app(scope, receive, send_wrapper)
        except Exception as e:
            error_count.labels(method=method, endpoint=path, error_type=type(e).__name__).inc()
            raise
        finally:
            duration = time.perf_counter() - start
            request_count.labels(method=method, endpoint=path, status=str(status_code)).inc()
            request_duration.labels(method=method, endpoint=path).observe(duration)
            if status_code >= 400:
                error_count.labels(method=method, endpoint=path, error_type=f'http_{status_code}').inc()
4.2 지연시간 백분위(Percentiles)
| 지표 | 평균(Mean) | p50 | p95 | p99 | p99.9 | Max |
|---|---|---|---|---|---|---|
| 사용자 영향도 | 낮음 | 중간 | 높음 | 높음 | 매우 높음 | 극단적 |
p50 (중앙값): 50%의 요청이 이 시간 이내에 완료
p95: 95%의 요청이 이 시간 이내에 완료 (20개 중 1개가 이보다 느림)
p99: 99%의 요청이 이 시간 이내에 완료 (100개 중 1개가 이보다 느림)
왜 평균은 위험한가?
- 평균 50ms여도 p99가 5000ms일 수 있음
- 매 100번째 요청마다 사용자가 5초를 대기
- 헤비 유저일수록 높은 백분위에 노출될 확률 증가
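평균이 왜 위험한지는 간단한 시뮬레이션으로 확인할 수 있습니다. 아래는 99%의 요청이 50ms, 1%가 5000ms인 가상의 분포를 가정한 스케치이며, percentile은 설명용으로 단순화한 nearest-rank 구현입니다.

```python
def percentile(sorted_samples, p):
    """정렬된 샘플에서 p(0~100) 백분위 값을 반환하는 단순한 구현."""
    idx = min(len(sorted_samples) - 1, int(len(sorted_samples) * p / 100))
    return sorted_samples[idx]

# 99%의 요청은 50ms, 1%는 5000ms인 가상의 지연시간 분포
latencies = sorted([50] * 990 + [5000] * 10)
mean = sum(latencies) / len(latencies)
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
# mean은 약 99.5ms로 평범해 보이지만, p99는 5000ms — 100명 중 1명은 5초를 기다린다
```

평균만 보면 "100ms 수준의 서비스"로 보이지만, 백분위를 보면 매 100번째 요청의 5초 대기가 그대로 드러납니다.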
5. 일반적인 병목 지점
5.1 데이터베이스 병목
N+1 쿼리 문제:
# BAD: N+1 쿼리 - 주문 100개면 101번 쿼리 실행
orders = Order.objects.all()[:100]
for order in orders:
    # 각 주문마다 별도 쿼리로 사용자 정보 조회
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Eager loading - select_related는 JOIN으로 1번의 쿼리로 해결
orders = Order.objects.select_related('user').all()[:100]
for order in orders:
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Prefetch (M:N 관계)
orders = Order.objects.prefetch_related('items__product').all()[:100]
for order in orders:
    for item in order.items.all():
        print(f" - {item.product.name}")
// Node.js + Prisma - N+1 해결
// BAD: N+1
const orders = await prisma.order.findMany({ take: 100 });
for (const order of orders) {
const user = await prisma.user.findUnique({
where: { id: order.userId }
});
}
// GOOD: Include (Join)
const orders = await prisma.order.findMany({
take: 100,
include: {
user: true,
items: {
include: { product: true }
}
}
});
// GOOD: DataLoader 패턴
const DataLoader = require('dataloader');
const userLoader = new DataLoader(async (userIds) => {
const users = await prisma.user.findMany({
where: { id: { in: [...userIds] } }
});
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
// 여러 번 호출해도 자동으로 배치 처리
const user1 = await userLoader.load(1);
const user2 = await userLoader.load(2);
5.2 인덱스 부재와 전체 테이블 스캔
-- 느린 쿼리 탐지 (PostgreSQL)
SELECT
query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- 실행 계획 분석
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;
-- 복합 인덱스 생성 (쿼리 패턴에 맞게)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- 인덱스 사용률 확인
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
5.3 커넥션 풀 고갈
// HikariCP 최적 설정 (Java/Spring Boot)
// application.yml
/*
spring:
datasource:
hikari:
maximum-pool-size: 20
minimum-idle: 5
idle-timeout: 300000
max-lifetime: 600000
connection-timeout: 30000
leak-detection-threshold: 60000
pool-name: "MainPool"
*/
// 커넥션 풀 모니터링
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
public class ConnectionPoolMonitor {
private final HikariDataSource dataSource;
public PoolStats getStats() {
HikariPoolMXBean poolBean = dataSource.getHikariPoolMXBean();
return new PoolStats(
poolBean.getTotalConnections(),
poolBean.getActiveConnections(),
poolBean.getIdleConnections(),
poolBean.getThreadsAwaitingConnection()
);
}
public void logWarningIfNeeded() {
PoolStats stats = getStats();
double utilization = (double) stats.active / stats.total;
if (utilization > 0.8) {
log.warn("Connection pool utilization HIGH: {}% ({}/{})",
Math.round(utilization * 100),
stats.active, stats.total);
}
if (stats.waiting > 0) {
log.error("Threads waiting for connection: {}", stats.waiting);
}
}
}
# PgBouncer 설정 (PostgreSQL 커넥션 풀러)
# pgbouncer.ini
"""
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
server_idle_timeout = 600
server_lifetime = 3600
client_idle_timeout = 0
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
"""
5.4 잠금 경합(Lock Contention)
// Go - 잠금 경합 프로파일링
package main
import (
"runtime"
"sync"
"time"
)
// BAD: 글로벌 뮤텍스로 전체 맵 잠금
type BadCache struct {
mu sync.Mutex
items map[string]interface{}
}
func (c *BadCache) Get(key string) interface{} {
c.mu.Lock()
defer c.mu.Unlock()
return c.items[key]
}
// GOOD: 샤딩으로 잠금 경합 분산
type ShardedCache struct {
shards [256]shard
shardMask uint8
}
type shard struct {
mu sync.RWMutex
items map[string]interface{}
}
func NewShardedCache() *ShardedCache {
c := &ShardedCache{shardMask: 255}
for i := range c.shards {
c.shards[i].items = make(map[string]interface{})
}
return c
}
func (c *ShardedCache) getShard(key string) *shard {
hash := fnv32(key)
return &c.shards[hash&uint32(c.shardMask)]
}
func (c *ShardedCache) Get(key string) (interface{}, bool) {
s := c.getShard(key)
s.mu.RLock()
defer s.mu.RUnlock()
val, ok := s.items[key]
return val, ok
}
func (c *ShardedCache) Set(key string, value interface{}) {
s := c.getShard(key)
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
func fnv32(key string) uint32 {
hash := uint32(2166136261)
for i := 0; i < len(key); i++ {
hash *= 16777619
hash ^= uint32(key[i])
}
return hash
}
6. 데이터베이스 최적화
6.1 쿼리 최적화 전략
-- 1. 서브쿼리를 JOIN으로 변환
-- BAD
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE status = 'active');
-- GOOD
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 'active';
-- 2. EXISTS vs IN (대량 데이터)
-- GOOD: EXISTS (서브쿼리 결과가 큰 경우)
SELECT * FROM orders o
WHERE EXISTS (
SELECT 1 FROM users u
WHERE u.id = o.user_id AND u.status = 'active'
);
-- 3. 페이지네이션 최적화
-- BAD: OFFSET 기반 (깊은 페이지에서 느림)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- GOOD: 커서 기반 (일정한 성능)
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;
-- 4. 집계 쿼리 최적화
-- BAD: COUNT(*)를 자주 호출
SELECT COUNT(*) FROM orders WHERE status = 'pending';
-- GOOD: 대략적인 카운트 사용 (PostgreSQL)
SELECT reltuples::bigint AS estimate
FROM pg_class WHERE relname = 'orders';
-- 5. 파티셔닝
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025_q1 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
6.2 읽기 복제본(Read Replica)
# SQLAlchemy - 읽기/쓰기 분리
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class DatabaseRouter:
    def __init__(self):
        self.writer = create_engine(
            'postgresql://writer:pass@primary:5432/mydb',
            pool_size=10,
            max_overflow=20
        )
        self.readers = [
            create_engine(
                f'postgresql://reader:pass@replica{i}:5432/mydb',
                pool_size=10,
                max_overflow=20
            )
            for i in range(1, 4)  # 3개의 읽기 복제본
        ]
        self._reader_index = 0

    def get_writer_session(self):
        Session = sessionmaker(bind=self.writer)
        return Session()

    def get_reader_session(self):
        # 라운드 로빈으로 읽기 복제본 선택
        reader = self.readers[self._reader_index % len(self.readers)]
        self._reader_index += 1
        Session = sessionmaker(bind=reader)
        return Session()

# 사용 예시
db = DatabaseRouter()

# 쓰기 작업
with db.get_writer_session() as session:
    new_order = Order(user_id=1, total=99.99)
    session.add(new_order)
    session.commit()

# 읽기 작업 (복제본 사용)
with db.get_reader_session() as session:
    orders = session.query(Order).filter_by(status='pending').all()
7. 캐싱 전략
7.1 Cache-Aside 패턴
import redis
import json
from functools import wraps

redis_client = redis.Redis(host='localhost', port=6379, db=0)

class CacheAside:
    """Cache-Aside (Lazy Loading) 패턴 구현"""

    @staticmethod
    def cached(key_prefix, ttl_seconds=300):
        def decorator(func):
            @wraps(func)
            async def wrapper(self, *args, **kwargs):
                # 캐시 키 생성 (인스턴스 self는 키에서 제외)
                cache_key = f"{key_prefix}:{':'.join(str(a) for a in args)}"
                # 1. 캐시에서 조회
                cached = redis_client.get(cache_key)
                if cached:
                    return json.loads(cached)
                # 2. 캐시 미스 - DB에서 조회
                result = await func(self, *args, **kwargs)
                # 3. 결과를 캐시에 저장
                if result is not None:
                    redis_client.setex(
                        cache_key,
                        ttl_seconds,
                        json.dumps(result, default=str)
                    )
                return result
            return wrapper
        return decorator

    @staticmethod
    def invalidate(key_pattern):
        """패턴 기반 캐시 무효화 (KEYS는 서버를 블로킹하므로 SCAN 사용)"""
        keys = list(redis_client.scan_iter(match=key_pattern))
        if keys:
            redis_client.delete(*keys)

# 사용 예시
class ProductService:
    @CacheAside.cached('product', ttl_seconds=600)
    async def get_product(self, product_id):
        return await self.db.products.find_one({"_id": product_id})

    @CacheAside.cached('product:list', ttl_seconds=120)
    async def list_products(self, category, page):
        return await self.db.products.find(
            {"category": category}
        ).skip((page - 1) * 20).limit(20).to_list(20)

    async def update_product(self, product_id, data):
        await self.db.products.update_one(
            {"_id": product_id},
            {"$set": data}
        )
        # 관련 캐시 무효화
        CacheAside.invalidate(f'product:{product_id}')
        CacheAside.invalidate('product:list:*')
7.2 Write-Through와 Write-Behind
import asyncio

class WriteThrough:
    """Write-Through: 캐시와 DB를 동시에 업데이트"""

    async def update(self, key, value, ttl=300):
        # 1. DB에 쓰기
        await self.db.update(key, value)
        # 2. 캐시 업데이트 (DB 쓰기 성공 후)
        redis_client.setex(f"wt:{key}", ttl, json.dumps(value, default=str))

    async def get(self, key):
        # 캐시에서 조회 (항상 최신 데이터)
        cached = redis_client.get(f"wt:{key}")
        if cached:
            return json.loads(cached)
        # 캐시 미스 시 DB 조회 후 캐시 저장
        value = await self.db.get(key)
        if value:
            redis_client.setex(f"wt:{key}", 300, json.dumps(value, default=str))
        return value

class WriteBehind:
    """Write-Behind (Write-Back): 캐시에 먼저 쓰고, 비동기로 DB에 반영"""

    def __init__(self):
        self.write_queue = asyncio.Queue()
        self.batch_size = 100
        self.flush_interval = 5  # 초

    async def update(self, key, value, ttl=300):
        # 1. 캐시에 즉시 쓰기 (빠른 응답)
        redis_client.setex(f"wb:{key}", ttl, json.dumps(value, default=str))
        # 2. 큐에 추가 (비동기 DB 쓰기)
        await self.write_queue.put((key, value))

    async def flush_worker(self):
        """백그라운드 워커: 큐에서 꺼내서 DB에 배치 쓰기"""
        while True:
            batch = []
            try:
                while len(batch) < self.batch_size:
                    item = await asyncio.wait_for(
                        self.write_queue.get(),
                        timeout=self.flush_interval
                    )
                    batch.append(item)
            except asyncio.TimeoutError:
                pass
            if batch:
                try:
                    await self.db.bulk_update(batch)
                except Exception:
                    # 실패 시 재시도 큐에 추가
                    for item in batch:
                        await self.write_queue.put(item)
                    await asyncio.sleep(1)
7.3 TTL 전략과 캐시 무효화
# 다층 TTL 전략
class TieredTTLCache:
    TTL_CONFIG = {
        # 자주 변경되는 데이터
        'user:session': 1800,      # 30분
        'cart:items': 900,         # 15분
        # 주기적으로 변경되는 데이터
        'product:detail': 3600,    # 1시간
        'product:list': 600,       # 10분
        'search:results': 300,     # 5분
        # 거의 변경되지 않는 데이터
        'category:list': 86400,    # 24시간
        'config:settings': 86400,  # 24시간
        'static:content': 604800,  # 7일
    }

    @classmethod
    def get_ttl(cls, key_type):
        return cls.TTL_CONFIG.get(key_type, 300)  # 기본 5분

    @staticmethod
    def stale_while_revalidate(key, ttl, stale_ttl):
        """Stale-While-Revalidate 패턴"""
        cached = redis_client.get(key)
        if cached:
            data = json.loads(cached)
            if data['_cached_at'] + ttl > time.time():
                return data['value'], False  # 신선한 데이터
            if data['_cached_at'] + stale_ttl > time.time():
                return data['value'], True   # 부실하지만 사용 가능
        return None, True  # 캐시 미스

# 캐시 워밍(Pre-warming)
class CacheWarmer:
    async def warm_popular_products(self):
        """인기 상품 캐시 사전 로딩"""
        popular = await self.db.products.find(
            {"popular": True}
        ).limit(100).to_list(100)
        pipe = redis_client.pipeline()
        for product in popular:
            key = f"product:{product['_id']}"
            pipe.setex(key, 3600, json.dumps(product, default=str))
        pipe.execute()
8. 비동기 처리
8.1 메시지 큐 기반 비동기 처리
# Celery를 사용한 비동기 태스크 처리
from celery import Celery, chain, group, chord

app = Celery('tasks', broker='redis://localhost:6379/0')

# 설정
app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    result_serializer='json',
    timezone='UTC',
    task_acks_late=True,
    worker_prefetch_multiplier=1,
    task_reject_on_worker_lost=True,
    task_routes={
        'tasks.send_email': {'queue': 'email'},
        'tasks.process_image': {'queue': 'image'},
        'tasks.generate_report': {'queue': 'report'},
    }
)

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_email(self, to, subject, body):
    try:
        email_service.send(to=to, subject=subject, body=body)
    except Exception as exc:
        self.retry(exc=exc)

@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
    """주문 처리 파이프라인"""
    try:
        order = Order.objects.get(id=order_id)
        # 체인으로 순차 처리
        # si()는 이전 태스크의 결과를 인자로 받지 않는 불변(immutable) 시그니처
        workflow = chain(
            validate_inventory.si(order_id),
            process_payment.si(order_id),
            send_confirmation_email.si(order_id),
            update_analytics.si(order_id)
        )
        workflow.apply_async()
    except Exception as exc:
        self.retry(exc=exc, countdown=30)

@app.task
def validate_inventory(order_id):
    # 재고 확인
    order = Order.objects.get(id=order_id)
    for item in order.items.all():
        if item.product.stock < item.quantity:
            raise InsufficientStockError(item.product.name)
    return True

@app.task
def bulk_process_orders(order_ids):
    """병렬 배치 처리"""
    job = group(process_order.s(oid) for oid in order_ids)
    result = job.apply_async()
    return result
8.2 이벤트 드리븐 아키텍처
// Node.js - EventEmitter 기반 비동기 처리
const EventEmitter = require('events');
class OrderEventBus extends EventEmitter {
constructor() {
super();
this.setMaxListeners(20);
}
}
const orderBus = new OrderEventBus();
// 이벤트 핸들러 등록 (관심사 분리)
orderBus.on('order.created', async (order) => {
// 재고 업데이트
await inventoryService.decrementStock(order.items);
});
orderBus.on('order.created', async (order) => {
// 확인 이메일 발송
await emailService.sendOrderConfirmation(order);
});
orderBus.on('order.created', async (order) => {
// 분석 데이터 업데이트
await analyticsService.trackOrder(order);
});
orderBus.on('order.created', async (order) => {
// 추천 시스템 업데이트
await recommendationService.recordPurchase(order.userId, order.items);
});
// 주문 생성 시 이벤트 발행
class OrderService {
async createOrder(orderData) {
const order = await this.orderRepo.create(orderData);
// 동기적으로 필수 작업만 수행
// 나머지는 이벤트로 비동기 처리
orderBus.emit('order.created', order);
return order; // 빠르게 응답
}
}
9. 배치 최적화
9.1 벌크 인서트
# SQLAlchemy 벌크 인서트 비교
import time

# BAD: 하나씩 삽입 (N번의 INSERT)
def insert_one_by_one(session, records):
    start = time.time()
    for record in records:
        session.add(MyModel(**record))
    session.commit()
    print(f"One by one: {time.time() - start:.2f}s")

# GOOD: 벌크 삽입 (1번의 INSERT)
def bulk_insert(session, records):
    start = time.time()
    session.bulk_insert_mappings(MyModel, records)
    session.commit()
    print(f"Bulk insert: {time.time() - start:.2f}s")

# BETTER: execute_values (PostgreSQL, psycopg2)
def execute_values_insert(conn, records):
    start = time.time()
    from psycopg2.extras import execute_values
    cursor = conn.cursor()
    execute_values(
        cursor,
        "INSERT INTO my_table (col1, col2, col3) VALUES %s",
        [(r['col1'], r['col2'], r['col3']) for r in records],
        page_size=1000
    )
    conn.commit()
    print(f"execute_values: {time.time() - start:.2f}s")

# 성능 비교 (10,000건 기준)
# One by one: 12.5s
# Bulk insert: 0.8s
# execute_values: 0.3s
9.2 배치 API 호출
// 외부 API 배치 호출 최적화
class BatchAPIClient {
constructor(options = {}) {
this.batchSize = options.batchSize || 50;
this.concurrency = options.concurrency || 5;
this.retryAttempts = options.retryAttempts || 3;
this.delayBetweenBatches = options.delayMs || 100;
}
async processBatch(items, processFn) {
const results = [];
const errors = [];
// 아이템을 배치로 분할
const batches = [];
for (let i = 0; i < items.length; i += this.batchSize) {
batches.push(items.slice(i, i + this.batchSize));
}
// 동시성 제한하여 배치 처리
for (let i = 0; i < batches.length; i += this.concurrency) {
const concurrentBatches = batches.slice(i, i + this.concurrency);
const batchResults = await Promise.allSettled(
concurrentBatches.map(batch => this.processWithRetry(batch, processFn))
);
for (const result of batchResults) {
if (result.status === 'fulfilled') {
results.push(...result.value);
} else {
errors.push(result.reason);
}
}
// 배치 간 딜레이 (Rate limiting 방지)
if (i + this.concurrency < batches.length) {
await new Promise(r => setTimeout(r, this.delayBetweenBatches));
}
}
return { results, errors, total: items.length, processed: results.length };
}
async processWithRetry(batch, processFn, attempt = 1) {
try {
return await processFn(batch);
} catch (error) {
if (attempt < this.retryAttempts) {
const delay = Math.pow(2, attempt) * 1000; // 지수 백오프
await new Promise(r => setTimeout(r, delay));
return this.processWithRetry(batch, processFn, attempt + 1);
}
throw error;
}
}
}
// 사용 예시
const client = new BatchAPIClient({ batchSize: 100, concurrency: 3 });
const result = await client.processBatch(userIds, async (batch) => {
const response = await fetch('/api/users/batch', {
method: 'POST',
body: JSON.stringify({ ids: batch }),
headers: { 'Content-Type': 'application/json' }
});
return response.json();
});
10. HTTP 최적화
10.1 압축과 프로토콜 최적화
// Express.js 압축 설정
const compression = require('compression');
app.use(compression({
filter: (req, res) => {
if (req.headers['x-no-compression']) return false;
return compression.filter(req, res);
},
level: 6, // 압축 레벨 (1-9, 6이 균형점)
threshold: 1024, // 1KB 이상만 압축
memLevel: 8, // 메모리 사용량 (1-9)
}));
// HTTP/2 서버 설정
const http2 = require('http2');
const fs = require('fs');
const server = http2.createSecureServer({
key: fs.readFileSync('server.key'),
cert: fs.readFileSync('server.crt'),
allowHTTP1: true,
});
server.on('stream', (stream, headers) => {
const path = headers[':path'];
// Server Push
if (path === '/index.html') {
stream.pushStream({ ':path': '/styles.css' }, (err, pushStream) => {
if (!err) {
pushStream.respond({ ':status': 200, 'content-type': 'text/css' });
pushStream.end(fs.readFileSync('styles.css'));
}
});
}
stream.respond({
':status': 200,
'content-type': 'text/html',
});
stream.end(fs.readFileSync(`.${path}`));
});
10.2 Keep-Alive와 연결 재사용
# Python requests - 세션 재사용
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# BAD: 매번 새 연결
def fetch_bad(urls):
results = []
for url in urls:
response = requests.get(url) # 매번 TCP 핸드셰이크
results.append(response.json())
return results
# GOOD: 세션 재사용 (Keep-Alive)
def fetch_good(urls):
session = requests.Session()
# 재시도 설정
retry_strategy = Retry(
total=3,
backoff_factor=0.5,
status_forcelist=[500, 502, 503, 504]
)
adapter = HTTPAdapter(
max_retries=retry_strategy,
pool_connections=10,
pool_maxsize=20,
pool_block=False
)
session.mount("http://", adapter)
session.mount("https://", adapter)
results = []
for url in urls:
response = session.get(url) # 연결 재사용
results.append(response.json())
session.close()
return results
11. 애플리케이션 레벨 최적화
11.1 효율적인 직렬화
// JSON vs MessagePack vs Protobuf 비교
const msgpack = require('msgpack-lite');
// 테스트 데이터
const data = {
users: Array.from({ length: 1000 }, (_, i) => ({
id: i,
name: `User ${i}`,
email: `user${i}@example.com`,
age: 20 + (i % 50),
active: i % 3 !== 0,
tags: ['tag1', 'tag2', 'tag3'],
metadata: { loginCount: i * 10, lastLogin: new Date().toISOString() }
}))
};
// JSON
console.time('json-serialize');
const jsonStr = JSON.stringify(data);
console.timeEnd('json-serialize');
console.log(`JSON size: ${Buffer.byteLength(jsonStr)} bytes`);
// MessagePack
console.time('msgpack-serialize');
const msgpackBuf = msgpack.encode(data);
console.timeEnd('msgpack-serialize');
console.log(`MessagePack size: ${msgpackBuf.length} bytes`);
// 일반적인 결과:
// JSON size: ~120KB, serialize: ~3ms
// MessagePack size: ~85KB, serialize: ~2ms (약 30% 작음)
11.2 Object Pooling
// Apache Commons Pool2 기반 객체 풀
import org.apache.commons.pool2.BasePooledObjectFactory;
import org.apache.commons.pool2.PooledObject;
import org.apache.commons.pool2.impl.DefaultPooledObject;
import org.apache.commons.pool2.impl.GenericObjectPool;
import org.apache.commons.pool2.impl.GenericObjectPoolConfig;
public class ExpensiveObjectPool {
private final GenericObjectPool<ExpensiveObject> pool;
public ExpensiveObjectPool() {
GenericObjectPoolConfig<ExpensiveObject> config = new GenericObjectPoolConfig<>();
config.setMaxTotal(50);
config.setMaxIdle(20);
config.setMinIdle(5);
config.setTestOnBorrow(true);
config.setTestWhileIdle(true);
config.setTimeBetweenEvictionRunsMillis(30000);
pool = new GenericObjectPool<>(new ExpensiveObjectFactory(), config);
}
public ExpensiveObject borrow() throws Exception {
return pool.borrowObject();
}
public void returnObject(ExpensiveObject obj) {
pool.returnObject(obj);
}
static class ExpensiveObjectFactory extends BasePooledObjectFactory<ExpensiveObject> {
@Override
public ExpensiveObject create() {
return new ExpensiveObject(); // 비용이 큰 초기화
}
@Override
public PooledObject<ExpensiveObject> wrap(ExpensiveObject obj) {
return new DefaultPooledObject<>(obj);
}
@Override
public void passivateObject(PooledObject<ExpensiveObject> pooledObj) {
pooledObj.getObject().reset(); // 풀에 반환 시 상태 초기화
}
@Override
public boolean validateObject(PooledObject<ExpensiveObject> pooledObj) {
return pooledObj.getObject().isValid();
}
}
}
12. 프로덕션 모니터링
12.1 SLO 기반 알림 설정
# Prometheus 알림 규칙
groups:
- name: slo-alerts
rules:
# p99 지연시간 SLO 위반
- alert: HighP99Latency
expr: |
histogram_quantile(0.99,
rate(http_request_duration_seconds_bucket[5m])
) > 0.5
for: 5m
labels:
severity: warning
annotations:
summary: "p99 latency exceeds 500ms"
description: "p99 latency is at {{ $value }}s for 5 minutes"
# 에러율 SLO 위반
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m])) > 0.01
for: 3m
labels:
severity: critical
annotations:
summary: "Error rate exceeds 1%"
# 처리량 급감
- alert: ThroughputDrop
expr: |
sum(rate(http_requests_total[5m]))
< 0.5 * sum(rate(http_requests_total[5m] offset 1h))
for: 5m
labels:
severity: warning
annotations:
summary: "Throughput dropped over 50% compared to 1h ago"
# 커넥션 풀 고갈 임박
- alert: ConnectionPoolExhaustion
expr: |
hikaricp_connections_active
/ hikaricp_connections_max > 0.85
for: 2m
labels:
severity: warning
annotations:
summary: "Connection pool utilization above 85%"
# GC 일시정지 시간 증가
- alert: HighGCPauseTime
expr: |
rate(jvm_gc_pause_seconds_sum[5m])
/ rate(jvm_gc_pause_seconds_count[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "Average GC pause time exceeds 100ms"
12.2 대시보드 구성
# Grafana 대시보드 JSON 생성 (Python)
class PerformanceDashboard:
# 예시에서는 _latency_panel만 구현 (나머지 _*_panel 헬퍼도 동일한 구조)
def generate_panels(self):
return {
"dashboard": {
"title": "Backend Performance",
"panels": [
# RED 메트릭
self._throughput_panel(),
self._error_rate_panel(),
self._latency_panel(),
# 리소스 사용량
self._cpu_panel(),
self._memory_panel(),
self._gc_panel(),
# 데이터베이스
self._db_query_panel(),
self._connection_pool_panel(),
self._slow_queries_panel(),
# 캐시
self._cache_hit_rate_panel(),
self._cache_latency_panel(),
# 외부 서비스
self._external_api_panel(),
]
}
}
def _latency_panel(self):
return {
"title": "API Latency Percentiles",
"type": "timeseries",
"targets": [
{
"expr": 'histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))',
"legendFormat": "p50"
},
{
"expr": 'histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))',
"legendFormat": "p95"
},
{
"expr": 'histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))',
"legendFormat": "p99"
}
]
}
13. 실전 퀴즈
Q1. 암달의 법칙이 성능 최적화에 주는 시사점은 무엇이며, 이를 실무에 어떻게 적용할 수 있나요?
암달의 법칙은 시스템 전체 성능 향상이 개선 가능한 부분의 비율에 의해 제한된다는 것을 보여줍니다.
핵심 시사점:
- 전체 실행 시간의 작은 부분(예: 5%)을 아무리 빠르게 만들어도 전체 성능 향상은 미미합니다
- 전체 실행 시간의 큰 부분(예: 80%)을 2배만 빠르게 해도 상당한 성능 향상을 얻습니다
- 따라서 프로파일링으로 가장 큰 병목을 먼저 식별한 후, 그 부분을 집중적으로 개선해야 합니다
실무 적용:
- 프로파일링(Flame Graph)으로 실행 시간 분포 파악
- 가장 큰 비율을 차지하는 병목부터 순서대로 최적화
- 각 최적화 후 다시 측정하여 새로운 병목 확인
- 성능 예산(Performance Budget) 내에 들어오면 최적화 중단
Q2. N+1 쿼리 문제란 무엇이며, ORM에서 이를 해결하는 3가지 방법을 설명하세요.
N+1 문제: 부모 엔티티 N개를 조회한 후, 각 부모의 자식 엔티티를 개별 쿼리로 조회하여 총 N+1번의 쿼리가 실행되는 문제입니다. 100개의 주문을 조회하면 101번의 DB 쿼리가 발생합니다.
해결 방법:
- Eager Loading (select_related/include): JOIN을 사용하여 부모와 자식을 한 번의 쿼리로 조회. 1:1, N:1 관계에 효과적
- Prefetch (prefetch_related): 별도 쿼리로 자식 엔티티를 일괄 조회 후 메모리에서 매핑. 1:N, M:N 관계에 효과적. IN 절 사용
- DataLoader 패턴: 여러 개별 요청을 자동으로 배치하여 하나의 쿼리로 실행. GraphQL에서 특히 유용. Facebook이 개발한 패턴
Q3. Cache-Aside와 Write-Through 캐싱 패턴의 차이점과 각각의 적합한 사용 사례를 설명하세요.
Cache-Aside (Lazy Loading):
- 읽기 시 캐시 확인, 미스 시 DB 조회 후 캐시 저장
- 애플리케이션이 캐시를 직접 관리
- 첫 번째 요청은 항상 캐시 미스 (Cold Start)
- 적합: 읽기가 많고, 모든 데이터를 캐싱할 필요가 없는 경우
Write-Through:
- 쓰기 시 캐시와 DB를 동시에 업데이트
- 캐시가 항상 최신 상태
- 쓰기 지연시간 증가 (두 곳에 쓰기)
- 적합: 데이터 일관성이 중요하고, 읽기가 쓰기보다 훨씬 많은 경우
Write-Behind는 캐시에 먼저 쓰고 DB에는 비동기로 반영하여 쓰기 성능을 극대화하지만, 데이터 손실 리스크가 있습니다.
Q4. 부하 테스트의 6가지 유형(Smoke, Load, Stress, Spike, Soak, Breakpoint)을 각각 언제 사용하는지 설명하세요.
- Smoke Test: 최소 부하(1-5 VU)로 시스템 기본 동작을 확인. 배포 후 기본 검증
- Load Test: 예상 트래픽 수준에서 성능 확인. SLO 충족 여부 검증
- Stress Test: 예상 트래픽을 초과하여 시스템 한계점 탐색. 용량 계획에 활용
- Spike Test: 갑작스러운 트래픽 급증(예: 이벤트)에 대한 시스템 반응 확인. 오토스케일링 검증
- Soak Test: 장시간(수 시간~하루) 일정 부하를 유지하여 메모리 누수, 커넥션 고갈 등 점진적 문제 발견
- Breakpoint Test: 부하를 지속적으로 증가시켜 시스템이 완전히 실패하는 지점 탐색. 절대적 한계 파악
Q5. 커넥션 풀 튜닝에서 고려해야 할 핵심 파라미터와 적절한 풀 크기를 결정하는 방법을 설명하세요.
핵심 파라미터:
- maximum-pool-size: 최대 커넥션 수 (과하면 DB 부하, 부족하면 대기)
- minimum-idle: 유휴 커넥션 최소 수 (Cold Start 방지)
- connection-timeout: 커넥션 획득 대기 시간
- idle-timeout: 유휴 커넥션 반환 시간
- max-lifetime: 커넥션 최대 수명 (DB 방화벽 타임아웃보다 짧게)
적절한 풀 크기 결정:
- HikariCP 공식: connections = ((core_count * 2) + effective_spindle_count)
- SSD의 경우: connections = core_count * 2 + 1 정도
- 일반적으로 10-20이면 충분한 경우가 많음
- 너무 큰 풀은 오히려 DB의 컨텍스트 스위칭 비용을 증가시킴
- 모니터링 기반으로 조정: 사용률이 80%를 넘으면 증가 검토, 대기 스레드가 발생하면 즉시 증가
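위 공식은 간단한 계산기로 옮겨볼 수 있습니다. 예시용 스케치이며, `pool_size`라는 함수명과 `effective_spindle_count`(활성 디스크 수) 기본값은 가정입니다:

```python
def pool_size(core_count: int, effective_spindle_count: int = 1, ssd: bool = False) -> int:
    """HikariCP 권장 공식 기반의 커넥션 풀 시작값 (모니터링으로 조정 전제)"""
    if ssd:
        return core_count * 2 + 1
    return core_count * 2 + effective_spindle_count

print(pool_size(8, 2))          # 18 (8코어, 스핀들 2개)
print(pool_size(8, ssd=True))   # 17
```

계산값은 출발점일 뿐이며, 실측 사용률이 80%를 넘거나 대기 스레드가 생기면 늘리는 식으로 조정합니다.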
참고 자료
- Google SRE Book - Performance Engineering
- k6 Documentation
- Artillery Documentation
- Brendan Gregg - Systems Performance
- Flame Graphs
- HikariCP - About Pool Sizing
- Redis Best Practices
- PostgreSQL Performance Tips
- Node.js Diagnostics Guide
- Go pprof Documentation
- Python cProfile Documentation
- Prometheus Monitoring
- DataLoader Pattern
Backend Performance Engineering Complete Guide 2025: Profiling, Load Testing, Bottleneck Analysis, Optimization
Table of Contents
1. Performance Engineering Mindset
1.1 Measure First, Optimize Later
The golden rule of performance engineering is "Don't guess, measure." Optimization based on intuition usually wastes time on the wrong areas.
Three Stages of Performance Optimization:
- Measure: Quantitatively measure current performance
- Analyze: Precisely identify bottleneck points
- Optimize: Resolve the most impactful bottlenecks first
1.2 Amdahl's Law
Overall system performance improvement is limited by the proportion of the improvable portion.
Overall Speedup = 1 / ((1 - P) + P / S)
P = Fraction of the program that can be improved
S = Speedup factor of the improved part
Example: Making code that accounts for 20% of total 10x faster
= 1 / ((1 - 0.2) + 0.2 / 10)
= 1 / (0.8 + 0.02)
= 1.22x (22% improvement)
Meanwhile, making code that accounts for 80% of total 2x faster
= 1 / ((1 - 0.8) + 0.8 / 2)
= 1 / (0.2 + 0.4)
= 1.67x (67% improvement)
Key takeaway: Moderately improving a large portion is more effective than dramatically improving a small portion.
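The arithmetic above is easy to script; a small helper (illustrative, the function name is ours) makes the trade-off concrete:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is made s times faster."""
    return 1 / ((1 - p) + p / s)

# 20% of the work made 10x faster vs. 80% made 2x faster
print(round(amdahl_speedup(0.2, 10), 2))  # 1.22
print(round(amdahl_speedup(0.8, 2), 2))   # 1.67
```

Note the asymptote: even with s approaching infinity, the speedup is capped at 1 / (1 - P).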
1.3 Performance Budget
# Performance Budget Definition Example
performance_budget:
api_endpoints:
p50_latency_ms: 50
p95_latency_ms: 200
p99_latency_ms: 500
max_latency_ms: 2000
error_rate_percent: 0.1
throughput_rps: 1000
database:
query_p95_ms: 50
query_p99_ms: 200
connection_pool_utilization: 70
slow_query_threshold_ms: 100
external_services:
p95_latency_ms: 300
timeout_ms: 5000
retry_count: 3
circuit_breaker_threshold: 50
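A budget like this is only useful if something enforces it, e.g. as a CI gate. A minimal checker might compare measured numbers against the budget keys above (the `check_budget` helper is an assumption, not part of the guide):

```python
# Subset of the budget above, as a plain dict
BUDGET = {"p95_latency_ms": 200, "p99_latency_ms": 500, "error_rate_percent": 0.1}

def check_budget(measured: dict, budget: dict = BUDGET) -> list:
    """Return the list of budget keys the measurement violates."""
    return [k for k, limit in budget.items() if measured.get(k, 0) > limit]

violations = check_budget(
    {"p95_latency_ms": 250, "p99_latency_ms": 480, "error_rate_percent": 0.05}
)
print(violations)  # ['p95_latency_ms']
```

A non-empty violation list would then fail the build or block the deploy.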
2. Profiling
2.1 CPU Profiling and Flame Graphs
Flame Graphs are powerful tools that visually show where CPU time is being spent.
Node.js CPU Profiling:
// Node.js - Using built-in profiler
// Run: node --prof app.js
// Analyze: node --prof-process isolate-*.log > profile.txt
// Or using v8-profiler-next
const v8Profiler = require('v8-profiler-next');
function startProfiling(durationMs = 30000) {
const title = `cpu-profile-${Date.now()}`;
v8Profiler.startProfiling(title, true);
setTimeout(() => {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
}, durationMs);
}
// Middleware for profiling specific requests
function profilingMiddleware(req, res, next) {
if (req.headers['x-profile'] !== 'true') {
return next();
}
const title = `req-${req.method}-${req.path}-${Date.now()}`;
v8Profiler.startProfiling(title, true);
const originalEnd = res.end;
res.end = function (...args) {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
originalEnd.apply(res, args);
};
next();
}
Go CPU Profiling:
package main
import (
"net/http"
_ "net/http/pprof"
"runtime"
)
func main() {
// Enable pprof endpoints
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
// Collect CPU profile: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// Generate Flame Graph: go tool pprof -http=:8080 profile.pb.gz
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
// Application logic
startServer()
}
Python CPU Profiling:
import cProfile
import pstats
from pyinstrument import Profiler
# Using cProfile
def profile_with_cprofile(func):
def wrapper(*args, **kwargs):
profiler = cProfile.Profile()
profiler.enable()
result = func(*args, **kwargs)
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(20) # Top 20 functions
return result
return wrapper
# Using pyinstrument (more readable output)
def profile_with_pyinstrument(func):
def wrapper(*args, **kwargs):
profiler = Profiler()
profiler.start()
result = func(*args, **kwargs)
profiler.stop()
print(profiler.output_text(unicode=True))
return result
return wrapper
# Django middleware
class ProfilingMiddleware:
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
if request.META.get('HTTP_X_PROFILE') == 'true':
profiler = Profiler()
profiler.start()
response = self.get_response(request)
profiler.stop()
response['X-Profile-Duration'] = str(profiler.last_session.duration)
profiler.open_in_browser()
return response
return self.get_response(request)
2.2 Memory Profiling
// Node.js Heap Snapshot
const v8 = require('v8');
function takeHeapSnapshot() {
const snapshotStream = v8.writeHeapSnapshot();
console.log(`Heap snapshot written to: ${snapshotStream}`);
return snapshotStream;
}
// Memory usage monitoring
function monitorMemory(intervalMs = 5000) {
setInterval(() => {
const usage = process.memoryUsage();
console.log({
rss_mb: Math.round(usage.rss / 1024 / 1024),
heapTotal_mb: Math.round(usage.heapTotal / 1024 / 1024),
heapUsed_mb: Math.round(usage.heapUsed / 1024 / 1024),
external_mb: Math.round(usage.external / 1024 / 1024),
arrayBuffers_mb: Math.round(usage.arrayBuffers / 1024 / 1024)
});
}, intervalMs);
}
// Memory leak detection pattern
class MemoryLeakDetector {
constructor(options = {}) {
this.samples = [];
this.maxSamples = options.maxSamples || 60;
this.threshold = options.thresholdMB || 50;
}
sample() {
const usage = process.memoryUsage();
this.samples.push({
timestamp: Date.now(),
heapUsed: usage.heapUsed
});
if (this.samples.length > this.maxSamples) {
this.samples.shift();
}
return this.detectLeak();
}
detectLeak() {
if (this.samples.length < 10) return null;
const first = this.samples[0].heapUsed;
const last = this.samples[this.samples.length - 1].heapUsed;
const diffMB = (last - first) / 1024 / 1024;
let increasing = 0;
for (let i = 1; i < this.samples.length; i++) {
if (this.samples[i].heapUsed > this.samples[i - 1].heapUsed) {
increasing++;
}
}
const increaseRatio = increasing / (this.samples.length - 1);
if (diffMB > this.threshold && increaseRatio > 0.7) {
return {
suspected: true,
growthMB: diffMB.toFixed(2),
increaseRatio: increaseRatio.toFixed(2),
duration: this.samples[this.samples.length - 1].timestamp - this.samples[0].timestamp
};
}
return null;
}
}
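The same sliding-window heuristic ports directly to other runtimes; a Python sketch of the detection logic (thresholds mirror the defaults assumed above):

```python
def detect_leak(samples: list, threshold_mb: float = 50, ratio: float = 0.7):
    """samples: heap-used bytes over time. Flags sustained, mostly-monotonic growth."""
    if len(samples) < 10:
        return None
    growth_mb = (samples[-1] - samples[0]) / 1024 / 1024
    increasing = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    increase_ratio = increasing / (len(samples) - 1)
    if growth_mb > threshold_mb and increase_ratio > ratio:
        return {"growth_mb": round(growth_mb, 2), "increase_ratio": round(increase_ratio, 2)}
    return None

# A heap growing ~6 MB per sample for 20 samples trips the detector
leaking = [i * 6 * 1024 * 1024 for i in range(20)]
print(detect_leak(leaking))
```

A flat heap (`detect_leak([100] * 20)`) returns None; only combined size growth and a high fraction of rising samples raise suspicion, which filters out normal GC sawtooth patterns.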
2.3 I/O Profiling
# Python - I/O Profiling
import time
import functools
import logging
logger = logging.getLogger('io_profiler')
class IOProfiler:
"""I/O operation time measurement decorator"""
_stats = {}
@classmethod
def track(cls, operation_name):
def decorator(func):
@functools.wraps(func)
async def async_wrapper(*args, **kwargs):
start = time.perf_counter()
try:
result = await func(*args, **kwargs)
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=True)
return result
except Exception as e:
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=False)
raise
@functools.wraps(func)
def sync_wrapper(*args, **kwargs):
start = time.perf_counter()
try:
result = func(*args, **kwargs)
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=True)
return result
except Exception as e:
duration = time.perf_counter() - start
cls._record(operation_name, duration, success=False)
raise
import asyncio
if asyncio.iscoroutinefunction(func):
return async_wrapper
return sync_wrapper
return decorator
@classmethod
def _record(cls, name, duration, success):
if name not in cls._stats:
cls._stats[name] = {
'count': 0, 'total_time': 0,
'min_time': float('inf'), 'max_time': 0,
'errors': 0
}
stats = cls._stats[name]
stats['count'] += 1
stats['total_time'] += duration
stats['min_time'] = min(stats['min_time'], duration)
stats['max_time'] = max(stats['max_time'], duration)
if not success:
stats['errors'] += 1
@classmethod
def report(cls):
for name, stats in sorted(cls._stats.items()):
avg = stats['total_time'] / stats['count'] if stats['count'] else 0
logger.info(
f"{name}: count={stats['count']}, "
f"avg={avg*1000:.1f}ms, "
f"min={stats['min_time']*1000:.1f}ms, "
f"max={stats['max_time']*1000:.1f}ms, "
f"errors={stats['errors']}"
)
3. Load Testing
3.1 Load Testing Tools Comparison
| Tool | Language | Protocols | Strengths | Weaknesses |
|---|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration | Limited browser testing |
| Artillery | JavaScript | HTTP, WebSocket, Socket.io | Config-based, extensible | Complex scenarios difficult |
| Locust | Python | HTTP | Python scripting, distributed | Limited protocols |
| Gatling | Scala/Java | HTTP, WebSocket | Detailed reports, JVM performance | Learning curve |
| JMeter | Java | Various | GUI, various protocols | Resource-heavy, dated |
3.2 k6 Script Example
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
// Custom metrics
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration', true);
const requestCount = new Counter('requests');
// Test options
export const options = {
scenarios: {
// Scenario 1: Normal load test
normal_load: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 50 }, // Ramp up to 50 VUs
{ duration: '5m', target: 50 }, // Hold at 50 VUs
{ duration: '2m', target: 100 }, // Ramp up to 100 VUs
{ duration: '5m', target: 100 }, // Hold at 100 VUs
{ duration: '2m', target: 0 }, // Ramp down
],
},
// Scenario 2: Spike test
spike_test: {
executor: 'ramping-vus',
startVUs: 0,
startTime: '16m',
stages: [
{ duration: '10s', target: 500 }, // Sudden spike
{ duration: '1m', target: 500 }, // Hold
{ duration: '10s', target: 0 }, // Rapid decrease
],
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
errors: ['rate<0.01'],
http_req_failed: ['rate<0.01'],
},
};
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
const authToken = login();
group('API Operations', () => {
group('List Products', () => {
const res = http.get(`${BASE_URL}/api/products?page=1&limit=20`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'response time OK': (r) => r.timings.duration < 200,
'has products': (r) => JSON.parse(r.body).data.length > 0,
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
requestCount.add(1);
});
group('Create Order', () => {
const payload = JSON.stringify({
productId: Math.floor(Math.random() * 1000) + 1,
quantity: Math.floor(Math.random() * 5) + 1,
shippingAddress: '123 Test Street',
});
const res = http.post(`${BASE_URL}/api/orders`, payload, {
headers: {
Authorization: `Bearer ${authToken}`,
'Content-Type': 'application/json',
},
tags: { name: 'POST /api/orders' },
});
check(res, {
'order created': (r) => r.status === 201,
'has order id': (r) => JSON.parse(r.body).data.orderId,
});
errorRate.add(res.status !== 201);
apiDuration.add(res.timings.duration);
});
});
sleep(Math.random() * 3 + 1);
}
function login() {
const res = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
email: `user${__VU}@test.com`,
password: 'testpassword',
}), {
headers: { 'Content-Type': 'application/json' },
tags: { name: 'POST /api/auth/login' },
});
return res.status === 200 ? JSON.parse(res.body).token : '';
}
3.3 Load Test Types
| Type | Purpose | VU Pattern | Duration |
|---|---|---|---|
| Smoke | Verify basic operation | 1-5 | 1-5 min |
| Load | Verify expected traffic handling | Expected levels | 15-60 min |
| Stress | Find breaking points | Above expected | 30-60 min |
| Spike | Test sudden traffic surges | Sudden spike | 5-10 min |
| Soak | Verify long-term stability | Steady level | 2-24 hours |
| Breakpoint | Find system failure point | Continuous increase | Variable |
4. Key Performance Metrics
4.1 RED Method
# RED Method Monitoring Implementation
from prometheus_client import Counter, Histogram
# Rate: Requests per second
request_count = Counter(
'http_requests_total',
'Total HTTP requests',
['method', 'endpoint', 'status']
)
# Errors: Error ratio
error_count = Counter(
'http_errors_total',
'Total HTTP errors',
['method', 'endpoint', 'error_type']
)
# Duration: Response time distribution
request_duration = Histogram(
'http_request_duration_seconds',
'HTTP request duration',
['method', 'endpoint'],
buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)
4.2 Latency Percentiles
| | Mean | p50 | p95 | p99 | p99.9 | Max |
|---|---|---|---|---|---|---|
| User Impact | Low | Med | High | High | V.High | Extreme |
p50 (median): 50% of requests complete within this time
p95: 95% of requests complete within this time (1 in 20 is slower)
p99: 99% of requests complete within this time (1 in 100 is slower)
Why averages are dangerous:
- Average can be 50ms while p99 is 5000ms
- Every 100th request forces 5-second wait
- Heavy users are more likely to hit high percentiles
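The mean-vs-tail gap is easy to reproduce. With nearest-rank percentiles, two pathological requests out of 100 barely move the mean but completely dominate p99:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

latencies_ms = [50] * 98 + [5000] * 2   # 2% of requests are pathological
mean = sum(latencies_ms) / len(latencies_ms)
print(mean)                          # 149.0
print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 99))  # 5000
```

An SLO stated as "mean under 200ms" would pass here while 2% of users wait 5 seconds, which is why budgets are written against p95/p99.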
5. Common Bottlenecks
5.1 Database Bottlenecks
N+1 Query Problem:
# BAD: N+1 query - 101 queries for 100 orders
orders = Order.objects.all()[:100]
for order in orders:
print(f"Order {order.id} by {order.user.name}")
# GOOD: Eager loading - 2 queries total
orders = Order.objects.select_related('user').all()[:100]
for order in orders:
print(f"Order {order.id} by {order.user.name}")
# GOOD: Prefetch (M:N relationships)
orders = Order.objects.prefetch_related('items__product').all()[:100]
for order in orders:
for item in order.items.all():
print(f" - {item.product.name}")
// Node.js + Prisma - N+1 Resolution
// BAD: N+1
const orders = await prisma.order.findMany({ take: 100 });
for (const order of orders) {
const user = await prisma.user.findUnique({
where: { id: order.userId }
});
}
// GOOD: Include (Join)
const orders = await prisma.order.findMany({
take: 100,
include: {
user: true,
items: { include: { product: true } }
}
});
// GOOD: DataLoader Pattern
const DataLoader = require('dataloader');
const userLoader = new DataLoader(async (userIds) => {
const users = await prisma.user.findMany({
where: { id: { in: [...userIds] } }
});
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
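The core of DataLoader is only a few lines: queue keys during one event-loop tick, then issue a single batch. A toy asyncio port (our own sketch, unlike the real `dataloader` npm package above) shows the mechanism:

```python
import asyncio

class TinyDataLoader:
    """Coalesces load(key) calls made in the same tick into one batch_fn call."""
    def __init__(self, batch_fn):
        self.batch_fn = batch_fn
        self._pending = {}       # key -> Future (also deduplicates keys)
        self._scheduled = False

    def load(self, key):
        fut = self._pending.get(key)
        if fut is None:
            fut = asyncio.get_running_loop().create_future()
            self._pending[key] = fut
            if not self._scheduled:
                self._scheduled = True
                # Dispatch once the current tick's load() calls have all queued
                asyncio.get_running_loop().call_soon(
                    lambda: asyncio.ensure_future(self._dispatch()))
        return fut

    async def _dispatch(self):
        self._scheduled = False
        pending, self._pending = self._pending, {}
        keys = list(pending)
        for key, value in zip(keys, await self.batch_fn(keys)):
            pending[key].set_result(value)

async def demo():
    batches = []
    async def load_users(ids):
        batches.append(ids)      # one SELECT ... WHERE id IN (...) would go here
        return [f"user-{i}" for i in ids]
    loader = TinyDataLoader(load_users)
    users = await asyncio.gather(loader.load(1), loader.load(2), loader.load(3))
    return users, batches

users, batches = asyncio.run(demo())
print(users)    # ['user-1', 'user-2', 'user-3']
print(batches)  # [[1, 2, 3]]
```

Three separate `load()` calls produce a single batch call, which is exactly how a GraphQL resolver avoids N+1 without restructuring its per-field code.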
5.2 Missing Indexes and Full Table Scans
-- Detect slow queries (PostgreSQL)
SELECT
query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Execution plan analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;
-- Create composite index (matching query patterns)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- Check index utilization
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
5.3 Connection Pool Exhaustion
# PgBouncer Configuration (PostgreSQL connection pooler)
# pgbouncer.ini
"""
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
server_idle_timeout = 600
server_lifetime = 3600
stats_period = 60
"""
5.4 Lock Contention
// Go - Lock contention resolution with sharding
package main
import "sync"
// BAD: Global mutex locking entire map
type BadCache struct {
mu sync.Mutex
items map[string]interface{}
}
// GOOD: Sharding to distribute lock contention
type ShardedCache struct {
shards [256]shard
shardMask uint8
}
type shard struct {
mu sync.RWMutex
items map[string]interface{}
}
func NewShardedCache() *ShardedCache {
c := &ShardedCache{shardMask: 255}
for i := range c.shards {
c.shards[i].items = make(map[string]interface{})
}
return c
}
func (c *ShardedCache) getShard(key string) *shard {
hash := fnv32(key)
return &c.shards[hash&uint32(c.shardMask)]
}
func (c *ShardedCache) Get(key string) (interface{}, bool) {
s := c.getShard(key)
s.mu.RLock()
defer s.mu.RUnlock()
val, ok := s.items[key]
return val, ok
}
func (c *ShardedCache) Set(key string, value interface{}) {
s := c.getShard(key)
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
func fnv32(key string) uint32 {
hash := uint32(2166136261)
for i := 0; i < len(key); i++ {
hash *= 16777619
hash ^= uint32(key[i])
}
return hash
}
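The same lock-striping idea carries over to any runtime; a Python version for illustration (16 shards and the builtin `hash` are arbitrary choices here):

```python
import threading

class ShardedCache:
    """Stripes one lock per shard so writers to different keys rarely contend."""
    def __init__(self, shards: int = 16):
        self._shards = [({}, threading.Lock()) for _ in range(shards)]

    def _shard(self, key: str):
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key: str):
        items, lock = self._shard(key)
        with lock:
            return items.get(key)

    def set(self, key: str, value):
        items, lock = self._shard(key)
        with lock:
            items[key] = value

cache = ShardedCache()
cache.set("user:1", {"name": "kim"})
print(cache.get("user:1"))  # {'name': 'kim'}
```

Contention drops roughly in proportion to the shard count, since two threads only block each other when their keys hash to the same shard.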
6. Database Optimization
6.1 Query Optimization Strategies
-- 1. Convert subqueries to JOINs
-- BAD
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE status = 'active');
-- GOOD
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 'active';
-- 2. Cursor-based pagination (consistent performance)
-- BAD: OFFSET-based (slow on deep pages)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- GOOD: Cursor-based
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;
-- 3. Partitioning
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025_q1 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
6.2 Read Replicas
# SQLAlchemy - Read/Write Separation
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
class DatabaseRouter:
def __init__(self):
self.writer = create_engine(
'postgresql://writer:pass@primary:5432/mydb',
pool_size=10, max_overflow=20
)
self.readers = [
create_engine(
f'postgresql://reader:pass@replica{i}:5432/mydb',
pool_size=10, max_overflow=20
)
for i in range(1, 4) # 3 read replicas
]
self._reader_index = 0
def get_writer_session(self):
Session = sessionmaker(bind=self.writer)
return Session()
def get_reader_session(self):
reader = self.readers[self._reader_index % len(self.readers)]
self._reader_index += 1
Session = sessionmaker(bind=reader)
return Session()
7. Caching Strategies
7.1 Cache-Aside Pattern
import redis
import json
from functools import wraps
redis_client = redis.Redis(host='localhost', port=6379, db=0)
class CacheAside:
"""Cache-Aside (Lazy Loading) pattern implementation"""
@staticmethod
def cached(key_prefix, ttl_seconds=300):
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
cache_key = f"{key_prefix}:{':'.join(str(a) for a in args)}"
# 1. Check cache
cached = redis_client.get(cache_key)
if cached:
return json.loads(cached)
# 2. Cache miss - query DB
result = await func(*args, **kwargs)
# 3. Store result in cache
if result is not None:
redis_client.setex(
cache_key, ttl_seconds,
json.dumps(result, default=str)
)
return result
return wrapper
return decorator
@staticmethod
def invalidate(key_pattern):
"""Pattern-based cache invalidation (SCAN instead of KEYS, which blocks Redis)"""
keys = list(redis_client.scan_iter(match=key_pattern, count=500))
if keys:
redis_client.delete(*keys)
7.2 Write-Through and Write-Behind
import asyncio  # WriteBehind below uses asyncio.Queue

class WriteThrough:
"""Write-Through: Update cache and DB simultaneously"""
async def update(self, key, value, ttl=300):
await self.db.update(key, value)
redis_client.setex(f"wt:{key}", ttl, json.dumps(value, default=str))
async def get(self, key):
cached = redis_client.get(f"wt:{key}")
if cached:
return json.loads(cached)
value = await self.db.get(key)
if value:
redis_client.setex(f"wt:{key}", 300, json.dumps(value, default=str))
return value
class WriteBehind:
"""Write-Behind: Write to cache first, async DB update"""
def __init__(self):
self.write_queue = asyncio.Queue()
self.batch_size = 100
self.flush_interval = 5 # seconds
async def update(self, key, value, ttl=300):
# 1. Write to cache immediately (fast response)
redis_client.setex(f"wb:{key}", ttl, json.dumps(value, default=str))
# 2. Add to queue (async DB write)
await self.write_queue.put((key, value))
async def flush_worker(self):
"""Background worker: batch write from queue to DB"""
while True:
batch = []
try:
while len(batch) < self.batch_size:
item = await asyncio.wait_for(
self.write_queue.get(),
timeout=self.flush_interval
)
batch.append(item)
except asyncio.TimeoutError:
pass
if batch:
try:
await self.db.bulk_update(batch)
except Exception:
for item in batch:
await self.write_queue.put(item)
await asyncio.sleep(1)
7.3 TTL Strategies and Cache Invalidation
# Tiered TTL Strategy
class TieredTTLCache:
TTL_CONFIG = {
# Frequently changing data
'user:session': 1800, # 30 min
'cart:items': 900, # 15 min
# Periodically changing data
'product:detail': 3600, # 1 hour
'product:list': 600, # 10 min
'search:results': 300, # 5 min
# Rarely changing data
'category:list': 86400, # 24 hours
'config:settings': 86400, # 24 hours
'static:content': 604800, # 7 days
}
@classmethod
def get_ttl(cls, key_type):
return cls.TTL_CONFIG.get(key_type, 300) # Default 5 min
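One refinement worth adding to a tiered scheme like this: jitter the TTL so keys written together don't all expire together and stampede the database (the ±10% spread is an assumption, as is the helper name):

```python
import random

def jittered_ttl(base_ttl: int, spread: float = 0.1) -> int:
    """Randomize TTL by +/-spread so co-created keys expire at different times."""
    low = int(base_ttl * (1 - spread))
    high = int(base_ttl * (1 + spread))
    return random.randint(low, high)

# product:detail (3600s) lands somewhere in [3240, 3960]
print(jittered_ttl(3600))
```

Combined with the tiered table above, this would wrap the `get_ttl` lookup rather than replace it.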
8. Async Processing
8.1 Message Queue-Based Async Processing
# Celery async task processing
from celery import Celery, chain, group
app = Celery('tasks', broker='redis://localhost:6379/0')
app.conf.update(
task_serializer='json',
accept_content=['json'],
task_acks_late=True,
worker_prefetch_multiplier=1,
task_routes={
'tasks.send_email': {'queue': 'email'},
'tasks.process_image': {'queue': 'image'},
'tasks.generate_report': {'queue': 'report'},
}
)
@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_email(self, to, subject, body):
try:
email_service.send(to=to, subject=subject, body=body)
except Exception as exc:
self.retry(exc=exc)
@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
"""Order processing pipeline"""
try:
workflow = chain(
validate_inventory.s(order_id),
process_payment.s(order_id),
send_confirmation_email.s(order_id),
update_analytics.s(order_id)
)
workflow.apply_async()
except Exception as exc:
self.retry(exc=exc, countdown=30)
@app.task
def bulk_process_orders(order_ids):
"""Parallel batch processing"""
job = group(process_order.s(oid) for oid in order_ids)
return job.apply_async()
8.2 Event-Driven Architecture
// Node.js - EventEmitter-based async processing
const EventEmitter = require('events');
class OrderEventBus extends EventEmitter {
constructor() {
super();
this.setMaxListeners(20);
}
}
const orderBus = new OrderEventBus();
// Register event handlers (separation of concerns)
orderBus.on('order.created', async (order) => {
await inventoryService.decrementStock(order.items);
});
orderBus.on('order.created', async (order) => {
await emailService.sendOrderConfirmation(order);
});
orderBus.on('order.created', async (order) => {
await analyticsService.trackOrder(order);
});
class OrderService {
async createOrder(orderData) {
const order = await this.orderRepo.create(orderData);
orderBus.emit('order.created', order);
return order; // Fast response
}
}
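A note on this pattern: `emit` does not await the async handlers, and that is precisely what keeps `createOrder` fast. An asyncio analogue (our own sketch, not part of the guide) makes this explicit by scheduling handlers as background tasks:

```python
import asyncio
from collections import defaultdict

class AsyncEventBus:
    """Handlers run as background tasks; emit() returns without awaiting them."""
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event: str, handler):
        self._handlers[event].append(handler)

    def emit(self, event: str, payload):
        # create_task schedules each handler; the caller is not blocked
        return [asyncio.create_task(h(payload)) for h in self._handlers[event]]

async def demo():
    bus, seen = AsyncEventBus(), []
    async def track(order): seen.append(("analytics", order["id"]))
    async def email(order): seen.append(("email", order["id"]))
    bus.on("order.created", track)
    bus.on("order.created", email)
    tasks = bus.emit("order.created", {"id": 7})  # returns immediately
    await asyncio.gather(*tasks)                  # only so the demo can observe results
    return seen

print(asyncio.run(demo()))  # [('analytics', 7), ('email', 7)]
```

The trade-off is the same as in the Node version: a handler failure no longer fails the request, so handlers need their own error reporting (or a durable queue, as in section 8.1).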
9. Batch Optimization
9.1 Bulk Inserts
# SQLAlchemy Bulk Insert Comparison
import time
# BAD: One by one (N INSERT statements)
def insert_one_by_one(session, records):
start = time.time()
for record in records:
session.add(MyModel(**record))
session.commit()
print(f"One by one: {time.time() - start:.2f}s")
# GOOD: Bulk insert (1 INSERT statement)
def bulk_insert(session, records):
start = time.time()
session.bulk_insert_mappings(MyModel, records)
session.commit()
print(f"Bulk insert: {time.time() - start:.2f}s")
# BETTER: execute_values (PostgreSQL, psycopg2)
def execute_values_insert(conn, records):
start = time.time()
from psycopg2.extras import execute_values
cursor = conn.cursor()
execute_values(
cursor,
"INSERT INTO my_table (col1, col2, col3) VALUES %s",
[(r['col1'], r['col2'], r['col3']) for r in records],
page_size=1000
)
conn.commit()
print(f"execute_values: {time.time() - start:.2f}s")
# Performance comparison (10,000 records)
# One by one: 12.5s
# Bulk insert: 0.8s
# execute_values: 0.3s
9.2 Batch API Calls
// Optimized batch external API calls
class BatchAPIClient {
  constructor(options = {}) {
    this.batchSize = options.batchSize || 50;
    this.concurrency = options.concurrency || 5;
    this.retryAttempts = options.retryAttempts || 3;
    this.delayBetweenBatches = options.delayMs || 100;
  }

  async processBatch(items, processFn) {
    const results = [];
    const errors = [];
    const batches = [];
    for (let i = 0; i < items.length; i += this.batchSize) {
      batches.push(items.slice(i, i + this.batchSize));
    }
    for (let i = 0; i < batches.length; i += this.concurrency) {
      const concurrentBatches = batches.slice(i, i + this.concurrency);
      const batchResults = await Promise.allSettled(
        concurrentBatches.map(batch => this.processWithRetry(batch, processFn))
      );
      for (const result of batchResults) {
        if (result.status === 'fulfilled') {
          results.push(...result.value);
        } else {
          errors.push(result.reason);
        }
      }
      if (i + this.concurrency < batches.length) {
        await new Promise(r => setTimeout(r, this.delayBetweenBatches));
      }
    }
    return { results, errors, total: items.length, processed: results.length };
  }

  async processWithRetry(batch, processFn, attempt = 1) {
    try {
      return await processFn(batch);
    } catch (error) {
      if (attempt < this.retryAttempts) {
        const delay = Math.pow(2, attempt) * 1000; // exponential backoff
        await new Promise(r => setTimeout(r, delay));
        return this.processWithRetry(batch, processFn, attempt + 1);
      }
      throw error;
    }
  }
}
10. HTTP Optimization
10.1 Compression and Protocol Optimization
// Express.js compression configuration
const compression = require('compression');
app.use(compression({
  filter: (req, res) => {
    if (req.headers['x-no-compression']) return false;
    return compression.filter(req, res);
  },
  level: 6,        // Compression level (1-9, 6 is balanced)
  threshold: 1024, // Compress only above 1KB
  memLevel: 8,
}));

// HTTP/2 server setup
const http2 = require('http2');
const fs = require('fs');
const server = http2.createSecureServer({
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
  allowHTTP1: true, // Fall back to HTTP/1.1 for older clients
});
10.2 Keep-Alive and Connection Reuse
# Python requests - Session reuse
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# BAD: New connection every time
def fetch_bad(urls):
    results = []
    for url in urls:
        response = requests.get(url)  # TCP handshake each time
        results.append(response.json())
    return results

# GOOD: Session reuse (Keep-Alive)
def fetch_good(urls):
    session = requests.Session()
    retry_strategy = Retry(
        total=3, backoff_factor=0.5,
        status_forcelist=[500, 502, 503, 504]
    )
    adapter = HTTPAdapter(
        max_retries=retry_strategy,
        pool_connections=10,
        pool_maxsize=20,
    )
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    results = []
    for url in urls:
        response = session.get(url)  # Connection reuse
        results.append(response.json())
    session.close()
    return results
11. Application-Level Optimization
11.1 Efficient Serialization
// JSON vs MessagePack comparison
const msgpack = require('msgpack-lite');
const data = {
  users: Array.from({ length: 1000 }, (_, i) => ({
    id: i,
    name: `User ${i}`,
    email: `user${i}@example.com`,
    age: 20 + (i % 50),
    active: i % 3 !== 0,
    tags: ['tag1', 'tag2', 'tag3'],
  }))
};

// JSON
const jsonStr = JSON.stringify(data);
console.log(`JSON size: ${Buffer.byteLength(jsonStr)} bytes`);

// MessagePack
const msgpackBuf = msgpack.encode(data);
console.log(`MessagePack size: ${msgpackBuf.length} bytes`);

// Typical results:
// JSON size: ~120KB, serialize: ~3ms
// MessagePack size: ~85KB, serialize: ~2ms (about 30% smaller)
12. Production Monitoring
12.1 SLO-Based Alerting
# Prometheus alerting rules
groups:
  - name: slo-alerts
    rules:
      # p99 latency SLO violation
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency exceeds 500ms"
      # Error rate SLO violation
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1%"
      # Throughput drop vs. the same window 1h ago
      - alert: ThroughputDrop
        expr: |
          sum(rate(http_requests_total[5m]))
          < 0.5 * sum(rate(http_requests_total[5m] offset 1h))
        for: 5m
        labels:
          severity: warning
      # Connection pool near exhaustion
      - alert: ConnectionPoolExhaustion
        expr: |
          hikaricp_connections_active
          / hikaricp_connections_max > 0.85
        for: 2m
        labels:
          severity: warning
13. Practice Quiz
Q1. What are the implications of Amdahl's Law for performance optimization, and how can it be applied in practice?
Amdahl's Law shows that overall system performance improvement is limited by the proportion of the improvable portion.
Key implications:
- No matter how much you speed up a small portion (e.g., 5%) of total execution time, overall improvement is minimal
- Making a large portion (e.g., 80%) just 2x faster yields significant overall improvement
- Therefore, identify the biggest bottlenecks first through profiling, then focus improvements there
Practical application:
- Use profiling (Flame Graphs) to understand execution time distribution
- Optimize bottlenecks in order of their proportion
- Re-measure after each optimization to identify new bottlenecks
- Stop optimizing when within Performance Budget
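The two worked examples from section 1.2 can be checked with a tiny helper (an illustrative sketch; the `speedup` function name is mine, not from the original):

```python
def speedup(p: float, s: float) -> float:
    """Amdahl's Law: overall speedup when a fraction p of the
    work is accelerated by a factor of s."""
    return 1 / ((1 - p) + p / s)

# 20% of the code made 10x faster -> ~1.22x overall
print(round(speedup(0.2, 10), 2))  # 1.22
# 80% of the code made 2x faster -> ~1.67x overall
print(round(speedup(0.8, 2), 2))   # 1.67
```

Note that even `speedup(0.2, float("inf"))` caps out at 1.25x: the 80% you did not touch dominates.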
Q2. What is the N+1 query problem, and describe 3 ways to solve it in ORMs.
N+1 Problem: after fetching N parent entities in one query, each parent's child entities are fetched with a separate query, for N+1 queries in total. For example, loading 100 orders and then each order's items triggers 101 DB queries (1 for the orders + 100 for the items).
Solutions:
- Eager Loading (select_related/include): Use JOINs to fetch parent and children in one query. Effective for 1:1, N:1 relationships
- Prefetch (prefetch_related): Batch-fetch child entities in a separate query, then map in memory. Effective for 1:N, M:N relationships using IN clause
- DataLoader Pattern: Automatically batch individual requests into a single query. Especially useful in GraphQL. Pattern developed by Facebook
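The difference between the N+1 pattern and a batched prefetch can be shown with a self-contained sketch (an in-memory "database" and query counter; all names are hypothetical):

```python
ORDERS = [{"id": i} for i in range(1, 4)]
ITEMS = {1: ["a"], 2: ["b", "c"], 3: ["d"]}
query_count = 0

def fetch_items_for(order_id):
    """One query per parent (the N+1 pattern)."""
    global query_count
    query_count += 1
    return ITEMS[order_id]

def fetch_items_bulk(order_ids):
    """One IN-clause style query covering all parents (prefetch)."""
    global query_count
    query_count += 1
    return {oid: ITEMS[oid] for oid in order_ids}

# N+1: 1 query for orders + N queries for items
query_count = 1
for order in ORDERS:
    order["items"] = fetch_items_for(order["id"])
print(query_count)  # 4 queries for 3 orders

# Prefetch: 1 query for orders + 1 batched query for all items
query_count = 1
items_by_order = fetch_items_bulk([o["id"] for o in ORDERS])
for order in ORDERS:
    order["items"] = items_by_order[order["id"]]
print(query_count)  # 2 queries total
```

A DataLoader does the second version automatically by collecting individual `load(id)` calls within a tick and issuing one batched fetch.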
Q3. Explain the differences between Cache-Aside and Write-Through caching patterns and their suitable use cases.
Cache-Aside (Lazy Loading):
- Check cache on read, query DB on miss, then store in cache
- Application manages cache directly
- First request always has cache miss (Cold Start)
- Suitable for: Read-heavy scenarios where not all data needs caching
Write-Through:
- Update cache and DB simultaneously on writes
- Cache is always up-to-date
- Write latency increases (writing to two locations)
- Suitable for: When data consistency matters and reads far exceed writes
Write-Behind writes to cache first and updates DB asynchronously, maximizing write performance but risking data loss.
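The two patterns' read/write paths can be sketched with a dict standing in for Redis and another for the DB (all names are hypothetical):

```python
cache = {}
db = {"user:1": {"name": "Kim"}}
db_reads = 0

def get_cache_aside(key):
    """Cache-Aside read path: cache first, fall back to DB, populate cache."""
    global db_reads
    if key in cache:
        return cache[key]          # hit: DB untouched
    db_reads += 1
    value = db.get(key)            # miss: read from DB
    if value is not None:
        cache[key] = value         # lazily populate the cache
    return value

def put_write_through(key, value):
    """Write-Through write path: update DB and cache together."""
    db[key] = value
    cache[key] = value

get_cache_aside("user:1")  # cold start: miss -> 1 DB read
get_cache_aside("user:1")  # hit -> no additional DB read
print(db_reads)            # 1
```

The cold-start miss on the first read is exactly the Cache-Aside drawback described above; Write-Through trades extra write latency for never serving a stale or missing entry.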
Q4. Explain when to use each of the 6 load test types (Smoke, Load, Stress, Spike, Soak, Breakpoint).
- Smoke Test: Minimal load (1-5 VUs) to verify basic system operation. Post-deployment verification
- Load Test: Verify performance at expected traffic levels. Check SLO compliance
- Stress Test: Exceed expected traffic to find system limits. Used for capacity planning
- Spike Test: Test system reaction to sudden traffic surges (e.g., events). Verify auto-scaling
- Soak Test: Maintain steady load for hours to find gradual issues like memory leaks or connection exhaustion
- Breakpoint Test: Continuously increase load to find absolute system failure point
Q5. What key parameters should be considered for connection pool tuning, and how do you determine the appropriate pool size?
Key parameters:
- maximum-pool-size: Maximum connections (too many overloads DB, too few causes waits)
- minimum-idle: Minimum idle connections (prevents cold start)
- connection-timeout: Wait time for connection acquisition
- idle-timeout: Time before idle connection is returned
- max-lifetime: Maximum connection lifetime (should be a few seconds shorter than any DB- or firewall-imposed connection time limit)
Determining appropriate pool size:
- HikariCP formula: connections = (core_count * 2) + effective_spindle_count
- For SSDs: connections = core_count * 2 + 1
- Generally 10-20 is sufficient for many cases
- Too-large pools actually increase DB context switching costs
- Adjust based on monitoring: increase when utilization exceeds 80%, immediately increase when waiting threads appear
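The HikariCP rule of thumb above can be expressed directly (a sketch; the function and parameter names are mine):

```python
def pool_size(core_count: int, effective_spindle_count: int = 1) -> int:
    """HikariCP sizing rule of thumb:
    connections = (core_count * 2) + effective_spindle_count.
    For SSDs the spindle count is effectively 1."""
    return core_count * 2 + effective_spindle_count

print(pool_size(4))     # 9 connections for a 4-core host with an SSD
print(pool_size(8, 2))  # 18 for 8 cores and 2 spinning disks
```

Treat the result as a starting point, not a target: the monitoring-driven adjustments above (utilization, waiting threads) should override the formula in production.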
References
- Google SRE Book - Performance Engineering
- k6 Documentation
- Artillery Documentation
- Brendan Gregg - Systems Performance
- Flame Graphs
- HikariCP - About Pool Sizing
- Redis Best Practices
- PostgreSQL Performance Tips
- Node.js Diagnostics Guide
- Go pprof Documentation
- Python cProfile Documentation
- Prometheus Monitoring
- DataLoader Pattern