Backend Performance Engineering Complete Guide 2025: Profiling, Load Testing, Bottleneck Analysis, Optimization

1. Performance Engineering Mindset

1.1 Measure First, Optimize Later

The golden rule of performance engineering is "Don't guess, measure." Optimization based on intuition usually wastes time on the wrong areas.

Three Stages of Performance Optimization:

  1. Measure: Quantitatively measure current performance
  2. Analyze: Precisely identify bottleneck points
  3. Optimize: Resolve the most impactful bottlenecks first
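The measure step does not need heavy tooling to start; a minimal, repeatable timing harness (a stdlib-only Python sketch; the workload lambda is just a placeholder) is enough to establish a baseline:

```python
import time

def measure(func, runs=5):
    """Run func several times and report min/avg wall-clock time in ms."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append((time.perf_counter() - start) * 1000)
    return {"min_ms": min(timings), "avg_ms": sum(timings) / len(timings)}

# Establish a baseline before changing anything; re-run after each change
baseline = measure(lambda: sum(i * i for i in range(100_000)))
print(baseline)
```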

1.2 Amdahl's Law

Overall system performance improvement is limited by the fraction of total execution time that the optimization actually touches.

Overall Speedup = 1 / ((1 - P) + P / S)

P = Fraction of the program that can be improved
S = Speedup factor of the improved part

Example: Making code that accounts for 20% of total 10x faster
= 1 / ((1 - 0.2) + 0.2 / 10)
= 1 / (0.8 + 0.02)
= 1.22x (22% improvement)

Meanwhile, making code that accounts for 80% of total 2x faster
= 1 / ((1 - 0.8) + 0.8 / 2)
= 1 / (0.2 + 0.4)
= 1.67x (67% improvement)

Key takeaway: Moderately improving a large portion is more effective than dramatically improving a small portion.
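The arithmetic above can be reproduced with a one-line helper (a sketch, not part of the original guide):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when fraction p of the work is made s times faster."""
    return 1 / ((1 - p) + p / s)

# 20% of the work made 10x faster -> ~1.22x overall
print(round(amdahl_speedup(0.2, 10), 2))
# 80% of the work made 2x faster -> ~1.67x overall
print(round(amdahl_speedup(0.8, 2), 2))
```

Note the limit: even as s approaches infinity, a 20% fraction can never yield more than a 1.25x overall speedup.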

1.3 Performance Budget

# Performance Budget Definition Example
performance_budget:
  api_endpoints:
    p50_latency_ms: 50
    p95_latency_ms: 200
    p99_latency_ms: 500
    max_latency_ms: 2000
    error_rate_percent: 0.1
    throughput_rps: 1000

  database:
    query_p95_ms: 50
    query_p99_ms: 200
    connection_pool_utilization: 70
    slow_query_threshold_ms: 100

  external_services:
    p95_latency_ms: 300
    timeout_ms: 5000
    retry_count: 3
    circuit_breaker_threshold: 50
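A budget only matters if it is enforced; a hedged sketch of a check that could run in CI (the metric names mirror the YAML above, while `check_budget` and the `measured` dict are hypothetical):

```python
def check_budget(measured: dict, budget: dict) -> list:
    """Return violations where a measured metric exceeds its budgeted limit."""
    violations = []
    for metric, limit in budget.items():
        value = measured.get(metric)
        if value is not None and value > limit:
            violations.append(f"{metric}: {value} > budget {limit}")
    return violations

api_budget = {"p95_latency_ms": 200, "p99_latency_ms": 500, "error_rate_percent": 0.1}
measured = {"p95_latency_ms": 240, "p99_latency_ms": 480, "error_rate_percent": 0.05}

for violation in check_budget(measured, api_budget):
    print("BUDGET VIOLATION:", violation)
```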

2. Profiling

2.1 CPU Profiling and Flame Graphs

Flame Graphs are powerful tools that visually show where CPU time is being spent.

Node.js CPU Profiling:

// Node.js - Using built-in profiler
// Run: node --prof app.js
// Analyze: node --prof-process isolate-*.log > profile.txt

// Or using v8-profiler-next
const v8Profiler = require('v8-profiler-next');

function startProfiling(durationMs = 30000) {
  const title = `cpu-profile-${Date.now()}`;
  v8Profiler.startProfiling(title, true);

  setTimeout(() => {
    const profile = v8Profiler.stopProfiling(title);
    profile.export((error, result) => {
      if (!error) {
        require('fs').writeFileSync(
          `./profiles/${title}.cpuprofile`,
          result
        );
      }
      profile.delete();
    });
  }, durationMs);
}

// Middleware for profiling specific requests
function profilingMiddleware(req, res, next) {
  if (req.headers['x-profile'] !== 'true') {
    return next();
  }

  const title = `req-${req.method}-${req.path}-${Date.now()}`;
  v8Profiler.startProfiling(title, true);

  const originalEnd = res.end;
  res.end = function (...args) {
    const profile = v8Profiler.stopProfiling(title);
    profile.export((error, result) => {
      if (!error) {
        require('fs').writeFileSync(
          `./profiles/${title}.cpuprofile`,
          result
        );
      }
      profile.delete();
    });
    originalEnd.apply(res, args);
  };

  next();
}

Go CPU Profiling:

package main

import (
    "net/http"
    _ "net/http/pprof"
    "runtime"
)

func main() {
    // Enable pprof endpoints
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()

    // Collect CPU profile: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
    // Generate Flame Graph: go tool pprof -http=:8080 profile.pb.gz

    // Note: a rate/fraction of 1 samples every event; that overhead is
    // acceptable for debugging but should be raised in production
    runtime.SetBlockProfileRate(1)
    runtime.SetMutexProfileFraction(1)

    // Application logic
    startServer()
}

Python CPU Profiling:

import cProfile
import pstats
from pyinstrument import Profiler

# Using cProfile
def profile_with_cprofile(func):
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()

        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(20)  # Top 20 functions
        return result
    return wrapper

# Using pyinstrument (more readable output)
def profile_with_pyinstrument(func):
    def wrapper(*args, **kwargs):
        profiler = Profiler()
        profiler.start()
        result = func(*args, **kwargs)
        profiler.stop()
        print(profiler.output_text(unicode=True))
        return result
    return wrapper

# Django middleware
class ProfilingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.META.get('HTTP_X_PROFILE') == 'true':
            profiler = Profiler()
            profiler.start()
            response = self.get_response(request)
            profiler.stop()
            response['X-Profile-Duration'] = str(profiler.last_session.duration)
            profiler.open_in_browser()  # only sensible in local development
            return response
        return self.get_response(request)

2.2 Memory Profiling

// Node.js Heap Snapshot
const v8 = require('v8');

function takeHeapSnapshot() {
  const filename = v8.writeHeapSnapshot();  // returns the generated file path
  console.log(`Heap snapshot written to: ${filename}`);
  return filename;
}

// Memory usage monitoring
function monitorMemory(intervalMs = 5000) {
  setInterval(() => {
    const usage = process.memoryUsage();
    console.log({
      rss_mb: Math.round(usage.rss / 1024 / 1024),
      heapTotal_mb: Math.round(usage.heapTotal / 1024 / 1024),
      heapUsed_mb: Math.round(usage.heapUsed / 1024 / 1024),
      external_mb: Math.round(usage.external / 1024 / 1024),
      arrayBuffers_mb: Math.round(usage.arrayBuffers / 1024 / 1024)
    });
  }, intervalMs);
}

// Memory leak detection pattern
class MemoryLeakDetector {
  constructor(options = {}) {
    this.samples = [];
    this.maxSamples = options.maxSamples || 60;
    this.threshold = options.thresholdMB || 50;
  }

  sample() {
    const usage = process.memoryUsage();
    this.samples.push({
      timestamp: Date.now(),
      heapUsed: usage.heapUsed
    });

    if (this.samples.length > this.maxSamples) {
      this.samples.shift();
    }

    return this.detectLeak();
  }

  detectLeak() {
    if (this.samples.length < 10) return null;

    const first = this.samples[0].heapUsed;
    const last = this.samples[this.samples.length - 1].heapUsed;
    const diffMB = (last - first) / 1024 / 1024;

    let increasing = 0;
    for (let i = 1; i < this.samples.length; i++) {
      if (this.samples[i].heapUsed > this.samples[i - 1].heapUsed) {
        increasing++;
      }
    }

    const increaseRatio = increasing / (this.samples.length - 1);

    if (diffMB > this.threshold && increaseRatio > 0.7) {
      return {
        suspected: true,
        growthMB: diffMB.toFixed(2),
        increaseRatio: increaseRatio.toFixed(2),
        duration: this.samples[this.samples.length - 1].timestamp - this.samples[0].timestamp
      };
    }
    return null;
  }
}

2.3 I/O Profiling

# Python - I/O Profiling
import time
import functools
import logging

logger = logging.getLogger('io_profiler')

class IOProfiler:
    """I/O operation time measurement decorator"""

    _stats = {}

    @classmethod
    def track(cls, operation_name):
        def decorator(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception as e:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            @functools.wraps(func)
            def sync_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception as e:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            import asyncio
            if asyncio.iscoroutinefunction(func):
                return async_wrapper
            return sync_wrapper
        return decorator

    @classmethod
    def _record(cls, name, duration, success):
        if name not in cls._stats:
            cls._stats[name] = {
                'count': 0, 'total_time': 0,
                'min_time': float('inf'), 'max_time': 0,
                'errors': 0
            }
        stats = cls._stats[name]
        stats['count'] += 1
        stats['total_time'] += duration
        stats['min_time'] = min(stats['min_time'], duration)
        stats['max_time'] = max(stats['max_time'], duration)
        if not success:
            stats['errors'] += 1

    @classmethod
    def report(cls):
        for name, stats in sorted(cls._stats.items()):
            avg = stats['total_time'] / stats['count'] if stats['count'] else 0
            logger.info(
                f"{name}: count={stats['count']}, "
                f"avg={avg*1000:.1f}ms, "
                f"min={stats['min_time']*1000:.1f}ms, "
                f"max={stats['max_time']*1000:.1f}ms, "
                f"errors={stats['errors']}"
            )

3. Load Testing

3.1 Load Testing Tools Comparison

| Tool | Language | Protocols | Strengths | Weaknesses |
|---|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration | Limited browser testing |
| Artillery | JavaScript | HTTP, WebSocket, Socket.io | Config-based, extensible | Complex scenarios difficult |
| Locust | Python | HTTP | Python scripting, distributed | Limited protocols |
| Gatling | Scala/Java | HTTP, WebSocket | Detailed reports, JVM performance | Learning curve |
| JMeter | Java | Various | GUI, various protocols | Resource-heavy, dated |

3.2 k6 Script Example

import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';

// Custom metrics
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration', true);
const requestCount = new Counter('requests');

// Test options
export const options = {
  scenarios: {
    // Scenario 1: Normal load test
    normal_load: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },   // Ramp up to 50 VUs
        { duration: '5m', target: 50 },   // Hold at 50 VUs
        { duration: '2m', target: 100 },  // Ramp up to 100 VUs
        { duration: '5m', target: 100 },  // Hold at 100 VUs
        { duration: '2m', target: 0 },    // Ramp down
      ],
    },
    // Scenario 2: Spike test
    spike_test: {
      executor: 'ramping-vus',
      startVUs: 0,
      startTime: '16m',
      stages: [
        { duration: '10s', target: 500 },  // Sudden spike
        { duration: '1m', target: 500 },   // Hold
        { duration: '10s', target: 0 },    // Rapid decrease
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<200', 'p(99)<500'],
    errors: ['rate<0.01'],
    http_req_failed: ['rate<0.01'],
  },
};

const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';

export default function () {
  const authToken = login();

  group('API Operations', () => {
    group('List Products', () => {
      const res = http.get(`${BASE_URL}/api/products?page=1&limit=20`, {
        headers: { Authorization: `Bearer ${authToken}` },
        tags: { name: 'GET /api/products' },
      });

      check(res, {
        'status is 200': (r) => r.status === 200,
        'response time OK': (r) => r.timings.duration < 200,
        'has products': (r) => JSON.parse(r.body).data.length > 0,
      });

      errorRate.add(res.status !== 200);
      apiDuration.add(res.timings.duration);
      requestCount.add(1);
    });

    group('Create Order', () => {
      const payload = JSON.stringify({
        productId: Math.floor(Math.random() * 1000) + 1,
        quantity: Math.floor(Math.random() * 5) + 1,
        shippingAddress: '123 Test Street',
      });

      const res = http.post(`${BASE_URL}/api/orders`, payload, {
        headers: {
          Authorization: `Bearer ${authToken}`,
          'Content-Type': 'application/json',
        },
        tags: { name: 'POST /api/orders' },
      });

      check(res, {
        'order created': (r) => r.status === 201,
        'has order id': (r) => JSON.parse(r.body).data.orderId,
      });

      errorRate.add(res.status !== 201);
      apiDuration.add(res.timings.duration);
    });
  });

  sleep(Math.random() * 3 + 1);
}

function login() {
  const res = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
    email: `user${__VU}@test.com`,
    password: 'testpassword',
  }), {
    headers: { 'Content-Type': 'application/json' },
    tags: { name: 'POST /api/auth/login' },
  });

  return res.status === 200 ? JSON.parse(res.body).token : '';
}

3.3 Load Test Types

| Type | Purpose | VU Pattern | Duration |
|---|---|---|---|
| Smoke | Verify basic operation | 1-5 | 1-5 min |
| Load | Verify expected traffic handling | Expected levels | 15-60 min |
| Stress | Find breaking points | Above expected | 30-60 min |
| Spike | Test sudden traffic surges | Sudden spike | 5-10 min |
| Soak | Verify long-term stability | Steady level | 2-24 hours |
| Breakpoint | Find system failure point | Continuous increase | Variable |

4. Key Performance Metrics

4.1 RED Method

# RED Method Monitoring Implementation
from prometheus_client import Counter, Histogram

# Rate: Requests per second
request_count = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

# Errors: Error ratio
error_count = Counter(
    'http_errors_total',
    'Total HTTP errors',
    ['method', 'endpoint', 'error_type']
)

# Duration: Response time distribution
request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

4.2 Latency Percentiles

              Mean     p50    p95    p99    p99.9   Max
User Impact   Low      Med    High   High   V.High  Extreme

p50 (median): 50% of requests complete within this time
p95: 95% of requests complete within this time (1 in 20 is slower)
p99: 99% of requests complete within this time (1 in 100 is slower)

Why averages are dangerous:
- The average can be 50ms while p99 is 5000ms
- Every 100th request then forces a 5-second wait
- Heavy users make many requests, so they are the most likely to hit the high percentiles
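The danger is easy to demonstrate with synthetic numbers (Python stdlib only; the nearest-rank percentile helper is a simplification of what monitoring systems compute):

```python
import statistics

# 98 fast requests at 30ms plus two 5000ms outliers
latencies_ms = [30] * 98 + [5000] * 2

def percentile(data, pct):
    """Nearest-rank percentile of a dataset."""
    ordered = sorted(data)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

print(f"mean = {statistics.mean(latencies_ms):.1f}ms")  # 129.4ms, looks tolerable
print(f"p50  = {percentile(latencies_ms, 50)}ms")       # 30ms
print(f"p99  = {percentile(latencies_ms, 99)}ms")       # 5000ms
```

The mean hides that 2% of requests take five seconds; only the high percentiles expose it.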

5. Common Bottlenecks

5.1 Database Bottlenecks

N+1 Query Problem:

# BAD: N+1 query - 101 queries for 100 orders
orders = Order.objects.all()[:100]
for order in orders:
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Eager loading - 2 queries total
orders = Order.objects.select_related('user').all()[:100]
for order in orders:
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Prefetch (M:N relationships)
orders = Order.objects.prefetch_related('items__product').all()[:100]
for order in orders:
    for item in order.items.all():
        print(f"  - {item.product.name}")

// Node.js + Prisma - N+1 Resolution
// BAD: N+1
const orders = await prisma.order.findMany({ take: 100 });
for (const order of orders) {
  const user = await prisma.user.findUnique({
    where: { id: order.userId }
  });
}

// GOOD: Include (Join)
const orders = await prisma.order.findMany({
  take: 100,
  include: {
    user: true,
    items: { include: { product: true } }
  }
});

// GOOD: DataLoader Pattern
const DataLoader = require('dataloader');

const userLoader = new DataLoader(async (userIds) => {
  const users = await prisma.user.findMany({
    where: { id: { in: [...userIds] } }
  });
  const userMap = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => userMap.get(id));
});

5.2 Missing Indexes and Full Table Scans

-- Detect slow queries (PostgreSQL)
SELECT
  query,
  calls,
  mean_exec_time,
  total_exec_time,
  rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- Execution plan analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;

-- Create composite index (matching query patterns)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');

-- Check index utilization
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan,
  idx_tup_read,
  idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

5.3 Connection Pool Exhaustion

# PgBouncer Configuration (PostgreSQL connection pooler)
# pgbouncer.ini
"""
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5

pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50

server_idle_timeout = 600
server_lifetime = 3600
stats_period = 60
"""
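The fail-fast behavior the pooler provides can also be illustrated in-process; this stdlib toy pool (hypothetical, not a production pool) shows why a bounded pool with a checkout timeout turns exhaustion into a visible error rather than an unbounded queue of waiting requests:

```python
import queue

class BoundedPool:
    """Toy connection pool: bounded size, fail-fast checkout timeout."""

    def __init__(self, size=5, checkout_timeout=3.0, factory=object):
        self._conns = queue.Queue(maxsize=size)
        self._timeout = checkout_timeout
        for _ in range(size):
            self._conns.put(factory())  # pre-create stand-in "connections"

    def acquire(self):
        try:
            return self._conns.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError("connection pool exhausted")

    def release(self, conn):
        self._conns.put(conn)

pool = BoundedPool(size=2, checkout_timeout=0.1)
a, b = pool.acquire(), pool.acquire()
try:
    pool.acquire()  # third checkout cannot be served within the timeout
except TimeoutError as exc:
    print(exc)      # connection pool exhausted
pool.release(a)
```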

5.4 Lock Contention

// Go - Lock contention resolution with sharding
package main

import "sync"

// BAD: Global mutex locking entire map
type BadCache struct {
    mu    sync.Mutex
    items map[string]interface{}
}

// GOOD: Sharding to distribute lock contention
type ShardedCache struct {
    shards    [256]shard
    shardMask uint8
}

type shard struct {
    mu    sync.RWMutex
    items map[string]interface{}
}

func NewShardedCache() *ShardedCache {
    c := &ShardedCache{shardMask: 255}
    for i := range c.shards {
        c.shards[i].items = make(map[string]interface{})
    }
    return c
}

func (c *ShardedCache) getShard(key string) *shard {
    hash := fnv32(key)
    return &c.shards[hash&uint32(c.shardMask)]
}

func (c *ShardedCache) Get(key string) (interface{}, bool) {
    s := c.getShard(key)
    s.mu.RLock()
    defer s.mu.RUnlock()
    val, ok := s.items[key]
    return val, ok
}

func (c *ShardedCache) Set(key string, value interface{}) {
    s := c.getShard(key)
    s.mu.Lock()
    defer s.mu.Unlock()
    s.items[key] = value
}

func fnv32(key string) uint32 {
    hash := uint32(2166136261)
    for i := 0; i < len(key); i++ {
        hash *= 16777619
        hash ^= uint32(key[i])
    }
    return hash
}

6. Database Optimization

6.1 Query Optimization Strategies

-- 1. Convert subqueries to JOINs
-- BAD
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE status = 'active');

-- GOOD
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 'active';

-- 2. Cursor-based pagination (consistent performance)
-- BAD: OFFSET-based (slow on deep pages)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;

-- GOOD: Cursor-based
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;

-- 3. Partitioning
CREATE TABLE orders (
  id BIGSERIAL,
  user_id BIGINT NOT NULL,
  status VARCHAR(20) NOT NULL,
  created_at TIMESTAMP NOT NULL,
  total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025_q1 PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
  FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

6.2 Read Replicas

# SQLAlchemy - Read/Write Separation
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class DatabaseRouter:
    def __init__(self):
        self.writer = create_engine(
            'postgresql://writer:pass@primary:5432/mydb',
            pool_size=10, max_overflow=20
        )
        self.readers = [
            create_engine(
                f'postgresql://reader:pass@replica{i}:5432/mydb',
                pool_size=10, max_overflow=20
            )
            for i in range(1, 4)  # 3 read replicas
        ]
        self._reader_index = 0

    def get_writer_session(self):
        Session = sessionmaker(bind=self.writer)
        return Session()

    def get_reader_session(self):
        reader = self.readers[self._reader_index % len(self.readers)]
        self._reader_index += 1
        Session = sessionmaker(bind=reader)
        return Session()

7. Caching Strategies

7.1 Cache-Aside Pattern

import redis
import json
from functools import wraps

redis_client = redis.Redis(host='localhost', port=6379, db=0)

class CacheAside:
    """Cache-Aside (Lazy Loading) pattern implementation"""

    @staticmethod
    def cached(key_prefix, ttl_seconds=300):
        def decorator(func):
            @wraps(func)
            async def wrapper(*args, **kwargs):
                cache_key = f"{key_prefix}:{':'.join(str(a) for a in args)}"

                # 1. Check cache
                cached = redis_client.get(cache_key)
                if cached:
                    return json.loads(cached)

                # 2. Cache miss - query DB
                result = await func(*args, **kwargs)

                # 3. Store result in cache
                if result is not None:
                    redis_client.setex(
                        cache_key, ttl_seconds,
                        json.dumps(result, default=str)
                    )
                return result
            return wrapper
        return decorator

    @staticmethod
    def invalidate(key_pattern):
        """Pattern-based cache invalidation (SCAN avoids blocking Redis the way KEYS does)"""
        keys = list(redis_client.scan_iter(match=key_pattern))
        if keys:
            redis_client.delete(*keys)

7.2 Write-Through and Write-Behind

class WriteThrough:
    """Write-Through: Update cache and DB simultaneously"""

    async def update(self, key, value, ttl=300):
        await self.db.update(key, value)
        redis_client.setex(f"wt:{key}", ttl, json.dumps(value, default=str))

    async def get(self, key):
        cached = redis_client.get(f"wt:{key}")
        if cached:
            return json.loads(cached)
        value = await self.db.get(key)
        if value:
            redis_client.setex(f"wt:{key}", 300, json.dumps(value, default=str))
        return value


import asyncio

class WriteBehind:
    """Write-Behind: Write to cache first, async DB update"""

    def __init__(self):
        self.write_queue = asyncio.Queue()
        self.batch_size = 100
        self.flush_interval = 5  # seconds

    async def update(self, key, value, ttl=300):
        # 1. Write to cache immediately (fast response)
        redis_client.setex(f"wb:{key}", ttl, json.dumps(value, default=str))
        # 2. Add to queue (async DB write)
        await self.write_queue.put((key, value))

    async def flush_worker(self):
        """Background worker: batch write from queue to DB"""
        while True:
            batch = []
            try:
                while len(batch) < self.batch_size:
                    item = await asyncio.wait_for(
                        self.write_queue.get(),
                        timeout=self.flush_interval
                    )
                    batch.append(item)
            except asyncio.TimeoutError:
                pass

            if batch:
                try:
                    await self.db.bulk_update(batch)
                except Exception:
                    for item in batch:
                        await self.write_queue.put(item)
                    await asyncio.sleep(1)

7.3 TTL Strategies and Cache Invalidation

# Tiered TTL Strategy
class TieredTTLCache:
    TTL_CONFIG = {
        # Frequently changing data
        'user:session': 1800,          # 30 min
        'cart:items': 900,             # 15 min

        # Periodically changing data
        'product:detail': 3600,        # 1 hour
        'product:list': 600,           # 10 min
        'search:results': 300,         # 5 min

        # Rarely changing data
        'category:list': 86400,        # 24 hours
        'config:settings': 86400,      # 24 hours
        'static:content': 604800,      # 7 days
    }

    @classmethod
    def get_ttl(cls, key_type):
        return cls.TTL_CONFIG.get(key_type, 300)  # Default 5 min
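A thin wrapper ties the tier table to actual cache writes; a sketch assuming a Redis-like client with `setex` (the `cache_set` helper and `StubClient` are hypothetical, and the class is abridged from the one above to keep the example self-contained):

```python
import json

class TieredTTLCache:
    # Abridged copy of the tier table above
    TTL_CONFIG = {
        'user:session': 1800,
        'product:detail': 3600,
        'category:list': 86400,
    }

    @classmethod
    def get_ttl(cls, key_type):
        return cls.TTL_CONFIG.get(key_type, 300)  # default 5 min

def cache_set(client, key_type, key_id, value):
    """Store value under '<key_type>:<key_id>' with its tier's TTL; returns the TTL used."""
    ttl = TieredTTLCache.get_ttl(key_type)
    client.setex(f"{key_type}:{key_id}", ttl, json.dumps(value, default=str))
    return ttl

class StubClient:
    """Stand-in for a Redis client, for illustration only."""
    def setex(self, key, ttl, value):
        print(f"SETEX {key} ttl={ttl}s")

print(cache_set(StubClient(), 'product:detail', 42, {'name': 'Widget'}))  # 3600
```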

8. Async Processing

8.1 Message Queue-Based Async Processing

# Celery async task processing
from celery import Celery, chain, group

app = Celery('tasks', broker='redis://localhost:6379/0')

app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    task_acks_late=True,
    worker_prefetch_multiplier=1,
    task_routes={
        'tasks.send_email': {'queue': 'email'},
        'tasks.process_image': {'queue': 'image'},
        'tasks.generate_report': {'queue': 'report'},
    }
)

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_email(self, to, subject, body):
    try:
        email_service.send(to=to, subject=subject, body=body)
    except Exception as exc:
        self.retry(exc=exc)

@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
    """Order processing pipeline"""
    try:
        workflow = chain(
            validate_inventory.s(order_id),
            process_payment.s(order_id),
            send_confirmation_email.s(order_id),
            update_analytics.s(order_id)
        )
        workflow.apply_async()
    except Exception as exc:
        self.retry(exc=exc, countdown=30)

@app.task
def bulk_process_orders(order_ids):
    """Parallel batch processing"""
    job = group(process_order.s(oid) for oid in order_ids)
    return job.apply_async()

8.2 Event-Driven Architecture

// Node.js - EventEmitter-based async processing
const EventEmitter = require('events');

class OrderEventBus extends EventEmitter {
  constructor() {
    super();
    this.setMaxListeners(20);
  }
}

const orderBus = new OrderEventBus();

// Register event handlers (separation of concerns)
orderBus.on('order.created', async (order) => {
  await inventoryService.decrementStock(order.items);
});

orderBus.on('order.created', async (order) => {
  await emailService.sendOrderConfirmation(order);
});

orderBus.on('order.created', async (order) => {
  await analyticsService.trackOrder(order);
});

class OrderService {
  async createOrder(orderData) {
    const order = await this.orderRepo.create(orderData);
    orderBus.emit('order.created', order);
    return order; // Fast response
  }
}

9. Batch Optimization

9.1 Bulk Inserts

# SQLAlchemy Bulk Insert Comparison
import time

# BAD: One by one (N INSERT statements)
def insert_one_by_one(session, records):
    start = time.time()
    for record in records:
        session.add(MyModel(**record))
    session.commit()
    print(f"One by one: {time.time() - start:.2f}s")

# GOOD: Bulk insert (1 INSERT statement)
def bulk_insert(session, records):
    start = time.time()
    session.bulk_insert_mappings(MyModel, records)
    session.commit()
    print(f"Bulk insert: {time.time() - start:.2f}s")

# BETTER: execute_values (PostgreSQL, psycopg2)
def execute_values_insert(conn, records):
    start = time.time()
    from psycopg2.extras import execute_values
    cursor = conn.cursor()
    execute_values(
        cursor,
        "INSERT INTO my_table (col1, col2, col3) VALUES %s",
        [(r['col1'], r['col2'], r['col3']) for r in records],
        page_size=1000
    )
    conn.commit()
    print(f"execute_values: {time.time() - start:.2f}s")

# Performance comparison (10,000 records)
# One by one: 12.5s
# Bulk insert: 0.8s
# execute_values: 0.3s

9.2 Batch API Calls

// Optimized batch external API calls
class BatchAPIClient {
  constructor(options = {}) {
    this.batchSize = options.batchSize || 50;
    this.concurrency = options.concurrency || 5;
    this.retryAttempts = options.retryAttempts || 3;
    this.delayBetweenBatches = options.delayMs || 100;
  }

  async processBatch(items, processFn) {
    const results = [];
    const errors = [];

    const batches = [];
    for (let i = 0; i < items.length; i += this.batchSize) {
      batches.push(items.slice(i, i + this.batchSize));
    }

    for (let i = 0; i < batches.length; i += this.concurrency) {
      const concurrentBatches = batches.slice(i, i + this.concurrency);

      const batchResults = await Promise.allSettled(
        concurrentBatches.map(batch => this.processWithRetry(batch, processFn))
      );

      for (const result of batchResults) {
        if (result.status === 'fulfilled') {
          results.push(...result.value);
        } else {
          errors.push(result.reason);
        }
      }

      if (i + this.concurrency < batches.length) {
        await new Promise(r => setTimeout(r, this.delayBetweenBatches));
      }
    }

    return { results, errors, total: items.length, processed: results.length };
  }

  async processWithRetry(batch, processFn, attempt = 1) {
    try {
      return await processFn(batch);
    } catch (error) {
      if (attempt < this.retryAttempts) {
        const delay = Math.pow(2, attempt) * 1000;
        await new Promise(r => setTimeout(r, delay));
        return this.processWithRetry(batch, processFn, attempt + 1);
      }
      throw error;
    }
  }
}

10. HTTP Optimization

10.1 Compression and Protocol Optimization

// Express.js compression configuration
const compression = require('compression');

app.use(compression({
  filter: (req, res) => {
    if (req.headers['x-no-compression']) return false;
    return compression.filter(req, res);
  },
  level: 6,              // Compression level (1-9, 6 is balanced)
  threshold: 1024,       // Compress only above 1KB
  memLevel: 8,
}));

// HTTP/2 server setup
const http2 = require('http2');
const fs = require('fs');

const server = http2.createSecureServer({
  key: fs.readFileSync('server.key'),
  cert: fs.readFileSync('server.crt'),
  allowHTTP1: true,
});

10.2 Keep-Alive and Connection Reuse

# Python requests - Session reuse
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# BAD: New connection every time
def fetch_bad(urls):
    results = []
    for url in urls:
        response = requests.get(url)  # TCP handshake each time
        results.append(response.json())
    return results

# GOOD: Session reuse (Keep-Alive)
def fetch_good(urls):
    session = requests.Session()

    retry_strategy = Retry(
        total=3, backoff_factor=0.5,
        status_forcelist=[500, 502, 503, 504]
    )
    adapter = HTTPAdapter(
        max_retries=retry_strategy,
        pool_connections=10,
        pool_maxsize=20,
    )
    session.mount("http://", adapter)
    session.mount("https://", adapter)

    results = []
    for url in urls:
        response = session.get(url)  # Connection reuse
        results.append(response.json())
    session.close()
    return results

11. Application-Level Optimization

11.1 Efficient Serialization

// JSON vs MessagePack comparison
const msgpack = require('msgpack-lite');

const data = {
  users: Array.from({ length: 1000 }, (_, i) => ({
    id: i,
    name: `User ${i}`,
    email: `user${i}@example.com`,
    age: 20 + (i % 50),
    active: i % 3 !== 0,
    tags: ['tag1', 'tag2', 'tag3'],
  }))
};

// JSON
const jsonStr = JSON.stringify(data);
console.log(`JSON size: ${Buffer.byteLength(jsonStr)} bytes`);

// MessagePack
const msgpackBuf = msgpack.encode(data);
console.log(`MessagePack size: ${msgpackBuf.length} bytes`);

// Typical results:
// JSON size: ~120KB, serialize: ~3ms
// MessagePack size: ~85KB, serialize: ~2ms (about 30% smaller)

12. Production Monitoring

12.1 SLO-Based Alerting

# Prometheus alerting rules
groups:
  - name: slo-alerts
    rules:
      # p99 latency SLO violation
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency exceeds 500ms"

      # Error rate SLO violation
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1%"

      # Throughput drop
      - alert: ThroughputDrop
        expr: |
          sum(rate(http_requests_total[5m]))
          < 0.5 * sum(rate(http_requests_total[5m] offset 1h))
        for: 5m
        labels:
          severity: warning

      # Connection pool near exhaustion
      - alert: ConnectionPoolExhaustion
        expr: |
          hikaricp_connections_active
          / hikaricp_connections_max > 0.85
        for: 2m
        labels:
          severity: warning
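The HighErrorRate expression above is just a ratio of the 5xx request rate to the total request rate; a minimal Python sketch of the same check (the `status_counts` sample data is made up for illustration):

```python
def error_rate(status_counts: dict) -> float:
    """Fraction of requests that returned a 5xx status,
    mirroring the PromQL ratio in the alert above."""
    total = sum(status_counts.values())
    errors = sum(c for s, c in status_counts.items() if 500 <= s <= 599)
    return errors / total if total else 0.0

# Sample window: 10,000 requests, 200 of them 5xx -> 2% error rate
counts = {200: 9800, 500: 120, 503: 80}
print(error_rate(counts))         # 0.02
print(error_rate(counts) > 0.01)  # True -> HighErrorRate would fire
```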

13. Practice Quiz

Q1. What are the implications of Amdahl's Law for performance optimization, and how can it be applied in practice?

Amdahl's Law shows that overall system performance improvement is limited by the proportion of the improvable portion.

Key implications:

  • No matter how much you speed up a small portion (e.g., 5%) of total execution time, overall improvement is minimal
  • Making a large portion (e.g., 80%) just 2x faster yields significant overall improvement
  • Therefore, identify the biggest bottlenecks first through profiling, then focus improvements there

Practical application:

  1. Use profiling (Flame Graphs) to understand execution time distribution
  2. Optimize bottlenecks in order of their proportion
  3. Re-measure after each optimization to identify new bottlenecks
  4. Stop optimizing when within Performance Budget
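The numbers in this answer follow directly from the formula in section 1.2; a quick sketch to reproduce them (the helper name `amdahl_speedup` is ours):

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime is made s times faster."""
    return 1 / ((1 - p) + p / s)

# The two worked examples from section 1.2:
print(round(amdahl_speedup(0.2, 10), 2))  # 1.22 -> 22% improvement
print(round(amdahl_speedup(0.8, 2), 2))   # 1.67 -> 67% improvement
```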

Q2. What is the N+1 query problem, and describe 3 ways to solve it in ORMs.

N+1 Problem: After querying N parent entities, each parent's child entities are queried individually, resulting in N+1 total queries. For example, loading 100 orders and then each order's items triggers 101 DB queries (1 + 100).

Solutions:

  1. Eager Loading (select_related/include): Use JOINs to fetch parent and children in one query. Effective for 1:1, N:1 relationships
  2. Prefetch (prefetch_related): Batch-fetch child entities in a separate query, then map in memory. Effective for 1:N, M:N relationships using IN clause
  3. DataLoader Pattern: Automatically batch individual requests into a single query. Especially useful in GraphQL. Pattern developed by Facebook
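The prefetch idea in solution 2 can be sketched framework-free; here an in-memory dict stands in for the database, and `fetch_children_batch` is a hypothetical helper representing a single `WHERE parent_id IN (...)` query:

```python
PARENTS = [1, 2, 3]
CHILDREN_DB = {1: ["a", "b"], 2: ["c"], 3: []}

def fetch_children_batch(parent_ids):
    # Stand-in for ONE batched query: SELECT * FROM child WHERE parent_id IN (...)
    return {pid: CHILDREN_DB.get(pid, []) for pid in parent_ids}

def load_parents_with_children(parent_ids):
    # 1 query for parents + 1 batched query for children = 2 queries, not N+1
    children = fetch_children_batch(parent_ids)
    return [{"id": pid, "items": children[pid]} for pid in parent_ids]

result = load_parents_with_children(PARENTS)
print(result[0])  # {'id': 1, 'items': ['a', 'b']}
```

A DataLoader adds one refinement on top of this: it collects the individual child lookups issued during a request tick and turns them into the same batched call automatically.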

Q3. Explain the differences between Cache-Aside and Write-Through caching patterns and their suitable use cases.

Cache-Aside (Lazy Loading):

  • Check cache on read, query DB on miss, then store in cache
  • Application manages cache directly
  • First request always has cache miss (Cold Start)
  • Suitable for: Read-heavy scenarios where not all data needs caching

Write-Through:

  • Update cache and DB simultaneously on writes
  • Cache is always up-to-date
  • Write latency increases (writing to two locations)
  • Suitable for: When data consistency matters and reads far exceed writes

Write-Behind writes to cache first and updates DB asynchronously, maximizing write performance but risking data loss.
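A minimal sketch of both patterns, assuming a plain dict as the cache and stub `db_read`/`db_write` helpers in place of a real database:

```python
cache = {}
db = {"user:1": "Alice"}

def db_read(key):
    return db.get(key)   # stand-in for a slow DB query

def db_write(key, value):
    db[key] = value      # stand-in for a DB update

def get_cache_aside(key):
    # Check the cache first; on a miss, read from the DB and populate the cache
    if key in cache:
        return cache[key]
    value = db_read(key)
    cache[key] = value
    return value

def set_write_through(key, value):
    # Update DB and cache together so subsequent reads are never stale
    db_write(key, value)
    cache[key] = value

get_cache_aside("user:1")           # miss -> DB read, cache populated
set_write_through("user:2", "Bob")  # cache and DB updated in one step
print(cache)  # {'user:1': 'Alice', 'user:2': 'Bob'}
```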

Q4. Explain when to use each of the 6 load test types (Smoke, Load, Stress, Spike, Soak, Breakpoint).

  1. Smoke Test: Minimal load (1-5 VUs) to verify basic system operation. Post-deployment verification
  2. Load Test: Verify performance at expected traffic levels. Check SLO compliance
  3. Stress Test: Exceed expected traffic to find system limits. Used for capacity planning
  4. Spike Test: Test system reaction to sudden traffic surges (e.g., events). Verify auto-scaling
  5. Soak Test: Maintain steady load for hours to find gradual issues like memory leaks or connection exhaustion
  6. Breakpoint Test: Continuously increase load to find absolute system failure point

Q5. What key parameters should be considered for connection pool tuning, and how do you determine the appropriate pool size?

Key parameters:

  • maximum-pool-size: Maximum connections (too many overloads DB, too few causes waits)
  • minimum-idle: Minimum idle connections (prevents cold start)
  • connection-timeout: Wait time for connection acquisition
  • idle-timeout: Time before idle connection is returned
  • max-lifetime: Maximum connection lifetime (should be shorter than any DB-side or firewall-imposed connection timeout)

Determining appropriate pool size:

  • HikariCP formula: connections = ((core_count * 2) + effective_spindle_count)
  • For SSDs: connections = core_count * 2 + 1
  • Generally 10-20 is sufficient for many cases
  • Too-large pools actually increase DB context switching costs
  • Adjust based on monitoring: increase when utilization exceeds 80%, immediately increase when waiting threads appear
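The HikariCP rule of thumb above is simple arithmetic; a small sketch (the function name is ours):

```python
def hikari_pool_size(core_count: int, effective_spindle_count: int = 1) -> int:
    # HikariCP's suggested starting point:
    # connections = (core_count * 2) + effective_spindle_count
    return core_count * 2 + effective_spindle_count

print(hikari_pool_size(8))     # 17 -- a reasonable start for an 8-core DB host
print(hikari_pool_size(4, 2))  # 10
```

Treat the result as a starting point, then tune against pool utilization and wait metrics as described above.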

References

  1. Google SRE Book - Performance Engineering
  2. k6 Documentation
  3. Artillery Documentation
  4. Brendan Gregg - Systems Performance
  5. Flame Graphs
  6. HikariCP - About Pool Sizing
  7. Redis Best Practices
  8. PostgreSQL Performance Tips
  9. Node.js Diagnostics Guide
  10. Go pprof Documentation
  11. Python cProfile Documentation
  12. Prometheus Monitoring
  13. DataLoader Pattern