Backend Performance Engineering Complete Guide 2025: Profiling, Load Testing, Bottleneck Analysis, Optimization

Author: Youngju Kim (@fjvbn20031)
1. Performance Engineering Mindset
1.1 Measure First, Optimize Later
The golden rule of performance engineering is "Don't guess, measure." Optimization based on intuition usually wastes time on the wrong areas.
Three Stages of Performance Optimization:
- Measure: Quantitatively measure current performance
- Analyze: Precisely identify bottleneck points
- Optimize: Resolve the most impactful bottlenecks first
1.2 Amdahl's Law
The overall speedup of a system is limited by the fraction of execution time that the improved part accounts for.
Overall Speedup = 1 / ((1 - P) + P / S)
P = Fraction of the program that can be improved
S = Speedup factor of the improved part
Example: making code that accounts for 20% of total runtime 10x faster
= 1 / ((1 - 0.2) + 0.2 / 10)
= 1 / (0.8 + 0.02)
= 1.22x (22% improvement)
Meanwhile, making code that accounts for 80% of total runtime 2x faster
= 1 / ((1 - 0.8) + 0.8 / 2)
= 1 / (0.2 + 0.4)
= 1.67x (67% improvement)
Key takeaway: Moderately improving a large portion is more effective than dramatically improving a small portion.
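The two scenarios above are easy to script; a minimal sketch (the function name `amdahl_speedup` is my own) for comparing optimization candidates before committing to one:

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of runtime is made s times faster."""
    return 1 / ((1 - p) + p / s)

# 10x speedup on 20% of runtime vs 2x speedup on 80% of runtime
small_hot_path = amdahl_speedup(0.2, 10)   # ~1.22x
large_warm_path = amdahl_speedup(0.8, 2)   # ~1.67x
print(f"{small_hot_path:.2f}x vs {large_warm_path:.2f}x")
```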
1.3 Performance Budget
# Performance Budget Definition Example
performance_budget:
  api_endpoints:
    p50_latency_ms: 50
    p95_latency_ms: 200
    p99_latency_ms: 500
    max_latency_ms: 2000
    error_rate_percent: 0.1
    throughput_rps: 1000
  database:
    query_p95_ms: 50
    query_p99_ms: 200
    connection_pool_utilization: 70
    slow_query_threshold_ms: 100
  external_services:
    p95_latency_ms: 300
    timeout_ms: 5000
    retry_count: 3
    circuit_breaker_threshold: 50
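A budget is only useful if it is enforced, for example as a CI gate after a load test run. A small sketch of such a check (metric names and the `check_budget` helper are illustrative):

```python
# Budget limits for one endpoint class (values mirror the example above)
BUDGET = {"p50_latency_ms": 50, "p95_latency_ms": 200, "p99_latency_ms": 500}

def check_budget(measured: dict, budget: dict = BUDGET) -> list:
    """Return (metric, measured, limit) tuples for every budget violation."""
    return [
        (name, measured[name], limit)
        for name, limit in budget.items()
        if name in measured and measured[name] > limit
    ]

violations = check_budget(
    {"p50_latency_ms": 42, "p95_latency_ms": 310, "p99_latency_ms": 480}
)
# p95 is over budget here, so a CI gate would fail the build
```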
2. Profiling
2.1 CPU Profiling and Flame Graphs
Flame Graphs are powerful tools that visually show where CPU time is being spent.
Node.js CPU Profiling:
// Node.js - Using built-in profiler
// Run: node --prof app.js
// Analyze: node --prof-process isolate-*.log > profile.txt
// Or using v8-profiler-next
const v8Profiler = require('v8-profiler-next');
function startProfiling(durationMs = 30000) {
const title = `cpu-profile-${Date.now()}`;
v8Profiler.startProfiling(title, true);
setTimeout(() => {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
}, durationMs);
}
// Middleware for profiling specific requests
function profilingMiddleware(req, res, next) {
if (req.headers['x-profile'] !== 'true') {
return next();
}
const title = `req-${req.method}-${req.path}-${Date.now()}`;
v8Profiler.startProfiling(title, true);
const originalEnd = res.end;
res.end = function (...args) {
const profile = v8Profiler.stopProfiling(title);
profile.export((error, result) => {
if (!error) {
require('fs').writeFileSync(
`./profiles/${title}.cpuprofile`,
result
);
}
profile.delete();
});
originalEnd.apply(res, args);
};
next();
}
Go CPU Profiling:
package main
import (
"net/http"
_ "net/http/pprof"
"runtime"
)
func main() {
// Enable pprof endpoints
go func() {
http.ListenAndServe("localhost:6060", nil)
}()
// Collect CPU profile: go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
// Generate Flame Graph: go tool pprof -http=:8080 profile.pb.gz
runtime.SetBlockProfileRate(1)
runtime.SetMutexProfileFraction(1)
// Application logic
startServer()
}
Python CPU Profiling:
import cProfile
import pstats
from pyinstrument import Profiler
# Using cProfile
def profile_with_cprofile(func):
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        result = func(*args, **kwargs)
        profiler.disable()
        stats = pstats.Stats(profiler)
        stats.sort_stats('cumulative')
        stats.print_stats(20)  # Top 20 functions
        return result
    return wrapper

# Using pyinstrument (more readable output)
def profile_with_pyinstrument(func):
    def wrapper(*args, **kwargs):
        profiler = Profiler()
        profiler.start()
        result = func(*args, **kwargs)
        profiler.stop()
        print(profiler.output_text(unicode=True))
        return result
    return wrapper

# Django middleware
class ProfilingMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.META.get('HTTP_X_PROFILE') == 'true':
            profiler = Profiler()
            profiler.start()
            response = self.get_response(request)
            profiler.stop()
            response['X-Profile-Duration'] = str(profiler.last_session.duration)
            profiler.open_in_browser()
            return response
        return self.get_response(request)
2.2 Memory Profiling
// Node.js Heap Snapshot
const v8 = require('v8');
function takeHeapSnapshot() {
const snapshotStream = v8.writeHeapSnapshot();
console.log(`Heap snapshot written to: ${snapshotStream}`);
return snapshotStream;
}
// Memory usage monitoring
function monitorMemory(intervalMs = 5000) {
setInterval(() => {
const usage = process.memoryUsage();
console.log({
rss_mb: Math.round(usage.rss / 1024 / 1024),
heapTotal_mb: Math.round(usage.heapTotal / 1024 / 1024),
heapUsed_mb: Math.round(usage.heapUsed / 1024 / 1024),
external_mb: Math.round(usage.external / 1024 / 1024),
arrayBuffers_mb: Math.round(usage.arrayBuffers / 1024 / 1024)
});
}, intervalMs);
}
// Memory leak detection pattern
class MemoryLeakDetector {
constructor(options = {}) {
this.samples = [];
this.maxSamples = options.maxSamples || 60;
this.threshold = options.thresholdMB || 50;
}
sample() {
const usage = process.memoryUsage();
this.samples.push({
timestamp: Date.now(),
heapUsed: usage.heapUsed
});
if (this.samples.length > this.maxSamples) {
this.samples.shift();
}
return this.detectLeak();
}
detectLeak() {
if (this.samples.length < 10) return null;
const first = this.samples[0].heapUsed;
const last = this.samples[this.samples.length - 1].heapUsed;
const diffMB = (last - first) / 1024 / 1024;
let increasing = 0;
for (let i = 1; i < this.samples.length; i++) {
if (this.samples[i].heapUsed > this.samples[i - 1].heapUsed) {
increasing++;
}
}
const increaseRatio = increasing / (this.samples.length - 1);
if (diffMB > this.threshold && increaseRatio > 0.7) {
return {
suspected: true,
growthMB: diffMB.toFixed(2),
increaseRatio: increaseRatio.toFixed(2),
duration: this.samples[this.samples.length - 1].timestamp - this.samples[0].timestamp
};
}
return null;
}
}
2.3 I/O Profiling
# Python - I/O Profiling
import time
import functools
import logging
logger = logging.getLogger('io_profiler')
class IOProfiler:
    """I/O operation time measurement decorator"""
    _stats = {}

    @classmethod
    def track(cls, operation_name):
        def decorator(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = await func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            @functools.wraps(func)
            def sync_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    result = func(*args, **kwargs)
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=True)
                    return result
                except Exception:
                    duration = time.perf_counter() - start
                    cls._record(operation_name, duration, success=False)
                    raise

            import asyncio
            if asyncio.iscoroutinefunction(func):
                return async_wrapper
            return sync_wrapper
        return decorator

    @classmethod
    def _record(cls, name, duration, success):
        if name not in cls._stats:
            cls._stats[name] = {
                'count': 0, 'total_time': 0,
                'min_time': float('inf'), 'max_time': 0,
                'errors': 0
            }
        stats = cls._stats[name]
        stats['count'] += 1
        stats['total_time'] += duration
        stats['min_time'] = min(stats['min_time'], duration)
        stats['max_time'] = max(stats['max_time'], duration)
        if not success:
            stats['errors'] += 1

    @classmethod
    def report(cls):
        for name, stats in sorted(cls._stats.items()):
            avg = stats['total_time'] / stats['count'] if stats['count'] else 0
            logger.info(
                f"{name}: count={stats['count']}, "
                f"avg={avg*1000:.1f}ms, "
                f"min={stats['min_time']*1000:.1f}ms, "
                f"max={stats['max_time']*1000:.1f}ms, "
                f"errors={stats['errors']}"
            )
3. Load Testing
3.1 Load Testing Tools Comparison
| Tool | Language | Protocols | Strengths | Weaknesses |
|---|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration | Limited browser testing |
| Artillery | JavaScript | HTTP, WebSocket, Socket.io | Config-based, extensible | Complex scenarios difficult |
| Locust | Python | HTTP | Python scripting, distributed | Limited protocols |
| Gatling | Scala/Java | HTTP, WebSocket | Detailed reports, JVM performance | Learning curve |
| JMeter | Java | Various | GUI, various protocols | Resource-heavy, dated |
3.2 k6 Script Example
import http from 'k6/http';
import { check, sleep, group } from 'k6';
import { Rate, Trend, Counter } from 'k6/metrics';
// Custom metrics
const errorRate = new Rate('errors');
const apiDuration = new Trend('api_duration', true);
const requestCount = new Counter('requests');
// Test options
export const options = {
scenarios: {
// Scenario 1: Normal load test
normal_load: {
executor: 'ramping-vus',
startVUs: 0,
stages: [
{ duration: '2m', target: 50 }, // Ramp up to 50 VUs
{ duration: '5m', target: 50 }, // Hold at 50 VUs
{ duration: '2m', target: 100 }, // Ramp up to 100 VUs
{ duration: '5m', target: 100 }, // Hold at 100 VUs
{ duration: '2m', target: 0 }, // Ramp down
],
},
// Scenario 2: Spike test
spike_test: {
executor: 'ramping-vus',
startVUs: 0,
startTime: '16m',
stages: [
{ duration: '10s', target: 500 }, // Sudden spike
{ duration: '1m', target: 500 }, // Hold
{ duration: '10s', target: 0 }, // Rapid decrease
],
},
},
thresholds: {
http_req_duration: ['p(95)<200', 'p(99)<500'],
errors: ['rate<0.01'],
http_req_failed: ['rate<0.01'],
},
};
const BASE_URL = __ENV.BASE_URL || 'http://localhost:3000';
export default function () {
const authToken = login();
group('API Operations', () => {
group('List Products', () => {
const res = http.get(`${BASE_URL}/api/products?page=1&limit=20`, {
headers: { Authorization: `Bearer ${authToken}` },
tags: { name: 'GET /api/products' },
});
check(res, {
'status is 200': (r) => r.status === 200,
'response time OK': (r) => r.timings.duration < 200,
'has products': (r) => JSON.parse(r.body).data.length > 0,
});
errorRate.add(res.status !== 200);
apiDuration.add(res.timings.duration);
requestCount.add(1);
});
group('Create Order', () => {
const payload = JSON.stringify({
productId: Math.floor(Math.random() * 1000) + 1,
quantity: Math.floor(Math.random() * 5) + 1,
shippingAddress: '123 Test Street',
});
const res = http.post(`${BASE_URL}/api/orders`, payload, {
headers: {
Authorization: `Bearer ${authToken}`,
'Content-Type': 'application/json',
},
tags: { name: 'POST /api/orders' },
});
check(res, {
'order created': (r) => r.status === 201,
'has order id': (r) => JSON.parse(r.body).data.orderId,
});
errorRate.add(res.status !== 201);
apiDuration.add(res.timings.duration);
});
});
sleep(Math.random() * 3 + 1);
}
function login() {
const res = http.post(`${BASE_URL}/api/auth/login`, JSON.stringify({
email: `user${__VU}@test.com`,
password: 'testpassword',
}), {
headers: { 'Content-Type': 'application/json' },
tags: { name: 'POST /api/auth/login' },
});
return res.status === 200 ? JSON.parse(res.body).token : '';
}
3.3 Load Test Types
| Type | Purpose | VU Pattern | Duration |
|---|---|---|---|
| Smoke | Verify basic operation | 1-5 | 1-5 min |
| Load | Verify expected traffic handling | Expected levels | 15-60 min |
| Stress | Find breaking points | Above expected | 30-60 min |
| Spike | Test sudden traffic surges | Sudden spike | 5-10 min |
| Soak | Verify long-term stability | Steady level | 2-24 hours |
| Breakpoint | Find system failure point | Continuous increase | Variable |
4. Key Performance Metrics
4.1 RED Method
# RED Method Monitoring Implementation
from prometheus_client import Counter, Histogram

# Rate: Requests per second
request_count = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

# Errors: Error ratio
error_count = Counter(
    'http_errors_total',
    'Total HTTP errors',
    ['method', 'endpoint', 'error_type']
)

# Duration: Response time distribution
request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint'],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)
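Stripped of the Prometheus plumbing, the three RED signals reduce to per-endpoint counters plus a duration distribution. A dependency-free sketch of what the exporter above records (the `RedTracker` class name is my own):

```python
from collections import defaultdict

class RedTracker:
    """Minimal in-process RED tracking: Rate, Errors, Duration per endpoint."""
    def __init__(self):
        self.requests = defaultdict(int)     # Rate: total request count
        self.errors = defaultdict(int)       # Errors: 5xx count
        self.durations = defaultdict(list)   # Duration: raw samples

    def observe(self, endpoint: str, status: int, seconds: float):
        self.requests[endpoint] += 1
        if status >= 500:
            self.errors[endpoint] += 1
        self.durations[endpoint].append(seconds)

    def error_rate(self, endpoint: str) -> float:
        total = self.requests[endpoint]
        return self.errors[endpoint] / total if total else 0.0

tracker = RedTracker()
tracker.observe("/api/orders", 201, 0.042)
tracker.observe("/api/orders", 503, 0.950)
print(tracker.error_rate("/api/orders"))  # 0.5
```

In production the raw duration samples would go into histogram buckets as in the Prometheus version, rather than being kept as a list.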
4.2 Latency Percentiles
| | Mean | p50 | p95 | p99 | p99.9 | Max |
|---|---|---|---|---|---|---|
| User Impact | Low | Med | High | High | V.High | Extreme |

p50 (median): 50% of requests complete within this time
p95: 95% of requests complete within this time (1 in 20 is slower)
p99: 99% of requests complete within this time (1 in 100 is slower)
Why averages are dangerous:
- The average can be 50ms while p99 is 5000ms
- Roughly 1 in every 100 requests then waits 5 seconds
- Heavy users issue more requests, so they are more likely to hit the high percentiles
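The mean-vs-tail gap is easy to demonstrate with a synthetic latency sample where the average looks healthy while p99 is catastrophic (numbers are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value covering p% of samples."""
    ordered = sorted(samples)
    k = math.ceil(p / 100 * len(ordered)) - 1
    return ordered[k]

# 98 fast requests and two 5-second outliers
latencies = [30] * 98 + [5000] * 2
mean = sum(latencies) / len(latencies)
print(f"mean={mean:.0f}ms p50={percentile(latencies, 50)}ms "
      f"p99={percentile(latencies, 99)}ms")
# the ~129ms mean hides a p99 of 5000ms
```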
5. Common Bottlenecks
5.1 Database Bottlenecks
N+1 Query Problem:
# BAD: N+1 query - 101 queries for 100 orders
orders = Order.objects.all()[:100]
for order in orders:
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Eager loading - a single JOIN query
orders = Order.objects.select_related('user').all()[:100]
for order in orders:
    print(f"Order {order.id} by {order.user.name}")

# GOOD: Prefetch - batched IN queries (for 1:N, M:N relationships)
orders = Order.objects.prefetch_related('items__product').all()[:100]
for order in orders:
    for item in order.items.all():
        print(f" - {item.product.name}")
// Node.js + Prisma - N+1 Resolution
// BAD: N+1
const orders = await prisma.order.findMany({ take: 100 });
for (const order of orders) {
const user = await prisma.user.findUnique({
where: { id: order.userId }
});
}
// GOOD: Include (Join)
const orders = await prisma.order.findMany({
take: 100,
include: {
user: true,
items: { include: { product: true } }
}
});
// GOOD: DataLoader Pattern
const DataLoader = require('dataloader');
const userLoader = new DataLoader(async (userIds) => {
const users = await prisma.user.findMany({
where: { id: { in: [...userIds] } }
});
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
5.2 Missing Indexes and Full Table Scans
-- Detect slow queries (PostgreSQL)
SELECT
query,
calls,
mean_exec_time,
total_exec_time,
rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
-- Execution plan analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, u.name
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.status = 'pending'
AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 50;
-- Create composite index (matching query patterns)
CREATE INDEX CONCURRENTLY idx_orders_status_created
ON orders (status, created_at DESC)
WHERE status IN ('pending', 'processing');
-- Check index utilization
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
5.3 Connection Pool Exhaustion
# PgBouncer Configuration (PostgreSQL connection pooler)
# pgbouncer.ini
"""
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
max_client_conn = 1000
max_db_connections = 50
server_idle_timeout = 600
server_lifetime = 3600
stats_period = 60
"""
5.4 Lock Contention
// Go - Lock contention resolution with sharding
package main
import "sync"
// BAD: Global mutex locking entire map
type BadCache struct {
mu sync.Mutex
items map[string]interface{}
}
// GOOD: Sharding to distribute lock contention
type ShardedCache struct {
shards [256]shard
shardMask uint8
}
type shard struct {
mu sync.RWMutex
items map[string]interface{}
}
func NewShardedCache() *ShardedCache {
c := &ShardedCache{shardMask: 255}
for i := range c.shards {
c.shards[i].items = make(map[string]interface{})
}
return c
}
func (c *ShardedCache) getShard(key string) *shard {
hash := fnv32(key)
return &c.shards[hash&uint32(c.shardMask)]
}
func (c *ShardedCache) Get(key string) (interface{}, bool) {
s := c.getShard(key)
s.mu.RLock()
defer s.mu.RUnlock()
val, ok := s.items[key]
return val, ok
}
func (c *ShardedCache) Set(key string, value interface{}) {
s := c.getShard(key)
s.mu.Lock()
defer s.mu.Unlock()
s.items[key] = value
}
func fnv32(key string) uint32 {
hash := uint32(2166136261)
for i := 0; i < len(key); i++ {
hash *= 16777619
hash ^= uint32(key[i])
}
return hash
}
6. Database Optimization
6.1 Query Optimization Strategies
-- 1. Convert subqueries to JOINs
-- BAD
SELECT * FROM orders
WHERE user_id IN (SELECT id FROM users WHERE status = 'active');
-- GOOD
SELECT o.* FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE u.status = 'active';
-- 2. Cursor-based pagination (consistent performance)
-- BAD: OFFSET-based (slow on deep pages)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;
-- GOOD: Cursor-based
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;
-- 3. Partitioning
CREATE TABLE orders (
id BIGSERIAL,
user_id BIGINT NOT NULL,
status VARCHAR(20) NOT NULL,
created_at TIMESTAMP NOT NULL,
total_amount DECIMAL(10,2)
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2025_q1 PARTITION OF orders
FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE orders_2025_q2 PARTITION OF orders
FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');
6.2 Read Replicas
# SQLAlchemy - Read/Write Separation
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
class DatabaseRouter:
    def __init__(self):
        self.writer = create_engine(
            'postgresql://writer:pass@primary:5432/mydb',
            pool_size=10, max_overflow=20
        )
        self.readers = [
            create_engine(
                f'postgresql://reader:pass@replica{i}:5432/mydb',
                pool_size=10, max_overflow=20
            )
            for i in range(1, 4)  # 3 read replicas
        ]
        self._reader_index = 0

    def get_writer_session(self):
        Session = sessionmaker(bind=self.writer)
        return Session()

    def get_reader_session(self):
        # Round-robin across replicas
        reader = self.readers[self._reader_index % len(self.readers)]
        self._reader_index += 1
        Session = sessionmaker(bind=reader)
        return Session()
7. Caching Strategies
7.1 Cache-Aside Pattern
import redis
import json
from functools import wraps
redis_client = redis.Redis(host='localhost', port=6379, db=0)
class CacheAside:
    """Cache-Aside (Lazy Loading) pattern implementation"""

    @staticmethod
    def cached(key_prefix, ttl_seconds=300):
        def decorator(func):
            @wraps(func)
            async def wrapper(*args, **kwargs):
                cache_key = f"{key_prefix}:{':'.join(str(a) for a in args)}"
                # 1. Check cache
                cached = redis_client.get(cache_key)
                if cached:
                    return json.loads(cached)
                # 2. Cache miss - query DB
                result = await func(*args, **kwargs)
                # 3. Store result in cache
                if result is not None:
                    redis_client.setex(
                        cache_key, ttl_seconds,
                        json.dumps(result, default=str)
                    )
                return result
            return wrapper
        return decorator

    @staticmethod
    def invalidate(key_pattern):
        """Pattern-based cache invalidation"""
        keys = redis_client.keys(key_pattern)
        if keys:
            redis_client.delete(*keys)
7.2 Write-Through and Write-Behind
import asyncio

class WriteThrough:
    """Write-Through: Update cache and DB simultaneously"""

    async def update(self, key, value, ttl=300):
        await self.db.update(key, value)
        redis_client.setex(f"wt:{key}", ttl, json.dumps(value, default=str))

    async def get(self, key):
        cached = redis_client.get(f"wt:{key}")
        if cached:
            return json.loads(cached)
        value = await self.db.get(key)
        if value:
            redis_client.setex(f"wt:{key}", 300, json.dumps(value, default=str))
        return value

class WriteBehind:
    """Write-Behind: Write to cache first, async DB update"""

    def __init__(self):
        self.write_queue = asyncio.Queue()
        self.batch_size = 100
        self.flush_interval = 5  # seconds

    async def update(self, key, value, ttl=300):
        # 1. Write to cache immediately (fast response)
        redis_client.setex(f"wb:{key}", ttl, json.dumps(value, default=str))
        # 2. Add to queue (async DB write)
        await self.write_queue.put((key, value))

    async def flush_worker(self):
        """Background worker: batch write from queue to DB"""
        while True:
            batch = []
            try:
                while len(batch) < self.batch_size:
                    item = await asyncio.wait_for(
                        self.write_queue.get(),
                        timeout=self.flush_interval
                    )
                    batch.append(item)
            except asyncio.TimeoutError:
                pass
            if batch:
                try:
                    await self.db.bulk_update(batch)
                except Exception:
                    # Re-queue the failed batch and back off before retrying
                    for item in batch:
                        await self.write_queue.put(item)
                    await asyncio.sleep(1)
7.3 TTL Strategies and Cache Invalidation
# Tiered TTL Strategy
class TieredTTLCache:
    TTL_CONFIG = {
        # Frequently changing data
        'user:session': 1800,      # 30 min
        'cart:items': 900,         # 15 min
        # Periodically changing data
        'product:detail': 3600,    # 1 hour
        'product:list': 600,       # 10 min
        'search:results': 300,     # 5 min
        # Rarely changing data
        'category:list': 86400,    # 24 hours
        'config:settings': 86400,  # 24 hours
        'static:content': 604800,  # 7 days
    }

    @classmethod
    def get_ttl(cls, key_type):
        return cls.TTL_CONFIG.get(key_type, 300)  # Default 5 min
8. Async Processing
8.1 Message Queue-Based Async Processing
# Celery async task processing
from celery import Celery, chain, group
app = Celery('tasks', broker='redis://localhost:6379/0')
app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    task_acks_late=True,
    worker_prefetch_multiplier=1,
    task_routes={
        'tasks.send_email': {'queue': 'email'},
        'tasks.process_image': {'queue': 'image'},
        'tasks.generate_report': {'queue': 'report'},
    }
)

@app.task(bind=True, max_retries=3, default_retry_delay=60)
def send_email(self, to, subject, body):
    try:
        email_service.send(to=to, subject=subject, body=body)
    except Exception as exc:
        self.retry(exc=exc)

@app.task(bind=True, max_retries=3)
def process_order(self, order_id):
    """Order processing pipeline"""
    try:
        workflow = chain(
            validate_inventory.s(order_id),
            process_payment.s(order_id),
            send_confirmation_email.s(order_id),
            update_analytics.s(order_id)
        )
        workflow.apply_async()
    except Exception as exc:
        self.retry(exc=exc, countdown=30)

@app.task
def bulk_process_orders(order_ids):
    """Parallel batch processing"""
    job = group(process_order.s(oid) for oid in order_ids)
    return job.apply_async()
8.2 Event-Driven Architecture
// Node.js - EventEmitter-based async processing
const EventEmitter = require('events');
class OrderEventBus extends EventEmitter {
constructor() {
super();
this.setMaxListeners(20);
}
}
const orderBus = new OrderEventBus();
// Register event handlers (separation of concerns)
orderBus.on('order.created', async (order) => {
await inventoryService.decrementStock(order.items);
});
orderBus.on('order.created', async (order) => {
await emailService.sendOrderConfirmation(order);
});
orderBus.on('order.created', async (order) => {
await analyticsService.trackOrder(order);
});
class OrderService {
async createOrder(orderData) {
const order = await this.orderRepo.create(orderData);
orderBus.emit('order.created', order);
return order; // Fast response
}
}
9. Batch Optimization
9.1 Bulk Inserts
# SQLAlchemy Bulk Insert Comparison
import time
# BAD: One by one (N INSERT statements)
# BAD: One by one (N INSERT statements)
def insert_one_by_one(session, records):
    start = time.time()
    for record in records:
        session.add(MyModel(**record))
    session.commit()
    print(f"One by one: {time.time() - start:.2f}s")

# GOOD: Bulk insert (1 INSERT statement)
def bulk_insert(session, records):
    start = time.time()
    session.bulk_insert_mappings(MyModel, records)
    session.commit()
    print(f"Bulk insert: {time.time() - start:.2f}s")

# BETTER: execute_values (PostgreSQL, psycopg2)
def execute_values_insert(conn, records):
    start = time.time()
    from psycopg2.extras import execute_values
    cursor = conn.cursor()
    execute_values(
        cursor,
        "INSERT INTO my_table (col1, col2, col3) VALUES %s",
        [(r['col1'], r['col2'], r['col3']) for r in records],
        page_size=1000
    )
    conn.commit()
    print(f"execute_values: {time.time() - start:.2f}s")

# Performance comparison (10,000 records)
# One by one: 12.5s
# Bulk insert: 0.8s
# execute_values: 0.3s
9.2 Batch API Calls
// Optimized batch external API calls
class BatchAPIClient {
constructor(options = {}) {
this.batchSize = options.batchSize || 50;
this.concurrency = options.concurrency || 5;
this.retryAttempts = options.retryAttempts || 3;
this.delayBetweenBatches = options.delayMs || 100;
}
async processBatch(items, processFn) {
const results = [];
const errors = [];
const batches = [];
for (let i = 0; i < items.length; i += this.batchSize) {
batches.push(items.slice(i, i + this.batchSize));
}
for (let i = 0; i < batches.length; i += this.concurrency) {
const concurrentBatches = batches.slice(i, i + this.concurrency);
const batchResults = await Promise.allSettled(
concurrentBatches.map(batch => this.processWithRetry(batch, processFn))
);
for (const result of batchResults) {
if (result.status === 'fulfilled') {
results.push(...result.value);
} else {
errors.push(result.reason);
}
}
if (i + this.concurrency < batches.length) {
await new Promise(r => setTimeout(r, this.delayBetweenBatches));
}
}
return { results, errors, total: items.length, processed: results.length };
}
async processWithRetry(batch, processFn, attempt = 1) {
try {
return await processFn(batch);
} catch (error) {
if (attempt < this.retryAttempts) {
const delay = Math.pow(2, attempt) * 1000;
await new Promise(r => setTimeout(r, delay));
return this.processWithRetry(batch, processFn, attempt + 1);
}
throw error;
}
}
}
10. HTTP Optimization
10.1 Compression and Protocol Optimization
// Express.js compression configuration
const compression = require('compression');
app.use(compression({
filter: (req, res) => {
if (req.headers['x-no-compression']) return false;
return compression.filter(req, res);
},
level: 6, // Compression level (1-9, 6 is balanced)
threshold: 1024, // Compress only above 1KB
memLevel: 8,
}));
// HTTP/2 server setup
const http2 = require('http2');
const fs = require('fs');
const server = http2.createSecureServer({
key: fs.readFileSync('server.key'),
cert: fs.readFileSync('server.crt'),
allowHTTP1: true,
});
10.2 Keep-Alive and Connection Reuse
# Python requests - Session reuse
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
# BAD: New connection every time
# BAD: New connection every time
def fetch_bad(urls):
    results = []
    for url in urls:
        response = requests.get(url)  # TCP handshake each time
        results.append(response.json())
    return results

# GOOD: Session reuse (Keep-Alive)
def fetch_good(urls):
    session = requests.Session()
    retry_strategy = Retry(
        total=3, backoff_factor=0.5,
        status_forcelist=[500, 502, 503, 504]
    )
    adapter = HTTPAdapter(
        max_retries=retry_strategy,
        pool_connections=10,
        pool_maxsize=20,
    )
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    results = []
    for url in urls:
        response = session.get(url)  # Connection reuse
        results.append(response.json())
    session.close()
    return results
11. Application-Level Optimization
11.1 Efficient Serialization
// JSON vs MessagePack comparison
const msgpack = require('msgpack-lite');
const data = {
users: Array.from({ length: 1000 }, (_, i) => ({
id: i,
name: `User ${i}`,
email: `user${i}@example.com`,
age: 20 + (i % 50),
active: i % 3 !== 0,
tags: ['tag1', 'tag2', 'tag3'],
}))
};
// JSON
const jsonStr = JSON.stringify(data);
console.log(`JSON size: ${Buffer.byteLength(jsonStr)} bytes`);
// MessagePack
const msgpackBuf = msgpack.encode(data);
console.log(`MessagePack size: ${msgpackBuf.length} bytes`);
// Typical results:
// JSON size: ~120KB, serialize: ~3ms
// MessagePack size: ~85KB, serialize: ~2ms (about 30% smaller)
12. Production Monitoring
12.1 SLO-Based Alerting
# Prometheus alerting rules
groups:
  - name: slo-alerts
    rules:
      # p99 latency SLO violation
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            rate(http_request_duration_seconds_bucket[5m])
          ) > 0.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "p99 latency exceeds 500ms"
      # Error rate SLO violation
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 3m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1%"
      # Throughput drop
      - alert: ThroughputDrop
        expr: |
          sum(rate(http_requests_total[5m]))
          < 0.5 * sum(rate(http_requests_total[5m] offset 1h))
        for: 5m
        labels:
          severity: warning
      # Connection pool near exhaustion
      - alert: ConnectionPoolExhaustion
        expr: |
          hikaricp_connections_active
          / hikaricp_connections_max > 0.85
        for: 2m
        labels:
          severity: warning
13. Practice Quiz
Q1. What are the implications of Amdahl's Law for performance optimization, and how can it be applied in practice?
Amdahl's Law shows that the overall speedup of a system is limited by the fraction of execution time the improved part accounts for.
Key implications:
- No matter how much you speed up a small portion (e.g., 5%) of total execution time, overall improvement is minimal
- Making a large portion (e.g., 80%) just 2x faster yields significant overall improvement
- Therefore, identify the biggest bottlenecks first through profiling, then focus improvements there
Practical application:
- Use profiling (Flame Graphs) to understand execution time distribution
- Optimize bottlenecks in order of their proportion
- Re-measure after each optimization to identify new bottlenecks
- Stop optimizing when within Performance Budget
Q2. What is the N+1 query problem, and describe 3 ways to solve it in ORMs.
N+1 Problem: After querying N parent entities, each parent's child entities are queried individually, resulting in N+1 total queries. Querying 100 orders triggers 101 DB queries.
Solutions:
- Eager Loading (select_related/include): Use JOINs to fetch parent and children in one query. Effective for 1:1, N:1 relationships
- Prefetch (prefetch_related): Batch-fetch child entities in a separate query, then map in memory. Effective for 1:N, M:N relationships using IN clause
- DataLoader Pattern: Automatically batch individual requests into a single query. Especially useful in GraphQL. Pattern developed by Facebook
Q3. Explain the differences between Cache-Aside and Write-Through caching patterns and their suitable use cases.
Cache-Aside (Lazy Loading):
- Check cache on read, query DB on miss, then store in cache
- Application manages cache directly
- First request always has cache miss (Cold Start)
- Suitable for: Read-heavy scenarios where not all data needs caching
Write-Through:
- Update cache and DB simultaneously on writes
- Cache is always up-to-date
- Write latency increases (writing to two locations)
- Suitable for: When data consistency matters and reads far exceed writes
Write-Behind writes to cache first and updates DB asynchronously, maximizing write performance but risking data loss.
Q4. Explain when to use each of the 6 load test types (Smoke, Load, Stress, Spike, Soak, Breakpoint).
- Smoke Test: Minimal load (1-5 VUs) to verify basic system operation. Post-deployment verification
- Load Test: Verify performance at expected traffic levels. Check SLO compliance
- Stress Test: Exceed expected traffic to find system limits. Used for capacity planning
- Spike Test: Test system reaction to sudden traffic surges (e.g., events). Verify auto-scaling
- Soak Test: Maintain steady load for hours to find gradual issues like memory leaks or connection exhaustion
- Breakpoint Test: Continuously increase load to find absolute system failure point
Q5. What key parameters should be considered for connection pool tuning, and how do you determine the appropriate pool size?
Key parameters:
- maximum-pool-size: Maximum connections (too many overloads DB, too few causes waits)
- minimum-idle: Minimum idle connections (prevents cold start)
- connection-timeout: Wait time for connection acquisition
- idle-timeout: Time before idle connection is returned
- max-lifetime: Maximum connection lifetime (should be shorter than any DB- or network-imposed connection time limit)
Determining appropriate pool size:
- HikariCP formula: connections = (core_count * 2) + effective_spindle_count
- For SSDs: connections = (core_count * 2) + 1
- Generally 10-20 connections is sufficient for many cases
- Too-large pools actually increase DB context switching costs
- Adjust based on monitoring: increase when utilization exceeds 80%, immediately increase when waiting threads appear
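The HikariCP starting-point formula above as a quick calculation (treat the result as a starting point to tune under load, not a final answer; the helper name is my own):

```python
def suggested_pool_size(core_count: int, effective_spindle_count: int) -> int:
    """HikariCP's suggested starting point: (cores * 2) + spindles."""
    return core_count * 2 + effective_spindle_count

# An 8-core DB host with a single SSD volume
print(suggested_pool_size(8, 1))  # 17
```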
References
- Google SRE Book - Performance Engineering
- k6 Documentation
- Artillery Documentation
- Brendan Gregg - Systems Performance
- Flame Graphs
- HikariCP - About Pool Sizing
- Redis Best Practices
- PostgreSQL Performance Tips
- Node.js Diagnostics Guide
- Go pprof Documentation
- Python cProfile Documentation
- Prometheus Monitoring
- DataLoader Pattern