Load/Performance Testing Tools 2026 — Deep Dive on k6, Locust, Vegeta, Gatling, Artillery, JMeter (Beyond JMeter)
Prologue — "We can do 10k RPS" is usually a lie
Some company, the night before launch in 2026.
PM: "We can do 10k RPS, right?" Backend: "I ran it in JMeter. It passed." SRE: "What scenario?" Backend: "Constant 10k RPS for 5 minutes..." SRE: "What about the production traffic pattern? Did you do warm-up? What's p99?" Backend: "..."
This scene is still common in 2026. The tools got better; the measurement culture did not. "It passed" is the phrase, but what passed, under what distribution, how close to production — nobody answers. Then real traffic arrives in bursts, caches start cold, p99 breaks the SLO, and none of it correlates with the test that "passed."
The tools themselves have improved. The JMeter-centric world of the 2010s has shifted to one where k6 (Grafana Labs) is the de facto default, with Locust, Vegeta, Gatling, Artillery, and Bombardier each holding their own ground. wrk, wrk2, and autocannon are alive and well in the micro-benchmark niche, and non-HTTP protocols like gRPC and WebSocket have their own homes in ghz and fortio.
This post maps the 2026 landscape of load/performance testing tools. Where each tool sits, how the same scenario looks across them, what "a good load test" actually means, and how to pick honestly for your team.
1. Four purposes of load testing — name the target first
Before picking a tool, name the purpose. The same tool doesn't fit every purpose.
- Micro-benchmark — peak throughput and p99 latency of a single endpoint. "How fast is this hot path?" Good for catching regressions on small changes. wrk, autocannon, Bombardier, Vegeta.
- Load test — does the system stay inside SLO at the expected traffic level. Usually a steady RPS for some time. k6, Locust, Gatling, Artillery, JMeter.
- Stress test — find the limit. Where does it break, how does it break, can it recover. Same tools plus scenario design.
- Spike & soak — sudden surges (spike) and long runs (soak — catches memory or connection leaks). k6 and Locust express these scenarios well.
A fifth axis is chaos testing — injecting failure while traffic flows — usually a combination of a load tool and a chaos tool.
The core insight: "performance testing" is one phrase covering four different things. Pick your tool only after answering "what kind am I doing most often?" Dragging out JMeter for a micro-benchmark is overkill; using wrk for a complex scenario is undershooting.
2. Tool map 2026 — one table
| Tool | Language/script | Strengths | Weaknesses | Typical use |
|---|---|---|---|---|
| k6 | JS (ES2015+), Go runtime | Modern default, rich output, cloud option, gRPC/WS/browser | Distributed runs need setup in OSS | The general default in 2026 |
| Locust | Python | Easy distributed mode, full Python code | Single-worker throughput limited | Python teams, complex user models |
| Vegeta | Go (CLI + lib) | One-liner runs, strong result analysis | Simple scenarios only | HTTP micro-bench, quick checks |
| Gatling | Scala/Java DSL | Scenario expressiveness, enterprise reports | Scala learning curve | Large, JVM-friendly orgs |
| Artillery | Node.js, YAML | Fast start, declarative YAML | Single-node limits at high load | Node teams, CI scenarios |
| wrk / wrk2 | C, Lua scripting | Very light, very fast HTTP bench | HTTP only, simple scenarios | Hot-path micro-bench |
| autocannon | Node.js | npm install and go | Mostly fits Node teams | Quick Node API bench |
| Bombardier | Go | Dead-simple, fast CLI | Almost no scenario | One-liner load checks |
| JMeter | Java, GUI/XML | Old library and plugin ecosystem | XML scenarios, dated UX | Enterprise, legacy assets |
| ghz | Go, CLI | gRPC-only, simple | gRPC only | gRPC service benchmarks |
| fortio | Go (Istio) | gRPC + HTTP, distribution analysis | Plain UI | Service mesh validation |
One-liner: in 2026, the "tool you pick up first" is usually k6. Python-friendly teams pick Locust. Vegeta (or wrk) for one-liner micro-benches. JMeter/Gatling for enterprise/JVM orgs and legacy assets. Artillery for Node teams in CI. gRPC goes to ghz or k6's gRPC module.
3. k6 — the 2026 default
Status (May 2026): under Grafana Labs. The k6 OSS binary is free; Grafana Cloud k6 is the paid option for distributed runs and dashboards. v0.5x stable releases ship steadily; the browser module (Playwright backend), gRPC, WebSocket, and the xk6 extension ecosystem continue to grow.
Why it became the default:
- JS-scriptable — most engineers can read and write it.
- Single Go binary — easy to install, CPU-efficient (much higher single-worker throughput than Locust).
- Rich output — p50/p90/p95/p99 in the console by default. Exports to Prometheus, InfluxDB, Datadog.
- scenario/executor model — `constant-vus`, `ramping-arrival-rate`, `per-vu-iterations` and friends are expressive.
- Extensible — xk6 adds modules for SQL, Kafka, Redis, etc.
Base script (we'll compare against it in section 4):
// k6 script: login + ramp 50 → 200 RPS for 5min
import http from 'k6/http';
import { check, sleep } from 'k6';
export const options = {
scenarios: {
login_ramp: {
executor: 'ramping-arrival-rate',
startRate: 50,
timeUnit: '1s',
preAllocatedVUs: 50,
maxVUs: 500,
stages: [
{ target: 50, duration: '1m' },
{ target: 200, duration: '5m' },
{ target: 200, duration: '2m' },
],
},
},
thresholds: {
http_req_duration: ['p(99)<500'],
http_req_failed: ['rate<0.01'],
},
};
export default function () {
const res = http.post('https://api.example.com/login', JSON.stringify({
user: 'demo',
pass: 'pw'
}), { headers: { 'Content-Type': 'application/json' } });
check(res, { 'status 200': (r) => r.status === 200 });
sleep(1);
}
A rough sense of Grafana Cloud k6 pricing (2026): the free tier offers ~50 VUh/month; Pro plans start around $299/month with more VUh and concurrency. Distributed and regional runs are the convenience play; the alternative is self-hosted distributed (multiple nodes running the same script, results aggregated).
Limits:
- Distributed runs need setup in OSS — the `grafana/k6-operator` for Kubernetes exists, but it's operational overhead.
- The JS runtime is Goja (an ECMAScript interpreter embedded in Go), not V8. Some modern JS features need babel transforms.
- Heavy data manipulation is safer outside the scenario.
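One common pattern for keeping heavy data work out of the scenario is to pre-generate a static data file in a separate script and load it in k6 via `SharedArray` (from `k6/data`). A minimal sketch — the filename, field names, and record count are arbitrary choices for illustration:

```python
import json
import random

# Pre-generate test users OUTSIDE the k6 scenario, so the VU code only
# has to read a static file. Filename and field names are illustrative.
random.seed(42)  # reproducible data set

users = [
    {"user": f"demo-{i:05d}", "pass": f"pw-{random.randrange(10**8):08d}"}
    for i in range(10_000)
]

with open("users.json", "w") as f:
    json.dump(users, f)

print(len(users), "users written")
```

On the k6 side, something like `new SharedArray('users', () => JSON.parse(open('./users.json')))` then shares one parsed copy across all VUs instead of re-parsing per iteration.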
4. Same scenario, different tools — POST /login + ramp 50 → 200 RPS
Side-by-side: k6 / Locust / Vegeta for the same shape.
k6
The script in section 3. The key bits: the ramping-arrival-rate executor and thresholds for p99 and error rate as code.
Locust (Python)
# locustfile.py
from locust import HttpUser, task, LoadTestShape, constant_throughput
class LoginUser(HttpUser):
wait_time = constant_throughput(1) # 1 req/sec per user
@task
def login(self):
self.client.post(
"/login",
json={"user": "demo", "pass": "pw"},
headers={"Content-Type": "application/json"},
)
class RampShape(LoadTestShape):
stages = [
{"duration": 60, "users": 50, "spawn_rate": 50},
{"duration": 360, "users": 200, "spawn_rate": 5},
{"duration": 480, "users": 200, "spawn_rate": 0},
]
def tick(self):
run_time = self.get_run_time()
for stage in self.stages:
if run_time < stage["duration"]:
return (stage["users"], stage["spawn_rate"])
return None
Run: locust -f locustfile.py --headless -H https://api.example.com. Distributed via --master and --worker, with well-known Helm charts for Kubernetes.
Vegeta (CLI)
# step 1: 50 RPS for 1m
echo "POST https://api.example.com/login" | \
vegeta attack -rate=50 -duration=60s -body=body.json \
-header="Content-Type: application/json" \
| vegeta report -type='hist[0,100ms,200ms,500ms,1s]'
# step 2: 200 RPS for 5m
echo "POST https://api.example.com/login" | \
vegeta attack -rate=200 -duration=5m -body=body.json \
-header="Content-Type: application/json" \
| tee result.bin \
| vegeta report
vegeta plot result.bin > plot.html
Vegeta keeps scenarios deliberately simple — a fixed rate for a fixed duration. Ramps usually come from chaining commands or driving from a shell script. That simplicity is its strength — one line measures, vegeta report -type=json and vegeta plot give you the distribution and timeseries.
Side-by-side observations:
- Expressiveness: Locust (most freedom) ≈ k6 > Vegeta. Vegeta is simple on purpose.
- CPU efficiency: k6 > Vegeta > Locust (single-worker; Locust compensates via distributed).
- CI friendliness: k6 (threshold exit codes) ≈ Vegeta (grep reports). Locust runs headless in CI too.
- Result visualization: k6 (Cloud k6 or own Prometheus) > Locust (web UI) > Vegeta (plot HTML).
5. Locust — the comfortable friend for Python teams
Status (2026): actively maintained, v2.x stable. Locust's strength is behavior modeling in code — represent what users do as classes/methods with @task weights.
When it fits:
- Python-friendly team that wants to call data and abstraction libraries directly.
- Complex user behavior (multi-page flows, stateful sessions).
- Distributed runs needed but no SaaS — `--master`/`--worker` is genuinely simple.
Limits:
- Single-worker throughput is lower than k6. Python's gevent concurrency is fine, but Go is lighter.
- The built-in web UI is nice but doesn't match a k6 + Grafana stack.
Tip: Locust's killer feature is that distributed mode is easy. If you need 100k+ RPS, spinning up dozens of workers feels natural. Deploy via Helm to Kubernetes, scrape metrics into Prometheus — the pattern is familiar and operationally light.
6. Vegeta — the elegance of one line
Status (2026): maintained by a small set of contributors, stable and simple. New features are rare, and that's the point.
Why Vegeta survives:
- One-line measurement — `echo "GET https://..." | vegeta attack -rate=100 -duration=30s | vegeta report`. What takes a file and a runner and a container mount with other tools is a single line here.
- Accurate distribution analysis — `vegeta report` shows p50/p90/p95/p99/min/mean/max in one go. Histogram buckets are CLI flags.
- Result serialization — the `.bin` format saves results for later reprocessing. CI can archive the bin and analyze it later.
Limits:
- Scenario expressiveness is minimal — a fixed rate or a chained set of steps. Not for complex user simulations.
- Distributed is "run the same command on multiple machines and merge bins" — manual.
Typical use: precisely measuring one hot path's latency distribution, micro-bench regression in CI, "is the server alive right now" quick check.
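For the CI regression case, a gate script can parse `vegeta report -type=json` output. A sketch — the field names below match the JSON encoder of recent vegeta releases (latencies are in nanoseconds), but verify them against your build; the inlined dict stands in for `json.load(open("report.json"))`:

```python
# CI gate over a vegeta JSON report. Assumption: report fields follow
# recent vegeta versions ("latencies" in nanoseconds, "success" in 0..1).
report = {
    "latencies": {"mean": 12_400_000, "50th": 9_800_000,
                  "95th": 31_000_000, "99th": 74_000_000},
    "success": 0.998,
}

P99_BUDGET_MS = 500      # example SLO threshold
MAX_ERROR_RATE = 0.01

p99_ms = report["latencies"]["99th"] / 1e6   # ns -> ms
error_rate = 1.0 - report["success"]

ok = p99_ms < P99_BUDGET_MS and error_rate < MAX_ERROR_RATE
print(f"p99={p99_ms:.1f}ms error_rate={error_rate:.3f} pass={ok}")
```

In a pipeline you would exit non-zero when `ok` is false, which is exactly the role k6's `thresholds` play natively.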
7. Gatling — the JVM heavyweight
Status (2026): Gatling 3.x stable. Both Gatling Enterprise (paid, formerly FrontLine) and OSS are active. Scala DSL is the default; Java and Kotlin DSLs are first-class citizens now.
Why it's still picked:
- DSL expressiveness — scenarios, chaining, assertions feel natural as code.
- Reporting — clean HTML reports out of the box; Enterprise adds distributed and team workflows.
- JVM-friendly orgs — fits Maven/Gradle builds naturally.
- Scala barrier dropped — Java DSL is a first-class citizen.
When it fits:
- JVM-based backends, want integration with the existing build system.
- Complex scenarios you want to keep in code, code-reviewed.
- Need enterprise support.
Limits:
- Not for lightweight one-liners — JVM startup and SBT/Maven cost.
- Scala barrier is lower but not zero.
8. Artillery — YAML scenarios, fast start
Status (2026): OSS + Cloud (paid). The appeal is YAML scenarios. v2 strengthened distributed runs via Cloud.
Why it's picked:
- Declarative YAML — describe URL/payload/flow without writing code.
- Node.js-based — familiar to Node teams.
- Quick start — `npm install -g artillery` and a YAML file is enough to run.
Limits:
- Single-node throughput is below k6 (Node event-loop limits).
- Very high load needs Cloud or multiple instances.
Example (slightly different shape):
config:
target: 'https://api.example.com'
phases:
- duration: 60
arrivalRate: 50
- duration: 300
arrivalRate: 50
rampTo: 200
- duration: 120
arrivalRate: 200
scenarios:
- flow:
- post:
url: '/login'
json:
user: 'demo'
pass: 'pw'
expect:
- statusCode: 200
9. JMeter — the old guard
Status (2026): Apache JMeter 5.x maintained. Largest body of learning material, widest plugin ecosystem. GUI-centric but headless runs are officially supported.
Why it still appears:
- Legacy assets — many orgs own piles of `.jmx` files. Migration cost vs. maintenance cost.
- Plugins — Prometheus listener, custom protocols, BlazeMeter integration.
- Finance, telecom, regulated — internal standards stuck on JMeter.
- GUI-friendly — QA teams that don't write code can drive it (both blessing and curse).
Why new teams rarely choose it fresh:
- XML (`.jmx`) scenarios — not git-diff-friendly.
- Modern UX/CLI friendliness lags.
- CI is possible but heavier than k6/Locust.
Guidance: there's almost no reason a new project would pick JMeter fresh. But if you have the assets, the cost of throwing them away vs. keeping them is a separate decision. The common pattern is "write new tests in k6/Gatling, keep the old in JMeter, migrate as opportunity allows."
10. Micro-benchmarks — wrk / wrk2 / autocannon / Bombardier
These four are specialized for "peak throughput and latency distribution of one endpoint, fast."
wrk
- Written in C, very fast. Lua scripting for light customization.
- HTTP only, keep-alive friendly.
- Caveat: no rate-limit — always at max load. Use for measuring the ceiling.
wrk2
- A fork of wrk that supports fixed-rate runs — "exactly 1000 RPS for 30 seconds."
- Best fit for micro-bench + fixed-rate measurement.
- Stronger on coordinated-omission correction.
autocannon
- Node.js-based. `npm install -g autocannon`, then `autocannon -c 50 -d 30 https://...`.
- Suits Node teams wanting fast CI benches.
Bombardier
- Single Go binary. Dead simple and fast. `bombardier -c 125 -n 1000000 https://...`.
- Almost no scenario expression — that's the point.
- Maintenance is alive but mostly in stability mode.
When which:
- Need precise latency distribution and fixed rate → wrk2.
- Just want to throw load quickly → wrk or Bombardier.
- Node team in CI → autocannon.
- Need even a small scenario → not these four; reach for Vegeta or k6.
11. Non-HTTP protocols — gRPC, WebSocket, real browser
In 2026 load testing isn't just HTTP. gRPC, WebSocket, and headless real browsers are all valid targets.
gRPC
- ghz — a CLI for gRPC only. `ghz --insecure --proto ./svc.proto --call svc.Hello -c 50 -n 10000 ...`. Simple and precise.
- k6 gRPC module — `import grpc from 'k6/net/grpc'` mixes gRPC calls into a scenario.
- fortio — born inside Istio. Both HTTP and gRPC, strong on p50–p999 distribution analysis.
WebSocket
- k6 WS module — `import ws from 'k6/ws'`. Connection count, message RPS, session length scenarios all expressible.
- Artillery — first-class WebSocket support in YAML.
- Gatling — strong WebSocket scenario expressiveness.
Real browser load
- k6 browser module — Playwright backend. Runs JS, renders, interacts in a real browser. Use cases: frontend regression, page-load SLOs.
- Cost warning: browser instances are heavy; think in concurrent sessions, not RPS.
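Sizing in concurrent sessions follows directly from Little's law: steady-state concurrency equals arrival rate times average session duration. A tiny sketch with assumed numbers:

```python
# Little's law for browser-based load: concurrency = rate * duration.
# Both inputs below are assumptions, not measured values.
arrival_rate = 5.0          # new user journeys started per second
session_duration_s = 40.0   # average length of one scripted journey

concurrent_sessions = arrival_rate * session_duration_s
print(f"steady-state concurrent browser sessions: {concurrent_sessions:.0f}")
```

Five journeys per second at 40 seconds each means ~200 browsers alive at once — that, not RPS, is what you have to provision for.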
12. What "a good load test" means — 2026 checklist
Even with a good tool, a bad scenario gives bad results. The core of a good load test.
1) Production-like data
- User IDs, payload sizes, token distribution should look like production.
- "One user hitting the same page 10k times" all fits in cache — production doesn't.
- A common pattern: sample payloads from production, anonymize, replay.
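The sample-anonymize-replay pattern can be sketched in a few lines. The field names (`user`, `pass`) match the post's example scenario; the salt and pseudonym format are arbitrary. The key property: the mapping is stable, so the same real user always maps to the same fake ID, preserving production's cache-hit/miss distribution.

```python
import hashlib
import json

def anonymize(payload: dict, salt: str = "loadtest-2026") -> dict:
    """Replace PII in a sampled payload with stable pseudonyms."""
    out = dict(payload)
    # Same real user -> same fake id, so key-distribution shape survives.
    digest = hashlib.sha256((salt + payload["user"]).encode()).hexdigest()[:12]
    out["user"] = f"user-{digest}"
    out["pass"] = "replay-password"   # never replay real credentials
    return out

sampled = [{"user": "alice@example.com", "pass": "s3cret"},
           {"user": "bob@example.com", "pass": "hunter2"},
           {"user": "alice@example.com", "pass": "s3cret"}]  # repeat visitor

replay = [anonymize(p) for p in sampled]
print(json.dumps(replay[:2], indent=2))
```

A random pseudonym per request would destroy the repeat-visitor pattern and make every request a cache miss — the opposite of production.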
2) Distribution, not a flat rate
- "Constant 1000 RPS" is fine for measurement but far from reality.
- Production has bursts, dips, diurnal cycles.
- Model with k6's `ramping-arrival-rate` executor and Locust's `LoadTestShape`.
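A shaped schedule doesn't have to be fancy. A sketch of a per-second target-RPS plan — baseline diurnal curve plus random burst seconds; all constants are illustrative, and the resulting list can drive chained vegeta runs or a custom `LoadTestShape`:

```python
import math
import random

random.seed(7)  # deterministic schedule for repeatable tests

def target_rps(second: int, base: int = 100) -> int:
    """Baseline diurnal sine plus occasional 3x burst seconds."""
    diurnal = 1.0 + 0.5 * math.sin(2 * math.pi * second / 86_400)  # daily cycle
    burst = 3.0 if random.random() < 0.01 else 1.0                 # ~1% of seconds
    return max(1, int(base * diurnal * burst))

schedule = [target_rps(s) for s in range(600)]   # first 10 minutes
print(min(schedule), max(schedule))
```

Even this crude shape will exercise autoscaling, connection-pool growth, and cache churn in ways a flat 100 RPS never does.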
3) Warm-up and ramp-up
- Caches, DB connection pools, JIT all start cold — measuring during cold start gives pessimistic numbers.
- Set aside the first 1–2 minutes as warm-up, exclude from measurement.
4) p99 (not the average)
- "Average latency" is not your SLO's friend — it hides the long tail.
- p95, p99, p999 (especially p99) connect to SLOs. Check percentiles in every tool's output.
- A tool can under-report p99 due to coordinated omission — wrk2 was the first to popularize the fix.
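Coordinated omission is easiest to see in a toy queueing model (all numbers synthetic): a server that answers in 5 ms freezes for one second mid-run. A closed-loop client (wrk-style, next request only after the previous returns) politely waits and records the stall as a single slow sample; an open-loop client (wrk2/vegeta-style, requests on a fixed schedule, latency measured from the intended send time) counts the queueing too:

```python
import math

STALL_START, STALL_END, SERVICE = 5.0, 6.0, 0.005  # seconds

def pctl(values, p):
    s = sorted(values)
    return s[max(math.ceil(p / 100 * len(s)) - 1, 0)]

# Closed loop: the stall contributes exactly ONE slow sample.
closed, t = [], 0.0
while t < 10.0:
    lat = (STALL_END - t) + SERVICE if STALL_START <= t < STALL_END else SERVICE
    closed.append(lat)
    t += lat

# Open loop: 100 req/s fire on schedule; a single-server FIFO queues them.
open_loop, server_free = [], 0.0
for i in range(1000):
    arrival = i * 0.010
    start = max(arrival, server_free)
    if STALL_START <= start < STALL_END:
        start = STALL_END          # server frozen: wait out the stall
    finish = start + SERVICE
    server_free = finish
    open_loop.append(finish - arrival)  # latency from INTENDED send time

print(f"closed-loop p99: {pctl(closed, 99) * 1000:.0f} ms")  # hides the stall
print(f"open-loop   p99: {pctl(open_loop, 99) * 1000:.0f} ms")
```

The closed-loop p99 stays at the nominal 5 ms even though users behind the stall waited up to a second; the open-loop p99 lands near a full second. Same server, same stall — the difference is purely measurement methodology.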
5) Separate error rates
- Computing latency only over successful responses hides dead requests.
- Count 4xx, 5xx, timeouts, connection refused separately, with thresholds.
6) Multiple measurement points
- Client-side latency from the load tool plus server-side RED (Rate, Error, Duration) plus downstream dependencies.
- Load tool says slow, server says fast → network or measurement issue. Vice versa exists too.
7) Regular, not one-shot
- Not a single pre-launch run; a weekly/monthly regression suite.
- Track how code changes shift latency distributions over time.
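Tracking over time implies a gate: compare this run's p99 against a stored baseline with a relative tolerance. A sketch — in CI the two numbers would come from archived results (vegeta `.bin` files, k6 summary JSON); they are hardcoded here for illustration:

```python
# Regression gate: fail the build if p99 drifts beyond tolerance.
# Both latency values below are illustrative placeholders.
BASELINE_P99_MS = 72.0   # from the last accepted run
TOLERANCE = 0.10         # fail if p99 regresses by more than 10%

current_p99_ms = 81.5    # from this run

regression = (current_p99_ms - BASELINE_P99_MS) / BASELINE_P99_MS
passed = regression <= TOLERANCE
print(f"p99 {BASELINE_P99_MS} -> {current_p99_ms} ms "
      f"({regression:+.1%}), pass={passed}")
```

A +13% p99 drift fails here even though both numbers are "fast" in absolute terms — the point of regression tracking is the trend, not the threshold.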
13. Self-host vs cloud — cost and operations
Once distributed is needed, two paths.
Self-host distributed
- k6 OSS + grafana/k6-operator (k8s) — pods run the test, results land in Prometheus/InfluxDB.
- Locust master/worker — simplest. Helm chart with N workers.
- Same command on N nodes + merge results — vegeta/wrk/Bombardier style.
Pros: cost control, data stays in-house. Cons: operational overhead, regional distribution is hard.
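One trap when merging self-hosted results: percentiles of percentiles are not a thing. Aggregate the raw samples (or mergeable histograms), then compute the percentile once. A sketch with synthetic per-worker latencies:

```python
import math

def pctl(values, p):
    """Nearest-rank percentile."""
    s = sorted(values)
    return s[max(math.ceil(p / 100 * len(s)) - 1, 0)]

# Synthetic workers: only one of them saw the slow tail.
worker_a = [10.0] * 985 + [400.0] * 15
worker_b = [12.0] * 1000
worker_c = [11.0] * 1000

# WRONG: averaging per-worker p99s yields a number with no statistical meaning.
naive = sum(pctl(w, 99) for w in (worker_a, worker_b, worker_c)) / 3

# RIGHT: merge the raw samples, then take the percentile once.
merged = pctl(worker_a + worker_b + worker_c, 99)

print(f"avg of per-worker p99s: {naive:.1f} ms")
print(f"p99 of merged samples:  {merged:.1f} ms")
```

Here the naive average reports 141 ms while the true global p99 is 12 ms. At scale, raw sample lists get big; mergeable structures like HDR histograms or t-digests are the usual fix, but the merge-then-compute order is the same.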
Cloud
- Grafana Cloud k6 — distributed runs, regions, dashboards. From around $299/month.
- BlazeMeter — JMeter/Gatling/k6 compatible enterprise SaaS.
- Artillery Cloud — Artillery-based SaaS.
- Loader.io, k6 Cloud free tier — small free tiers for occasional measurement.
Pros: regions, instant runs, instant dashboards. Cons: cost adds up (a weekly large test can quickly hit hundreds to thousands per month).
Rule of thumb: one-off big runs (pre-launch) — cloud is cheap and fast. Regular regression — self-host is cheaper long-term. Many orgs mix — daily CI self-hosted, quarterly big run on cloud.
14. Decision frame — picking honestly
| Situation | Pick |
|---|---|
| Starting fresh, generic backend | k6 |
| Python team, complex behavior | Locust |
| One-line micro-bench | Vegeta or wrk2 |
| Node team, fast CI | Artillery or autocannon |
| Enterprise JVM org | Gatling |
| Legacy JMeter assets | Keep JMeter + add k6 for new |
| gRPC | ghz or k6 gRPC |
| Browser load | k6 browser |
| Pure throughput ceiling | wrk or Bombardier |
| Chaos + load | k6/Locust + Toxiproxy/Litmus |
Mixing is common. Within one org: Vegeta for micro-bench, k6 for general load, JMeter for the big legacy scenario. That's the real picture. Don't force a single tool — accepting the right tool per area keeps operations simpler.
15. Anti-patterns and traps
- Running directly against production — use staging or an isolated production-like environment. Production canaries are a different technique.
- Bypassing DNS/CDN to hit origin directly — measurement passes; SLOs include CDN. The measurement path must match the user path.
- Measuring on localhost and concluding — network latency is 0, RPS-only views give false confidence.
- Constant rate, single pass, declare success — production is a distribution.
- Looking only at average latency — look at p99.
- Looking only at the load tool — pair with server metrics.
- One-off pre-launch run — put it in regression.
- Measuring scenarios with wrk — wrong tool.
- Trying 100k RPS from a single node — confusing tool limits with system limits.
Epilogue — measurement is a decision tool
The tools are abundant. JMeter's monolith era is gone; k6 has become the default. Locust is the friendly Python option, Vegeta the elegant one-liner, Gatling the JVM heavyweight, Artillery the quick starter, and wrk/wrk2/autocannon/Bombardier are the micro-bench specialists.
But tools don't make the result. Does the scenario resemble production? Did you look at percentiles? Did you warm up? Is it regression? Those make the result. The work after picking a tool is longer and more important.
One-line summary: "It runs fine" is not measurement. p99, distribution, scenario — those are measurement. Tools are how you get there; the destination is something you have to define.
12-item checklist
- Is the purpose clear (micro/load/stress/spike)?
- Is the data production-like?
- Is the rate a distribution/ramp, not a flat constant?
- Did you exclude warm-up from measurement?
- Is p99 (or p999) tied to your SLO?
- Are error rates separated from latency?
- Are client and server metrics looked at together?
- Is regression wired into CI?
- Did you make a self-host vs cloud decision for distributed?
- If non-HTTP, are you using the right tool (gRPC/WS/browser)?
- Does the measurement path match the user path (DNS, CDN)?
- Do you understand coordinated-omission correction?
Ten anti-patterns
- Only looking at the average — look at p99.
- One flat-rate run as "passed" — model the distribution.
- Measuring on localhost and concluding — no network.
- Loading production directly — use staging.
- Looking only at the tool — look at server metrics too.
- One-shot pre-launch — make it regression.
- Doing everything in JMeter — pick per area.
- Forcing 100k RPS from one node — tool limit ≠ system limit.
- Ignoring warm-up — early data poisons the conclusion.
- Dropping error counts — dead requests hide.
Next post teasers
Candidates: production signal vs noise — designing RED/USE/SLO dashboards, chaos engineering 2026 — Litmus, Chaos Mesh, Gremlin, AWS FIS, performance regression CI — from a one-liner to a distribution.
"What can be measured can be improved. And if you don't look at the distribution, you aren't measuring."
— Load/performance testing tools 2026, end.
References
- k6 — Grafana Labs
- k6 GitHub — grafana/k6
- k6 Documentation
- k6 Cloud Pricing — Grafana Cloud k6
- k6 Operator — grafana/k6-operator
- k6 browser module
- Locust — Python load testing
- Locust GitHub — locustio/locust
- Locust Documentation
- Vegeta GitHub — tsenart/vegeta
- Gatling — Open source load testing
- Gatling GitHub — gatling/gatling
- Artillery — Cloud-scale load testing
- Artillery GitHub — artilleryio/artillery
- Apache JMeter
- wrk GitHub — wg/wrk
- wrk2 GitHub — giltene/wrk2
- autocannon — Node.js HTTP benchmark
- Bombardier — Go HTTP benchmark
- ghz — gRPC benchmarking
- fortio — load testing library and tool
- BlazeMeter — Performance testing platform
- Coordinated Omission — Gil Tene
- Google SRE Book — Monitoring distributed systems
- Brendan Gregg — USE Method