Performance / Load Testing 2026 — k6 v1 / Artillery / Locust / JMeter / Gatling / Vegeta / oha Deep Dive

"The first load shape you see in production is one you have never seen in a load test." — Tammy Bützer, Grafana k6 maintainer, KubeCon EU 2024

Performance testing is one of the oldest disciplines in software engineering, but the landscape in 2026 looks almost nothing like it did in 2016. On one side, Apache JMeter, first released in 1998, still has the largest install base after 25+ years. On the other, Grafana k6 (acquired by Grafana Labs in 2021) reached v1.0 GA in 2025 and has become the de facto modern standard. In between sit code-first tools like Locust, Gatling, and Artillery, and CLI one-liners like Vegeta, hey, wrk, oha, autocannon, and Bombardier that answer the simpler question "before I bother writing a scenario, just show me the RPS."

This post is a May 2026 deep dive into the load testing and browser performance landscape. We cover 16+ backend load testing tools, 4 browser perf tools, Core Web Vitals, and case studies from Korea and Japan.

1. The 2026 Load Testing Map — DSL / Scripted / CLI Benchmark / Browser Perf

Load testing is not a single category. The question "which is better, tool X or tool Y?" is often comparing tools that live in different boxes. In 2026 the cleanest taxonomy uses four:

| Category | Examples | What it does |
| --- | --- | --- |
| Scripted (code-first) load tests | k6, Artillery, Locust, Gatling, JMeter | Define user scenarios in code/DSL, distributed runs, time-series metrics |
| One-line CLI benchmarks | hey, wrk, oha, Vegeta, autocannon, Bombardier | Hit a single URL at a fixed RPS/concurrency, report p50/p95/p99 |
| Enterprise SaaS / on-prem | BlazeMeter, LoadRunner, NeoLoad, Azure Load Testing | GUI, recording, audit trails, regulated industries, JMeter/k6 compatible |
| Browser perf / RUM | Lighthouse, WebPageTest, SpeedCurve, Calibre, Cloudflare RUM | Measure LCP/INP/CLS in real browsers, watch for regressions |

The core thesis of this post is simple. If you are starting a new load testing project in 2026, roughly 90% of teams should start with k6, and the remaining 10% should pick Locust or JMeter: Locust when the team is 100% Python-native, JMeter when a regulated industry demands a GUI script as an audit artifact. When you just need a one-line benchmark, put oha or Vegeta on your PATH and move on.

The rest of this post explains how we get to that conclusion.

2. k6 v1.0 (Grafana) — The JS Scripting Standard

Grafana k6 was open-sourced in 2017 by Load Impact, written in Go. Grafana Labs acquired Load Impact in 2021, gave the project full-time maintainers, and shipped k6 v1.0 in spring 2025 with a frozen API. As of 2026, it has over 27k GitHub stars and is rapidly overtaking JMeter in weekly downloads.

The core architectural choice of k6 is "run goja (a JavaScript interpreter) inside a Go runtime." Users write JS/TS scenarios but the load generator itself is Go, so a single machine can sustain tens of thousands of virtual users. Memory and CPU efficiency are clearly better than Python-based Locust or JVM-based JMeter and Gatling.

A canonical script looks like this:

import http from 'k6/http'
import { check, sleep } from 'k6'

export const options = {
  scenarios: {
    ramp_up: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '30s', target: 100 },
        { duration: '2m',  target: 100 },
        { duration: '30s', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1500'],
    http_req_failed:   ['rate<0.01'],
  },
}

export default function () {
  const res = http.get('https://api.example.com/products')
  check(res, {
    'status is 200': (r) => r.status === 200,
    'body has items': (r) => r.json('items').length > 0,
  })
  sleep(1)
}

The executor (generator) family has seven members: constant-vus, ramping-vus, constant-arrival-rate (a true open-model rate, in iterations per second, which equals RPS when each iteration makes one request), ramping-arrival-rate, per-vu-iterations, shared-iterations, and externally-controlled. In practice, the ability to express "hit this API at exactly 1000 RPS for 5 minutes" with constant-arrival-rate is decisive.
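The difference between a fixed-VU (closed) model and an arrival-rate (open) model is easiest to see as arithmetic. The sketch below applies Little's law (in-flight iterations ≈ arrival rate × iteration time) to estimate how many VUs an arrival-rate test needs, and shows the throughput ceiling of a fixed-VU test. The function names and the `headroom` safety factor are illustrative assumptions, not part of the k6 API.

```python
import math


def vus_needed(target_rps: float, iteration_seconds: float, headroom: float = 1.2) -> int:
    """Estimate the VUs an open-model (arrival-rate) test needs.

    Little's law: concurrent iterations = arrival rate * time per iteration.
    `headroom` is an assumed safety factor for latency spikes.
    """
    # round() guards against float noise such as 299.99999999999994
    return math.ceil(round(target_rps * iteration_seconds * headroom, 6))


def closed_model_rps(vus: int, response_seconds: float, think_seconds: float) -> float:
    """Throughput ceiling of a closed-model test: each VU loops request + think time."""
    return vus / (response_seconds + think_seconds)


if __name__ == "__main__":
    # 1000 iterations/s at 250 ms each keeps about 250 in flight (300 with headroom)
    print(vus_needed(1000, 0.25))
    # 50 VUs, 250 ms responses, 750 ms think time
    print(closed_model_rps(50, 0.25, 0.75))
```

The second function is the reason a fixed-VU test cannot promise a request rate: when the server slows down, each VU loops more slowly and the offered load drops with it.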

thresholds is the single feature that turns a load test into a CI gate. If p95 exceeds 500ms, k6 exits with code 99 and GitHub Actions goes red. Without thresholds, a load test is just a graph you look at.
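To make the gating idea concrete, here is a minimal Python sketch of the logic a threshold check performs: compute a percentile, compare it with a limit, and map the verdict to a process exit code. It is not k6's implementation. The nearest-rank percentile, the function names, and the default limits are assumptions chosen to mirror the thresholds in the script above.

```python
import math
import sys

THRESHOLD_EXIT_CODE = 99  # the exit code the text attributes to failed k6 thresholds


def percentile(samples, pct):
    """Nearest-rank percentile; real tools may interpolate differently."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def gate(durations_ms, failed, total, p95_limit_ms=500.0, max_fail_rate=0.01):
    """Return 0 when every threshold holds, otherwise THRESHOLD_EXIT_CODE."""
    breaches = []
    p95 = percentile(durations_ms, 95)
    if p95 >= p95_limit_ms:  # mirrors 'p(95)<500'
        breaches.append(f"p(95)={p95:.1f}ms is not < {p95_limit_ms}ms")
    fail_rate = failed / total if total else 0.0
    if fail_rate >= max_fail_rate:  # mirrors 'rate<0.01'
        breaches.append(f"failure rate {fail_rate:.2%} is not < {max_fail_rate:.2%}")
    for breach in breaches:
        print(breach, file=sys.stderr)
    return THRESHOLD_EXIT_CODE if breaches else 0


if __name__ == "__main__":
    sys.exit(gate([100.0] * 95 + [900.0] * 5, failed=0, total=100))
```

A CI system only needs the exit code: any non-zero value fails the job, which is what turns a graph into a gate.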

Notable capabilities in the v1.x line (several predate 1.0):

  • Browser module: Drive Chromium through a Playwright-compatible API and measure CWV (LCP, INP, CLS) alongside the load test
  • gRPC, WebSocket, Redis, Kafka protocol modules
  • Extensions (xk6): Write Go modules and compile them into a custom k6 binary. 60+ official and community extensions
  • Distributed mode: Multi-node runs via the k6-operator Kubernetes operator
  • k6 Cloud → Grafana Cloud k6: managed load generators + dashboard integration

CLI output is a single text screen:

running (2m00.0s), 000/100 VUs, 11947 complete and 0 interrupted iterations
ramp_up        [======================================] 100/100 VUs  2m0s

     checks.........................: 100.00% 23894 ✓  0 ✗
     http_req_duration..............: avg=87.2ms p(95)=312.5ms p(99)=482.1ms
     http_req_failed................: 0.00%   0 / 23894
     http_reqs......................: 23894   199.117/s
     iteration_duration.............: avg=1.08s
     ✓ http_req_duration..p(95)<500
     ✓ http_req_duration..p(99)<1500
     ✓ http_req_failed..rate<0.01

k6 has two weak spots. First, scripts are JavaScript, but the runtime is goja, not Node.js, so you cannot just npm install arbitrary packages. You need a bundler (k6-browserify or webpack) to transpile and bundle before importing. Second, async/await is only partially supported across modules since v0.43, so complex business logic in a scenario can be painful to debug.

Even so, the default choice for a new project in 2026 is k6. Single binary, JS scenarios, hard thresholds, Grafana integration — no other tool catches up on all four.

3. Artillery — Node.js + Serverless Workers

Artillery was created in 2015 by Hassy Veldstra in Node.js. The project became a company in 2022 (Artillery Inc., Series A), and as of 2026 it ships an OSS edition alongside Artillery Cloud (SaaS).

Artillery has two defining traits. First, scenarios are YAML. That is a hard branching point versus k6's JS. Second, workers can launch on AWS Fargate or Azure Container Instances to deliver essentially unlimited distributed load (Artillery Pro / Cloud).

A YAML scenario looks like this:

config:
  target: https://api.example.com
  phases:
    - duration: 60
      arrivalRate: 10
      name: warm up
    - duration: 300
      arrivalRate: 100
      name: sustained load
  defaults:
    headers:
      User-Agent: artillery-loadtest
  ensure:
    p95: 500
    maxErrorRate: 1

scenarios:
  - name: browse and buy
    flow:
      - get:
          url: /products
          capture:
            - json: $.items[0].id
              as: productId
      - get:
          url: /products/{{ productId }}
      - post:
          url: /cart
          json:
            productId: "{{ productId }}"
            quantity: 1
      - think: 2

YAML as a choice is polarizing. For non-developer QA writing scenarios it lowers the bar, but as soon as you need branching, looping, or computation you fall back to JS via processor: ./hooks.js. Artillery exposes that hook deliberately.

The 2024 addition of Artillery Pro AI ingests an OpenAPI/Swagger spec and produces scenario YAML automatically. For a microservice with 100 endpoints, it spits out a baseline scenario in five minutes. In practice the model is "90% draft + 10% human polish."

Artillery's real strength is distribution. artillery run-fargate scenario.yml --count 20 --region us-east-1 launches 20 Fargate workers in one line. To get the same in k6, you need k6-operator and a Kubernetes cluster. If you need "instantly massive load with no pre-existing infrastructure," Artillery is the right pick.

Weaknesses: the OSS edition has limits (distributed and Cloud dashboards are paid), and scenario YAML gets ugly fast as it grows.

4. Locust — Python Distributed

Locust was started in 2011 by Jonatan Heyman in Python. As of 2026 it has 25k GitHub stars and is effectively the Python load testing standard.

Locust's identity fits in one sentence: "define users with Python code."

from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)
    host = "https://api.example.com"

    def on_start(self):
        self.client.post("/login", json={"user": "test", "pw": "demo"})

    @task(3)
    def browse_products(self):
        with self.client.get("/products", catch_response=True) as resp:
            if resp.elapsed.total_seconds() > 0.5:
                resp.failure("Too slow")

    @task(1)
    def view_product(self):
        self.client.get("/products/42")

    @task
    def search(self):
        self.client.get("/search?q=keyboard")

@task(weight) sets relative action weights, and wait_time mimics user think time. Being Python, you can pull in SQLAlchemy, Pandas, or NumPy inside scenarios. For data science teams and ML inference servers under test, that is decisive.
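The weights are relative, not percentages. A short sketch of the arithmetic for the class above, where a bare @task counts as weight 1; the helper name `task_mix` is hypothetical and is not part of Locust.

```python
from fractions import Fraction


def task_mix(weights):
    """Convert Locust-style relative task weights into selection probabilities."""
    total = sum(weights.values())
    return {name: Fraction(weight, total) for name, weight in weights.items()}


# Weights taken from WebsiteUser above: @task(3), @task(1), and a bare @task (weight 1)
mix = task_mix({"browse_products": 3, "view_product": 1, "search": 1})

if __name__ == "__main__":
    for name, share in mix.items():
        print(f"{name}: {float(share):.0%}")
```

So browse_products is picked three times out of five, and the other two tasks once each.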

Distributed mode is just master/worker:

# master
locust -f locustfile.py --master --expect-workers 4

# 4 workers (Docker Swarm, k8s, EC2 — anywhere)
locust -f locustfile.py --worker --master-host master.example.com

A built-in Web UI on port 8089 displays graphs and live metrics, which is a major plus for non-developer PMs and QAs. The --processes flag added in 2024 works around the GIL by running multiple worker processes per machine, finally using all cores.

Two weak spots. First, GIL and gevent constraints make a single worker generate roughly one-third to one-fifth of what k6 produces. Distribution is essentially mandatory. Second, built-in graphs and thresholds are weak. To use Locust as a CI gate, you have to parse the post-run stats CSV with your own pass/fail script.
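A minimal sketch of such a pass/fail script, assuming the stats file written by a headless run with --csv and assuming it contains an "Aggregated" row with "Request Count", "Failure Count", and "95%" columns. Check those column names against the CSV your Locust version actually writes. The limits and the function name are illustrative.

```python
import csv
import io


def locust_gate(stats_csv_text, p95_limit_ms=500.0, max_fail_rate=0.01):
    """Return (ok, reasons) based on the 'Aggregated' row of a Locust stats CSV."""
    rows = list(csv.DictReader(io.StringIO(stats_csv_text)))
    aggregated = next((row for row in rows if row.get("Name") == "Aggregated"), None)
    if aggregated is None:
        return False, ["no Aggregated row found"]

    requests = int(aggregated["Request Count"])
    failures = int(aggregated["Failure Count"])
    p95_ms = float(aggregated["95%"])

    reasons = []
    if p95_ms >= p95_limit_ms:
        reasons.append(f"p95 {p95_ms:.0f}ms >= {p95_limit_ms:.0f}ms")
    fail_rate = failures / requests if requests else 1.0  # no traffic counts as failure
    if fail_rate >= max_fail_rate:
        reasons.append(f"failure rate {fail_rate:.2%} >= {max_fail_rate:.2%}")
    return not reasons, reasons


if __name__ == "__main__":
    sample = (
        "Type,Name,Request Count,Failure Count,95%\n"
        "GET,/products,100,0,320\n"
        ",Aggregated,100,0,320\n"
    )
    print(locust_gate(sample))
```

In CI, read the real stats file into a string, call locust_gate, and exit non-zero when ok is False.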

Still, if your team is 100% Python, Locust is a fine first pick.

5. JMeter 5.6 — The Apache Classic

Apache JMeter has been alive since 1998, more than 27 years and counting. The 5.6 line (5.6 shipped in 2023, with 5.6.3 following in January 2024) is the stable release as of 2026, with 6.0 alpha available to early adopters.

JMeter is a GUI-first tool. Scenarios are saved as .jmx XML files, and you build them by composing a tree (Thread Group → HTTP Sampler → Listener) in a desktop UI. Without writing a single line of code you can have a first load test running in five minutes. That low barrier is exactly why JMeter refuses to die in 2026.

<jmeterTestPlan version="1.2">
  <hashTree>
    <TestPlan testname="Demo Test"/>
    <hashTree>
      <ThreadGroup testname="Users">
        <intProp name="num_threads">100</intProp>
        <intProp name="ramp_time">30</intProp>
      </ThreadGroup>
      <hashTree>
        <HTTPSamplerProxy testname="Get products">
          <stringProp name="HTTPSampler.domain">api.example.com</stringProp>
          <stringProp name="HTTPSampler.path">/products</stringProp>
          <stringProp name="HTTPSampler.method">GET</stringProp>
        </HTTPSamplerProxy>
        <hashTree>
          <ResponseAssertion>
            <stringProp name="Assertion.test_field">Assertion.response_code</stringProp>
            <stringProp name="EQUALS">200</stringProp>
          </ResponseAssertion>
        </hashTree>
      </hashTree>
    </hashTree>
  </hashTree>
</jmeterTestPlan>

In CI you skip the GUI: jmeter -n -t plan.jmx -l results.jtl. The official guidance is GUI for building scenarios, non-GUI for running them, because the GUI itself imposes overhead.
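The results.jtl file is where post-processing starts. A minimal sketch, assuming the JTL was written in JMeter's CSV format with a header row and the usual `label`, `elapsed` (milliseconds), and `success` columns. The function name and the nearest-rank percentile are illustrative choices, not JMeter's own reporting logic.

```python
import csv
import io
import math
from collections import defaultdict


def summarize_jtl(jtl_text):
    """Per-label sample count, error rate, and nearest-rank p95 from a CSV JTL."""
    elapsed = defaultdict(list)
    errors = defaultdict(int)
    for row in csv.DictReader(io.StringIO(jtl_text)):
        label = row["label"]
        elapsed[label].append(int(row["elapsed"]))
        if row["success"].strip().lower() != "true":
            errors[label] += 1

    summary = {}
    for label, samples in elapsed.items():
        samples.sort()
        rank = max(1, math.ceil(95 * len(samples) / 100))
        summary[label] = {
            "count": len(samples),
            "error_rate": errors[label] / len(samples),
            "p95_ms": samples[rank - 1],
        }
    return summary


if __name__ == "__main__":
    sample = (
        "timeStamp,elapsed,label,responseCode,success\n"
        "1700000000000,100,Get products,200,true\n"
        "1700000000100,200,Get products,200,true\n"
        "1700000000200,300,Get products,500,false\n"
        "1700000000300,400,Get products,200,true\n"
    )
    print(summarize_jtl(sample))
```

Comparing the returned numbers with fixed limits gives JMeter the hard gate that the tool itself does not make convenient.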

JMeter's real moat is the depth of its sampler library. Beyond HTTP/HTTPS, it ships JDBC, LDAP, SOAP, SMTP, IMAP, JMS, FTP, MongoDB samplers — one tool covers nearly every protocol. For enterprise systems that grew up in the 2010s SOA era, that is decisive.

When to still pick JMeter in 2026:

  • Regulated industries (finance, healthcare, public sector) where reproducible GUI scripts are required audit artifacts
  • Large enterprise QA orgs where JMeter is the corporate standard and 50+ users already exist
  • 50% or more of your protocols are SOAP, JMS, or JDBC (non-HTTP)
  • You want to pair with BlazeMeter SaaS for cloud distribution

For a greenfield project, do not pick it. XML files are a Git merge nightmare, thresholds are weak, and a single JVM hits GC pressure past about 1000 VUs.

6. Gatling — Scala / Kotlin DSL

Gatling was created in 2012 by Stéphane Landelle in Scala. As of 2026 there is OSS Gatling 3.11 and the commercial Gatling Enterprise (formerly FrontLine). Since Gatling 3.7 (late 2021), the Java and Kotlin DSLs are first-class, removing Scala as a barrier to entry.

Two things Gatling is famous for: (1) a type-safe builder DSL and (2) beautiful HTML reports.

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class DemoSimulation extends Simulation {
  val httpProtocol = http
    .baseUrl("https://api.example.com")
    .acceptHeader("application/json")

  val browse = scenario("Browse")
    .exec(http("get products").get("/products").check(status.is(200)))
    .pause(1)
    .exec(http("get product 42").get("/products/42"))

  setUp(
    browse.inject(
      rampUsersPerSec(1).to(100).during(30.seconds),
      constantUsersPerSec(100).during(2.minutes)
    )
  ).protocols(httpProtocol)
   .assertions(
     global.responseTime.percentile(95).lt(500),
     global.failedRequests.percent.lt(1)
   )
}

Injection profiles like rampUsersPerSec and constantUsersPerSec are rich, and .check() DSL is validated at compile time — typos break the build. After a run, an HTML report is generated automatically: percentile distributions, response time over time, and per-request-type graphs, all cleanly laid out on one screen. You can hand it to an executive as-is.

Kotlin DSL example:

import io.gatling.javaapi.core.CoreDsl.*
import io.gatling.javaapi.core.Simulation
import io.gatling.javaapi.http.HttpDsl.*

class DemoSimulation : Simulation() {
  val httpProtocol = http.baseUrl("https://api.example.com")

  val browse = scenario("Browse")
    .exec(http("get").get("/products").check(status().`is`(200)))

  init {
    setUp(browse.injectOpen(rampUsers(100).during(30)))
      .protocols(httpProtocol)
  }
}

Gatling shines for JVM teams testing JVM backends (Spring Boot, Quarkus, Micronaut). Load tests live in the same Maven/Gradle build, JVM tuning knowledge transfers directly, and the reports are pretty enough that a Korean chaebol exec will love them. Weak spots: a few days of ramp-up to learn the DSL if you have never touched JVM DSLs, and weaker OSS distributed support (distributed is in paid Enterprise).

7. Vegeta / hey / wrk / oha / autocannon / Bombardier — CLI Benchmarks

Eighty percent of load testing needs no scenario at all. Most of the time you want one answer: "what happens when I hit this URL at 1000 RPS for 30 seconds?" That is what one-line CLI benchmarks are for.

hey (formerly boom). Written in Go, the simplest of the lot. Three options matter: -z (duration), -c (concurrency), -q (rate limit in QPS per worker, so the command below caps out at 50 × 100 = 5,000 QPS).

hey -z 30s -c 50 -q 100 https://api.example.com/products

wrk. C-based, best single-machine throughput. Lua scripting for headers, bodies, and request branching. Downsides: Homebrew builds drift on macOS, and the text-only output is hard to post-process.

wrk -t 8 -c 100 -d 30s --latency https://api.example.com/products

Vegeta. Go, built around an attack | report | plot pipeline. It writes a binary result stream that you can render as a text or JSON report, or as an HTML plot.

echo "GET https://api.example.com/products" | \
  vegeta attack -duration=30s -rate=100/s | \
  tee results.bin | \
  vegeta report -type=text

vegeta report -type=json < results.bin > results.json
vegeta plot < results.bin > plot.html
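The JSON report is what makes Vegeta easy to post-process. A minimal sketch that reads results.json and applies two limits, assuming the report exposes `latencies.95th` in nanoseconds and `success` as a ratio between 0 and 1. The function name and the limits are illustrative.

```python
import json

NS_PER_MS = 1_000_000


def vegeta_gate(report_json, p95_limit_ms=500.0, min_success=0.99):
    """Return (ok, details) from the text of a `vegeta report -type=json` output."""
    report = json.loads(report_json)
    p95_ms = report["latencies"]["95th"] / NS_PER_MS  # latencies assumed to be in ns
    success = report["success"]  # fraction of successful requests
    ok = p95_ms < p95_limit_ms and success >= min_success
    return ok, {"p95_ms": p95_ms, "success": success}


if __name__ == "__main__":
    sample = json.dumps({
        "latencies": {"50th": 87_200_000, "95th": 312_500_000, "99th": 482_100_000},
        "success": 1.0,
        "status_codes": {"200": 3000},
    })
    print(vegeta_gate(sample))
```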

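oha. Rust, released in 2021. As of 2026 the hottest CLI benchmark tool. Its flags mirror hey's, with one difference worth knowing: -q in oha caps the total rate, not the per-worker rate. On top of that it adds a real-time TUI of ASCII charts. Once you see it, you have no reason to go back to hey.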

oha -z 30s -c 50 -q 100 https://api.example.com/products

The TUI live-updates RPS, p50/p95/p99 latency, and status distribution as ASCII charts. In CI you switch to --no-tui --json.

autocannon. Node.js. npx autocannon runs it instantly, and being Node, it feels native when testing Node servers. The Fastify team's official benchmark tool.

npx autocannon -c 100 -d 30 https://api.example.com/products

Bombardier. Go, supports HTTP/2, JSON output, single binary. The smoothest experience on Windows.

bombardier -c 100 -d 30s -l https://api.example.com/products

Six tools, one table:

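| Tool | Language | Scripting | TUI/graph | HTTP/2 | When to pick |
| --- | --- | --- | --- | --- | --- |
| hey | Go | none | none | yes (-h2) | Simplest, familiar from old days |
| wrk | C | Lua | none | no | Max single-machine throughput |
| Vegeta | Go | none | HTML plot | yes (-http2) | Post-process binary results |
| oha | Rust | none | real-time TUI | yes | First pick in 2026 |
| autocannon | Node | fn hook | none | no (HTTP/1.1 only) | Native fit for Node servers |
| Bombardier | Go | none | none | yes | Windows, JSON output |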

The 2026 recommendation is simple. Put oha on your PATH and use it. The TUI makes results obvious even to first-time viewers. The day you need post-processable binary output, add Vegeta.

8. BlazeMeter / LoadRunner / NeoLoad — Enterprise

The enterprise SaaS market lives in a different world from OSS. The four differentiators are (1) audit and procurement artifacts, (2) cloud-distributed load generators, (3) GUI recording (record browser actions, convert to scenarios), and (4) APM integration.

BlazeMeter (Perforce, formerly CA). Ingests JMeter, Gatling, k6, and Selenium scripts and runs them distributed in the cloud. The 2024 launch of AI Co-Pilot added scenario generation and result analysis. Their real product is infrastructure that can legally manufacture a million concurrent users.

LoadRunner (Micro Focus → OpenText, acquired in 2023). The grandparent of load testing, dating back to Mercury Interactive in 1994. As of 2026, three lines: LoadRunner Professional, LoadRunner Enterprise, LoadRunner Cloud. Near-monopoly on legacy and industrial protocols like SAP, Oracle Forms, Citrix, and RDP. Famous for licensing costs in the tens to hundreds of thousands of USD per year.

NeoLoad (Tricentis). Built by Neotys (founded in 2005) and acquired by Tricentis in 2021. Compatible with JMeter and Selenium scripts, strong on CI/CD integration. The 2025 launch of NeoLoad Web 2.0 accelerated the SaaS transition. High adoption in European enterprises, especially finance.

Azure Load Testing (Microsoft). GA in 2023, after a preview that began in late 2021. Essentially managed JMeter, with Locust support added later: upload a .jmx file or a Locust script, and Azure spins up generators and surfaces metrics joined with Application Insights. For Azure-centric orgs, you can use it without going through a separate procurement cycle.

Cloudflare Load Balancing test mode. Added in 2025. Shadows production traffic to a new origin, simultaneously validating load and correctness. Strictly speaking this is production shadowing, not load testing, but it belongs in the same conversation.

Three signals that you should pick enterprise SaaS:

  1. Your security team requires audit of any external tool that generates load against your servers
  2. You need huge-scale load (a million concurrent users) but lack the internal k6/k8s ops headcount
  3. You need to test SAP, Citrix, or Oracle Forms — LoadRunner has no real OSS alternative

Outside those cases, OSS k6 + Grafana Cloud k6 SaaS is overwhelmingly more cost-effective.

9. Chaos Mesh + Chaos Engineering — The Neighboring Field

Strictly, chaos engineering is a sibling to load testing, not the same tool. Load testing asks "what happens when traffic grows in normal conditions?" Chaos engineering asks "how does the system fail under abnormal conditions (node down, network latency, disk full)?"

Major chaos tools in 2026:

| Tool | Environment | Notable |
| --- | --- | --- |
| Chaos Mesh | Kubernetes | CNCF Graduated (2024), 11 fault types, dashboard |
| LitmusChaos | Kubernetes | CNCF Incubating, ChaosHub experiment library |
| Gremlin | SaaS | Commercial, blast-radius safeguards, enterprise |
| AWS Fault Injection Service (FIS) | AWS | Managed, EC2/RDS/ECS/EKS actions |
| Azure Chaos Studio | Azure | Managed, fault on Azure resources |

Pairing load tests with chaos has been a real pattern since 2025. Hold 500 RPS with k6 while killing 30% of your pods with Chaos Mesh, and you validate HPA, readiness probes, and circuit breakers all at once. CNCF's Chaos Engineering Working Group is expected to publish a best-practices document in 2026.

10. Browser Perf — Lighthouse / WebPageTest / SpeedCurve / Calibre

Browser performance is a different box from backend load testing. Backend asks "does the server hold the RPS." Browser perf asks "does the user feel the page load fast."

Lighthouse. Google's OSS audit tool. Built into Chrome DevTools, with a CLI and Node API, and CI integration via lighthouse-ci. It measures a single page once and scores Performance / Accessibility / Best Practices / SEO out of 100. Single-measurement variance is its biggest weakness.

npx lighthouse https://example.com \
  --output=json --output=html --output-path=./report
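Because a single Lighthouse run is noisy, a common mitigation is to run the audit several times and compare medians rather than individual results. A minimal sketch, assuming each JSON report has already been loaded into a dict and that metrics live under `audits[<id>].numericValue`, as in the Lighthouse result format. The helper name and the sample values are hypothetical.

```python
import statistics


def median_metric(reports, audit_id):
    """Median of one audit's numericValue across several Lighthouse JSON reports."""
    values = [report["audits"][audit_id]["numericValue"] for report in reports]
    return statistics.median(values)


if __name__ == "__main__":
    # Three hypothetical runs of the same page; LCP numericValue is in milliseconds
    runs = [
        {"audits": {"largest-contentful-paint": {"numericValue": 2100.0}}},
        {"audits": {"largest-contentful-paint": {"numericValue": 2600.0}}},
        {"audits": {"largest-contentful-paint": {"numericValue": 2300.0}}},
    ]
    print(median_metric(runs, "largest-contentful-paint"))
```

With three or five runs per page, one slow outlier no longer flips a CI check on its own.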

WebPageTest (now Catchpoint). Started in 2008 by Patrick Meenan, acquired by Catchpoint in 2020. Measures pages from real browsers (Chrome, Firefox, Edge) in 50+ locations worldwide. Waterfall, filmstrip, and video outputs are unmatched. Free interactive UI plus WebPageTest API and private instance options.

# WebPageTest API
webpagetest test https://example.com \
  --location ec2-us-east-1:Chrome \
  --connectivity 4G --runs 3 --first

SpeedCurve. A paid SaaS by Mark Zeman and Steve Souders. Combines synthetic and Real User Monitoring (RUM) in one dashboard, with daily automated measurements and regression alerts. The UX and visualization design appeals strongly to design teams.

Calibre. A paid SaaS by Karolina Szczur. Combines Lighthouse and WebPageTest, lets you define performance budgets as code (YAML), and gates CI on them. A reasonable price point for small teams.

# .calibre.yml example
budgets:
  - metric: largest-contentful-paint
    budget: 2500
  - metric: cumulative-layout-shift
    budget: 0.1
  - metric: interaction-to-next-paint
    budget: 200

RUM tools (Real User Monitoring): Cloudflare Web Analytics, Vercel Speed Insights, Google Search Console's CrUX, Sentry Performance, New Relic Browser, Datadog RUM. They aggregate CWV telemetry sent by real users' browsers.

You need both synthetic (controlled, repeated measurements) and RUM (real user telemetry). Synthetic alone misses the "fast in the lab, slow on the user's phone" gap. RUM alone makes regression debugging painful.

11. Core Web Vitals — INP Replaced FID

Core Web Vitals (CWV) are the three user-perceived performance metrics Google launched in 2020. In March 2024, INP (Interaction to Next Paint) officially replaced FID (First Input Delay), changing how we measure. As of May 2026 the official three are:

| Metric | Definition | Good | Needs Improvement | Poor |
| --- | --- | --- | --- | --- |
| LCP (Largest Contentful Paint) | When the largest content element rendered | <2.5s | 2.5s-4s | >4s |
| INP (Interaction to Next Paint) | Slowest interaction delay over the page session | <200ms | 200-500ms | >500ms |
| CLS (Cumulative Layout Shift) | Cumulative layout shift score | <0.1 | 0.1-0.25 | >0.25 |

Why INP replaced FID: FID measured only the first input, and the criticism was that "it is already too easy: almost every site scores Good." It also drifted from real user experience. INP tracks every interaction across the page's lifetime and reports the worst one, ignoring one outlier per 50 interactions, which on busy pages is roughly the P98 value. That is the actual moment users say "it lagged."
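The selection rule is easier to follow in code. The sketch below picks an INP value from a list of interaction latencies by skipping one of the slowest interactions for every 50 recorded, then rates it against the thresholds in the table above. It follows the publicly documented definition rather than any browser's internal implementation, and the function names are hypothetical.

```python
def inp_ms(interaction_latencies_ms):
    """Pick the INP candidate: the worst latency after ignoring one outlier per 50."""
    if not interaction_latencies_ms:
        return None  # a page with no interactions has no INP
    ordered = sorted(interaction_latencies_ms, reverse=True)
    skip = len(ordered) // 50  # number of slowest interactions to ignore
    return ordered[min(skip, len(ordered) - 1)]


def rate_inp(value_ms):
    """Classify an INP value using the 200 ms and 500 ms boundaries."""
    if value_ms <= 200:
        return "good"
    if value_ms <= 500:
        return "needs improvement"
    return "poor"


if __name__ == "__main__":
    few = [40, 80, 600]        # under 50 interactions: the worst one counts
    many = [10] * 49 + [900]   # 50 interactions: the single outlier is ignored
    print(inp_ms(few), rate_inp(inp_ms(few)))
    print(inp_ms(many), rate_inp(inp_ms(many)))
```

This is why a single slow click ruins INP on a short session, while a long session tolerates the occasional outlier.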

The single biggest factor in INP is main-thread work. Heavy React/Vue/Svelte renders, bloated hydration, and synchronous third-party scripts dominate. The acceleration of React 19 Server Components, Next.js 15 PPR (Partial Prerendering), and Astro Islands in 2024–2025 is the main-thread relief trend that came out of this.

A practical CWV regression workflow in four steps:

  1. Synthetic measurements (Lighthouse CI or Calibre) catch changes at main-branch merge time
  2. RUM (Vercel / Cloudflare / Sentry) tracks real-user P75 and P98
  3. WebPageTest dives deep on regressed pages (filmstrip, waterfall)
  4. Experiment (A/B) and verify RUM metrics recover after the fix

12. AI in Load Testing — k6 Studio, Artillery Pro AI

AI moved into load testing tools properly in 2024–2025. Two threads:

First, scenario generation. Record browser actions and auto-generate scripts. Released in 2024, k6 Studio is a desktop app: you click around in a browser, it captures HAR, and it produces a cleaned k6 JS scenario with tokenization and parameterization applied. Artillery Pro AI takes an OpenAPI spec and emits scenario YAML.

Second, result analysis. An LLM reads the load test output (time-series metrics + logs + APM) and writes natural-language diagnoses like "the bottleneck appears to be database connection pool exhaustion; the p99 spike from 1.5s to 8s coincides with a query whose new index has not been applied yet." BlazeMeter AI Co-Pilot, Grafana Sift, and Datadog Bits are the main examples.

As of 2026, AI is strong at drafting and summarizing, and weak at setting thresholds and causal inference. Deciding that "p95 on this API must stay under 500ms" is still a human decision. AI is an assistant that organizes data fast and narrows the suspect list.

13. Korea / Japan — Naver, Toss, Mercari

Naver. According to the Naver D2 blog, core services like Search, Pay, and Webtoon long used a proprietary load testing framework plus JMeter. A 2024 post shared a k6 adoption case for a new service, and the internal load testing platform is moving toward accepting both JMeter .jmx and k6 JS scripts.

Toss. Toss runs financial services where load testing is a mandatory release gate. The Toss tech blog has repeatedly described a k6 + custom metrics pipeline + Grafana stack that catches regressions on every deploy. Notably, they encode "the scenario where one user completes one payment" in code and report TPS (Transactions Per Second) rather than RPS.

Kakao. Kakao's tech blog has long documented Gatling in large systems like KakaoTalk message delivery. It fits their JVM-heavy (Spring) organization.

Japan — Mercari. Mercari's engineering blog (engineering.mercari.com) published the 2023–2025 journey of standardizing on k6 as the main load testing tool. The pattern of using k6 + GraphQL extension to load-test a GraphQL gateway influenced the Japanese community heavily.

Japan — Yahoo Japan / LINE. LINE runs Locust + a custom distributed cluster for messaging backend load tests. Yahoo Japan has long used JMeter + private infrastructure for search backend load tests.

Japan — SmartHR, freee, MoneyForward. SaaS startups consistently picked k6 as the first choice. It fits their JS/TS-friendly backends (Node, Go), and Japanese-language documentation is abundant.

The pattern across Korea and Japan is consistent. Legacy and enterprise stay on JMeter/Gatling; new and SaaS pick k6. And both pair synthetic load tests with production RUM.

14. Who Should Pick What — Backend / Frontend / Enterprise / Solo

Recommendations by persona.

| Persona | First pick | Second pick | One-liner |
| --- | --- | --- | --- |
| Modern backend team (Node/Go/Rust) | k6 v1 | oha (CLI) | JS scenarios + Grafana integration |
| Python ML / data team | Locust | k6 | When you want pandas/numpy inside scenarios |
| JVM team (Spring/Quarkus) | Gatling | k6 | Lives in the build + pretty reports |
| Regulated industries (finance/health) | JMeter | LoadRunner | GUI artifact required for audit |
| Enterprise massive scale (1M+) | BlazeMeter | NeoLoad | Outsource the infra too |
| Quick one-line benchmark | oha | Vegeta | TUI = instant understanding |
| Node server affinity | autocannon | k6 | One npx call |
| SaaS startup, solo | k6 + oha | Vercel Speed Insights | Free OSS to start |
| Frontend perf regression watch | Lighthouse CI | Calibre | Auto-measure every PR |
| Synthetic + RUM combo | SpeedCurve | Calibre + Cloudflare RUM | UX team friendly |
| Paired with chaos engineering | Chaos Mesh + k6 | LitmusChaos | Load + failure on k8s |
| AWS-centric | AWS FIS + k6 | BlazeMeter | Managed is easier |
| Azure-centric | Azure Load Testing | k6 + Azure Chaos Studio | No procurement needed |

If you must pick one stack: for a full-stack team handling a modern backend and a modern frontend in May 2026, the optimum is k6 v1 (load) + Lighthouse CI (browser) + oha (one-liner benchmark). All three are OSS, ship as a single binary or a single package, start instantly, and turn into CI gates with little effort.

Load testing is never a one-off event. It must be a regression gate on every deploy, and it must be paired with production RUM so you always notice the gap between "the number in the lab" and "the number real users see." Tools are just parts that make that workflow possible.

15. References