Frontend Testing 2025 — Vitest, Jest, Bun, Testing Library, Playwright, Storybook, MSW, Visual Regression, AI (S6 E11)

Prologue — why the testing stack got easier

Three things changed in 2024–2025 that simplified frontend testing:

Vitest matured and became the default choice for new projects, pulling many teams off Jest. Its speed, native ESM support, and Vite integration compound.
Playwright became boring in the good sense — it just works, across browsers, on CI, with great tooling.
AI-generated tests moved from gimmick to "reasonable first pass" — Cursor, Continue, Claude Code can write competent tests.

This post is about what to actually write in 2026, and in what proportions.

1. The testing pyramid is still right

The 1:10:100 pyramid (e2e : integration : unit) is still the right shape. What changed:

The unit layer is cheaper than ever (Vitest is instant).
The integration layer got richer (Testing Library + MSW + Storybook play).
The e2e layer got reliable (Playwright has displaced Cypress on many large suites).

Anti-pattern: inverted pyramid (lots of e2e, few units) — brittle, slow, expensive.
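
To make the 1:10:100 proportion concrete, here is a small sketch that turns a total suite size into per-layer targets. The helper name pyramidTargets and the rounding rule are illustrative assumptions, not something any tool prescribes.

```typescript
// Hypothetical helper: split a total test count by the 1:10:100 ratio
// (e2e : integration : unit). The rounding rule is an illustrative choice.
type Layer = 'e2e' | 'integration' | 'unit'

const PYRAMID_WEIGHTS: Record<Layer, number> = { e2e: 1, integration: 10, unit: 100 }

function pyramidTargets(totalTests: number): Record<Layer, number> {
  const weightSum =
    PYRAMID_WEIGHTS.e2e + PYRAMID_WEIGHTS.integration + PYRAMID_WEIGHTS.unit // 111
  const e2e = Math.round((totalTests * PYRAMID_WEIGHTS.e2e) / weightSum)
  const integration = Math.round((totalTests * PYRAMID_WEIGHTS.integration) / weightSum)
  // Give the remainder to the unit layer so the parts always sum to the total.
  const unit = totalTests - e2e - integration
  return { e2e, integration, unit }
}

console.log(pyramidTargets(555)) // { e2e: 5, integration: 50, unit: 500 }
```

Treat the output as a rough target for where new tests should go, not as a quota.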

2. Unit tests — Vitest is the answer

Why Vitest won

ESM-first, TS out of the box.
Same config as Vite — monorepo bliss.
Happy-dom and JSDOM both supported.
Instant watch mode.
Vitest UI (browser dashboard).

Minimal setup

// vitest.config.ts
import { defineConfig } from 'vitest/config'
import react from '@vitejs/plugin-react'
export default defineConfig({
  plugins: [react()],
  test: {
    globals: true,
    environment: 'happy-dom',
    setupFiles: ['./test/setup.ts'],
  },
})
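
The config above points at ./test/setup.ts without showing it. A minimal sketch of what that file might contain, assuming a React project with @testing-library/react and @testing-library/jest-dom installed (both are assumptions; the post does not list dependencies):

```typescript
// test/setup.ts (sketch; assumes @testing-library/react and
// @testing-library/jest-dom are installed)
import '@testing-library/jest-dom/vitest' // registers matchers such as toBeInTheDocument()
import { cleanup } from '@testing-library/react'
import { afterEach } from 'vitest'

// Unmount anything the previous test rendered so DOM state never leaks
// from one test into the next.
afterEach(() => {
  cleanup()
})
```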

When Jest is still OK

Very large legacy Jest suites (migration cost > benefit).
React Native (Jest is still the standard).

Bun test

Often the fastest runner in raw benchmarks.
Good for pure Bun stacks.
Less ecosystem than Vitest.

3. Component tests — Testing Library + Storybook 8

Testing Library

Principle: test what the user sees and does, not the component's internals.
getByRole, getByLabelText, userEvent.
Works with Vitest or Jest.
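
A sketch of the query-by-role style, run under Vitest with a DOM environment such as happy-dom. To keep it framework-free it uses @testing-library/dom and a made-up counter built from plain DOM calls; the counter and its labels are hypothetical.

```typescript
// counter.test.ts (sketch; the counter markup is a made-up example)
import { describe, expect, it } from 'vitest'
import { screen } from '@testing-library/dom'
import userEvent from '@testing-library/user-event'

// Hypothetical "component": plain DOM so the example needs no framework.
function mountCounter(): void {
  document.body.innerHTML = '<button type="button">Count: 0</button>'
  const button = document.querySelector('button')!
  let count = 0
  button.addEventListener('click', () => {
    count += 1
    button.textContent = `Count: ${count}`
  })
}

describe('counter', () => {
  it('increments when the user clicks', async () => {
    mountCounter()
    const user = userEvent.setup()

    // Query by role and accessible name, the way assistive technology would.
    await user.click(screen.getByRole('button', { name: 'Count: 0' }))

    // getByRole throws if nothing matches, so reaching the assertion
    // already means the updated label is on screen.
    expect(screen.getByRole('button', { name: 'Count: 1' })).toBeDefined()
  })
})
```

The test never touches class names or internal state, so it survives a refactor of how the counter is implemented.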

Storybook 8 (2024 release) + Test Runner

Every story becomes a test.
play function lets you script interactions + assertions.
Portable stories: import stories into Vitest and run as normal tests.
Chromatic integrates for visual regression.

Pattern: write the story first, test the story. One file, three uses: docs, unit test, visual regression.

4. Integration tests — MSW is the glue

Mock Service Worker (MSW) intercepts outgoing requests (fetch and XHR) at the network level, so application code runs unchanged. Same mocks work in:

Unit tests (Vitest/Jest).
Storybook.
Cypress/Playwright.
Dev server.

// mocks/handlers.ts
import { http, HttpResponse } from 'msw'
export const handlers = [
  http.get('/api/users', () => HttpResponse.json([{ id: 1, name: 'Alice' }])),
]

Benefit: your test fixtures match your dev fixtures match your docs fixtures. One source of truth.
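
One way to reuse those handlers in Vitest or Jest is a shared Node server started from the test setup. This is a sketch: the file location and the relative import path are assumptions, and the handlers import refers to the mocks/handlers.ts file shown above.

```typescript
// test/msw-server.ts (sketch; file name and import path are assumptions)
import { afterAll, afterEach, beforeAll } from 'vitest'
import { setupServer } from 'msw/node'
import { handlers } from '../mocks/handlers'

export const server = setupServer(...handlers)

// Fail loudly on any request without a handler, so no test reaches a real API.
beforeAll(() => server.listen({ onUnhandledRequest: 'error' }))

// Drop per-test overrides added with server.use(...), restoring the defaults.
afterEach(() => server.resetHandlers())

afterAll(() => server.close())
```

Listing this file in setupFiles would apply it to every test file; a single test can still override a route with server.use(...).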

5. E2E — Playwright is the default

Why Playwright

Cross-browser (Chromium, Firefox, WebKit).
Auto-wait built in — fewer flaky tests.
Trace viewer, time-travel debugging, network inspector.
VSCode extension, CI caching, sharding.

Playwright practices

Page Object Model, but lightly — overbuilt POMs hurt readability.
Test isolation — each test gets a fresh browser context.
Accessibility checks via @axe-core/playwright, or Playwright's built-in ARIA snapshot assertions.
Trace on failure only — trace on every run is expensive.
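
A config sketch that reflects these practices. The directory, base URL, and retry count are illustrative assumptions; the options themselves are standard @playwright/test settings.

```typescript
// playwright.config.ts (sketch; testDir, baseURL, and retries are illustrative)
import { defineConfig, devices } from '@playwright/test'

export default defineConfig({
  testDir: './e2e',
  fullyParallel: true,
  // Retry once on CI only, so a local failure stays visible immediately.
  retries: process.env.CI ? 1 : 0,
  use: {
    baseURL: 'http://localhost:5173',
    // Record a trace only when a failed test is retried, not on every run.
    trace: 'on-first-retry',
  },
  // Each test gets a fresh browser context by default; no option needed.
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    { name: 'webkit', use: { ...devices['Desktop Safari'] } },
  ],
})
```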

Cypress

Still loved for DX but losing market share.
Worth considering if you're already invested.

Bun-native e2e

Emerging but not yet a Playwright replacement in 2026.

6. Visual regression — Chromatic, Percy, Loki

Chromatic (by Storybook team) — per-story screenshots, works with Storybook.
Percy (BrowserStack) — broader, works with many frameworks.
Loki / Reg-cli — self-hosted options.

Pattern: on every PR, visual regression runs against your Storybook preview. Bot comments with visual diffs.

Caveat: too many false positives (tiny AA diffs) → team ignores → signal loss. Tune thresholds early.
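
To show what "tuning thresholds" means in practice, here is a simplified sketch of a tolerance-based comparison over raw RGBA buffers. Hosted tools use more sophisticated perceptual metrics; the names diffRatio, channelTolerance, and maxDiffRatio are illustrative, not any vendor's API.

```typescript
// Sketch of threshold-based image comparison over RGBA pixel buffers.
interface DiffOptions {
  channelTolerance: number // max per-channel difference (0-255) still treated as equal
  maxDiffRatio: number // fraction of pixels allowed to differ before failing
}

// Returns the fraction of pixels whose color differs beyond the tolerance.
function diffRatio(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  channelTolerance: number,
): number {
  if (a.length !== b.length || a.length % 4 !== 0) {
    throw new Error('Expected two RGBA buffers of equal size')
  }
  if (a.length === 0) return 0
  let differing = 0
  for (let i = 0; i < a.length; i += 4) {
    const changed =
      Math.abs(a[i] - b[i]) > channelTolerance ||
      Math.abs(a[i + 1] - b[i + 1]) > channelTolerance ||
      Math.abs(a[i + 2] - b[i + 2]) > channelTolerance ||
      Math.abs(a[i + 3] - b[i + 3]) > channelTolerance
    if (changed) differing += 1
  }
  return differing / (a.length / 4)
}

function isVisualRegression(
  a: Uint8ClampedArray,
  b: Uint8ClampedArray,
  options: DiffOptions,
): boolean {
  return diffRatio(a, b, options.channelTolerance) > options.maxDiffRatio
}
```

With channelTolerance at 0, a pixel that shifts by two shades from anti-aliasing counts as changed; a small tolerance absorbs that noise while a genuinely different pixel still registers.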

7. Accessibility testing

axe-core (plus @axe-core/playwright for browser runs) in the unit, component, and e2e layers.
Storybook a11y addon — runs axe on every story.
Lighthouse CI — runs on preview deploys.

Goal: accessibility failures become build failures, not "someone else's job."
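
A sketch of the gate that turns axe findings into a failing test. The Violation type mirrors a small subset of the axe-core result shape; in a Playwright test the array would come from (await new AxeBuilder({ page }).analyze()).violations with @axe-core/playwright. The helper name and the ignoredImpacts escape hatch are assumptions for illustration.

```typescript
// Subset of the axe-core violation shape used by this sketch.
type Impact = 'minor' | 'moderate' | 'serious' | 'critical'

interface Violation {
  id: string
  impact?: Impact | null
  help: string
  nodes: unknown[]
}

// Hypothetical helper: throw (and so fail the test) on any violation,
// except those whose impact level is explicitly ignored.
function assertNoViolations(violations: Violation[], ignoredImpacts: Impact[] = []): void {
  const blocking = violations.filter(
    (v) => !(v.impact && ignoredImpacts.includes(v.impact)),
  )
  if (blocking.length === 0) return

  const summary = blocking
    .map((v) => `${v.id} [${v.impact ?? 'unknown'}] x${v.nodes.length}: ${v.help}`)
    .join('\n')
  throw new Error(`${blocking.length} accessibility violation(s):\n${summary}`)
}
```

Starting with an empty ignore list keeps the gate strict; a team adopting this on an existing app might ignore minor findings at first and tighten later.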

8. Contract tests — Zod + typed API

With tRPC, GraphQL, or OpenAPI codegen, the client is typed from a shared schema, so contract checks are mostly automatic. For REST, add runtime checks:

Zod schemas on both client and server, generated from OpenAPI.
Pact (for consumer-driven contract testing) when backend ships independently.
Prism from Stoplight for mocking OpenAPI.
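
To illustrate what a runtime contract check does, here is a dependency-free stand-in for the kind of validation a Zod schema performs via safeParse. It is not Zod itself: the parseUsers helper is hypothetical, and the User shape is taken from the /api/users mock earlier in this post.

```typescript
// Hand-written stand-in for a schema check; a Zod schema would replace this.
interface User {
  id: number
  name: string
}

// Validates an unknown payload against the User[] contract, throwing on drift.
function parseUsers(payload: unknown): User[] {
  if (!Array.isArray(payload)) {
    throw new Error('Contract violation: expected an array of users')
  }
  return payload.map((item: unknown, index: number) => {
    if (typeof item !== 'object' || item === null) {
      throw new Error(`Contract violation at index ${index}: expected an object`)
    }
    const { id, name } = item as Record<string, unknown>
    if (typeof id !== 'number' || typeof name !== 'string') {
      throw new Error(`Contract violation at index ${index}: bad id or name`)
    }
    return { id, name }
  })
}
```

Running the same check on the server's response in tests and on the client at the fetch boundary is what catches a backend that starts returning id as a string.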

9. Performance testing

Lighthouse CI — per-PR, fails build on regression.
WebPageTest API — for richer scenarios.
k6 (Grafana) — load testing, with browser-scripted scenarios.
Chrome DevTools Performance traces — local profiling.

Tie performance budgets into CI so you don't ship an LCP regression.
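
A minimal sketch of a budget check that a CI step could run against measured metrics. The budget numbers and the names WebVitals and budgetViolations are illustrative assumptions; in practice Lighthouse CI's own assertions would usually do this job.

```typescript
// Metrics this sketch tracks; extend as needed.
interface WebVitals {
  lcpMs: number // Largest Contentful Paint, milliseconds
  cls: number // Cumulative Layout Shift, unitless
  tbtMs: number // Total Blocking Time, milliseconds
}

// Illustrative budget; pick values that match your own baseline.
const DEFAULT_BUDGET: WebVitals = { lcpMs: 2500, cls: 0.1, tbtMs: 200 }

// Returns one message per metric that exceeds its budget.
function budgetViolations(
  measured: WebVitals,
  budget: WebVitals = DEFAULT_BUDGET,
): string[] {
  const violations: string[] = []
  for (const key of Object.keys(budget) as (keyof WebVitals)[]) {
    if (measured[key] > budget[key]) {
      violations.push(`${key}: ${measured[key]} exceeds budget ${budget[key]}`)
    }
  }
  return violations
}
```

A CI script would exit non-zero when the returned list is non-empty, which is what stops the LCP regression from merging.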

10. AI-augmented testing

What works in 2026:

Generating test scaffolds from component source — Cursor/Claude Code do this well.
Fix suggestions for failing tests (Datadog FlakyTest, Trunk Merge Queue).
Playwright test generation from recorded browser sessions (npx playwright codegen).
AI-assisted code coverage analysis (suggest which branches need tests).

What doesn't yet work:

"Write a full test suite for this app" — too shallow, lots of hallucinated assertions.
Automated test repair without human review — drifts subtly.

Rule: AI writes the first draft, human reviews and tightens.

11. Flakiness — the enemy

Flaky tests destroy team trust. Fixes:

Auto-wait by default (Playwright does this).
No setTimeout in tests — wait for condition, not clock.
Isolate the network: use MSW and never hit real APIs in unit or integration tests.
Quarantine flaky tests in a slow lane with ownership tags instead of retrying them indefinitely.
Freeze time for date-sensitive tests: vi.useFakeTimers() with vi.setSystemTime(), or faketime.

Key metric: flake rate < 1%. Above that, people ignore CI failures.
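
The metric is only useful with a precise definition. One workable definition, assumed here for illustration: a CI run is flaky if, on the same commit, it failed at least once and passed at least once. The CiRun shape and function names are hypothetical.

```typescript
type Attempt = 'pass' | 'fail'

// One CI run of the suite on a given commit, including any retries.
interface CiRun {
  commit: string
  attempts: Attempt[]
}

// Flaky: same code produced both a failure and a pass.
function isFlaky(run: CiRun): boolean {
  return run.attempts.includes('fail') && run.attempts.includes('pass')
}

// Fraction of runs that were flaky (0 when there is no data).
function flakeRate(runs: CiRun[]): number {
  if (runs.length === 0) return 0
  return runs.filter(isFlaky).length / runs.length
}

function exceedsFlakeBudget(runs: CiRun[], budget = 0.01): boolean {
  return flakeRate(runs) > budget
}
```

A run that fails every attempt is a real failure, not a flake, so it does not count toward the rate.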

12. Coverage — a tool, not a goal

80% line coverage is fine as a floor; don't chase 100%.
Mutation testing (Stryker) reveals tests that "pass but don't assert" — more useful than coverage alone.
Focus coverage on critical paths: checkout, auth, payment.
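
A config sketch for a global floor plus stricter per-directory thresholds, assuming a recent Vitest with the @vitest/coverage-v8 provider installed. The paths and percentages are illustrative, and glob-keyed thresholds depend on your Vitest version supporting them.

```typescript
// vitest.config.ts (coverage excerpt; paths and numbers are illustrative)
import { defineConfig } from 'vitest/config'

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'html'],
      thresholds: {
        // Global floor for the whole project.
        lines: 80,
        // Stricter floors for critical paths, keyed by glob pattern.
        'src/checkout/**': { lines: 95, branches: 90 },
        'src/auth/**': { lines: 95, branches: 90 },
      },
    },
  },
})
```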

13. The 2026 ideal stack

Layer	Tool
Unit	Vitest
Component	Testing Library + Storybook 8 + Vitest
Network mocks	MSW
Visual regression	Chromatic
E2E	Playwright
A11y	axe-core everywhere
Contract	Zod + OpenAPI codegen
Performance	Lighthouse CI

12-item checklist

Unit tests run under 10 seconds for a PR?
E2E runs under 3 minutes with sharding?
Flake rate under 1%?
Every component has a Storybook story?
Visual regression runs per PR?
A11y failures fail build?
MSW used for all network mocking?
Contract tests generated, not handwritten?
Performance budget in CI?
Coverage thresholds set per directory?
Test runner integrated with IDE (VSCode/WebStorm)?
Test writing included in AI pair-programming workflow?

10 anti-patterns

Testing implementation details (checking CSS classes).
Too many e2e tests — pyramid inverted.
setTimeout in tests.
Real API calls in unit tests.
Visual regression with zero tolerance — every tiny pixel change triggers.
100% coverage as goal — drives useless tests.
Shared mutable state across tests.
Large beforeAll setup — test order dependence.
Snapshots without review — green-pass hides real regressions.
Flaky tests "fixed" by retrying 3× — masks root cause.

Next episode

Season 6 Episode 12: Frontend CI/CD & Deployment 2025 — GitHub Actions, Turborepo Remote Cache, Vercel, Netlify, Cloudflare, preview environments, canary, feature flags, SLSA/SBOM.

— End of Frontend Testing.