Chaos and Order


Prologue — "Deploy = Release" Still Doesn't Hold in 2026

Engineer: "I finished deploying the new feature this week."

PM: "So users have it now?"

Engineer: "No, it's behind a flag. We turn it on for 5% next week."

PM: "Then it's not really shipped, is it?"

Engineer: "...The deploy is done. The release isn't."

This dialogue still happens in 2026, and the gap between **deploy (putting code in production)** and **release (turning it on for users)** is the heart of modern delivery. The tool that makes the split possible is the **feature flag**.

Feature flags are no longer just "an if-statement trick." In 2026, a feature flag platform bundles **kill switches + gradual rollouts + A/B experimentation + multi-armed bandits + holdout analysis + targeting + segmentation + audit logs** into one product, and **OpenFeature**, a vendor-neutral specification and SDK hosted by the CNCF, sits as a layer on top. The market splits cleanly into three camps: managed SaaS (LaunchDarkly, Statsig), open source (GrowthBook, Unleash, Flagsmith), and all-in-one platforms (PostHog).

This post maps the 2026 landscape. We place sixteen products and standards, walk through the five core patterns (kill switch, gradual rollout, A/B, bandit, holdout), look at real usage at Toss, Kakao, and Mercari, and end with concrete picks per team profile.

1. The 2026 Feature Flag Map — Three Camps

The big picture: in 2026 the market splits cleanly into three camps.

**1) Managed SaaS** — LaunchDarkly, Statsig, ConfigCat, Split, DevCycle, AB Tasty, Eppo, Optimizely Rollouts

- Hosted-first. Enterprise SLAs, SSO, SOC 2, HIPAA, BAA, FedRAMP.

- Priced by MAU, flag count, or environment count.

**2) Open source** — GrowthBook, Unleash, Flagsmith, Bucketeer

- Self-hosting is the first-class citizen. SaaS is the add-on.

- Licenses: GrowthBook (MIT), Unleash (Apache 2.0), Flagsmith (BSD-3), Bucketeer (Apache 2.0).

- Choose when avoiding vendor lock-in, keeping data on your own infrastructure, or matching a fixed budget matters more than convenience.

**3) All-in-one** — PostHog

- Analytics + session replay + feature flags + experimentation in one product.

- Doesn't treat flags as a separate tool; sees them as toggles on the same plane as events.

Above all of this sits **OpenFeature (CNCF)** as the standard SDK layer. The vendor underneath can change; your code's evaluation API doesn't. This is the biggest shift between 2024 and 2026 — it's no longer "pick a vendor for life" but "pick a standard SDK and swap providers behind it."


2. OpenFeature (CNCF) — The Standard SDK

**OpenFeature** started in 2022, entered CNCF Incubating in 2023, and accelerated rapidly through 2024 and 2025. By 2026, nearly every major flag vendor — LaunchDarkly, Split, ConfigCat, Flagsmith, Unleash, DevCycle, GrowthBook — ships an official OpenFeature provider.

The core idea: **unify the SDK API across vendors and make the backend (provider) swappable.** What OpenTelemetry did for telemetry, OpenFeature does for feature flags.

A typical Node/TypeScript snippet:

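```tsx
import { OpenFeature } from '@openfeature/server-sdk'
import { LaunchDarklyProvider } from '@launchdarkly/openfeature-node-server'

// Register the vendor provider once at startup and wait until it is ready.
await OpenFeature.setProviderAndWait(
  new LaunchDarklyProvider(process.env.LD_SDK_KEY!)
)

const client = OpenFeature.getClient()

// Inside a request handler or server component: userId and user come from the request.
const showNewCheckout = await client.getBooleanValue(
  'new-checkout',
  false, // default fallback
  { targetingKey: userId, plan: user.plan }
)

if (showNewCheckout) {
  return <NewCheckout />
}
return <LegacyCheckout />
```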

Want to switch to Unleash tomorrow? Replace `LaunchDarklyProvider` with `UnleashProvider`. Every `getBooleanValue` / `getStringValue` call stays untouched.
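To make the swap concrete, here is a toy model of the provider pattern in plain TypeScript. It is only a sketch of the idea: `FlagProvider`, `FlagClient`, `StaticMapProvider`, and `PlanBasedProvider` are hypothetical names invented for this illustration, not the real OpenFeature SDK types.

```typescript
// Toy model of a swappable provider. All names here are illustrative
// assumptions, not the real OpenFeature SDK surface.
type EvalContext = { targetingKey: string; [attr: string]: string | number | boolean }

interface FlagProvider {
  readonly name: string
  resolveBoolean(flagKey: string, defaultValue: boolean, ctx: EvalContext): boolean
}

// Backend 1: a fixed map of flag values.
class StaticMapProvider implements FlagProvider {
  readonly name = 'static-map'
  private readonly flags: Record<string, boolean>
  constructor(flags: Record<string, boolean>) {
    this.flags = flags
  }
  resolveBoolean(flagKey: string, defaultValue: boolean): boolean {
    return this.flags[flagKey] ?? defaultValue
  }
}

// Backend 2: enables every flag for users on one of the listed plans.
class PlanBasedProvider implements FlagProvider {
  readonly name = 'plan-based'
  private readonly enabledPlans: Set<string>
  constructor(enabledPlans: string[]) {
    this.enabledPlans = new Set(enabledPlans)
  }
  resolveBoolean(_flagKey: string, defaultValue: boolean, ctx: EvalContext): boolean {
    const plan = ctx['plan']
    return typeof plan === 'string' ? this.enabledPlans.has(plan) : defaultValue
  }
}

// Application code depends only on this client, never on a concrete backend.
class FlagClient {
  private provider: FlagProvider
  constructor(provider: FlagProvider) {
    this.provider = provider
  }
  setProvider(provider: FlagProvider): void {
    this.provider = provider
  }
  getBooleanValue(flagKey: string, defaultValue: boolean, ctx: EvalContext): boolean {
    try {
      return this.provider.resolveBoolean(flagKey, defaultValue, ctx)
    } catch {
      return defaultValue // a failing provider should not break the caller
    }
  }
}

const flagClient = new FlagClient(new StaticMapProvider({ 'new-checkout': true }))
const evalCtx: EvalContext = { targetingKey: 'user-42', plan: 'free' }

const beforeSwap = flagClient.getBooleanValue('new-checkout', false, evalCtx)
flagClient.setProvider(new PlanBasedProvider(['pro'])) // swap the backend
const afterSwap = flagClient.getBooleanValue('new-checkout', false, evalCtx)

console.log({ beforeSwap, afterSwap })
```

The call sites never change when the backend does; only the line that constructs the provider is vendor-specific.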

**What OpenFeature standardizes**:

- Evaluation API — `getBooleanValue`, `getStringValue`, `getNumberValue`, `getObjectValue`.

- Evaluation context — the user and request attributes used for targeting (`targetingKey` plus arbitrary attrs).

- Hooks — `before` / `after` / `error` / `finally` lifecycle hooks.

- Providers — vendor backend adapters.

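- Telemetry — OpenTelemetry-compatible hooks and semantic conventions for recording evaluation events as spans or metrics. They are registered like any other hook rather than emitted automatically.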
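The hook lifecycle is easier to see in code. The sketch below is a simplified model, assuming a single list of hooks and a boolean flag: `before` hooks run in registration order, the other stages run in reverse, `error` replaces `after` when resolution throws, and `finally` always runs. The `Hook` and `HookContext` shapes are hypothetical and much smaller than the real SDK interfaces.

```typescript
// Simplified model of the before / after / error / finally lifecycle.
// The types are assumptions for illustration, not the SDK's own interfaces.
type HookContext = { flagKey: string; defaultValue: boolean }

interface Hook {
  before?(ctx: HookContext): void
  after?(ctx: HookContext, value: boolean): void
  error?(ctx: HookContext, err: unknown): void
  finally?(ctx: HookContext): void
}

function evaluateWithHooks(
  ctx: HookContext,
  hooks: Hook[],
  resolve: () => boolean
): boolean {
  const reversed = [...hooks].reverse()
  try {
    for (const h of hooks) h.before?.(ctx)
    const value = resolve()
    for (const h of reversed) h.after?.(ctx, value)
    return value
  } catch (err) {
    // Any failure falls back to the default value supplied by the caller.
    for (const h of reversed) h.error?.(ctx, err)
    return ctx.defaultValue
  } finally {
    for (const h of reversed) h.finally?.(ctx)
  }
}

const trace: string[] = []
const tracingHook: Hook = {
  before: (c) => { trace.push(`before:${c.flagKey}`) },
  after: (_c, v) => { trace.push(`after:${v}`) },
  error: () => { trace.push('error') },
  finally: () => { trace.push('finally') },
}

const hookCtx: HookContext = { flagKey: 'new-checkout', defaultValue: false }
const okValue = evaluateWithHooks(hookCtx, [tracingHook], () => true)
const failedValue = evaluateWithHooks(hookCtx, [tracingHook], () => {
  throw new Error('provider unavailable')
})

console.log(okValue, failedValue, trace)
```

Logging, metrics, and context enrichment all fit this shape, which is why telemetry integrations tend to ship as hooks.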

**What OpenFeature does not standardize**:

- Flag definition format — each vendor's console can differ.

- Targeting rules — vendor-specific.

- Experiment design and statistics — provider territory.

When can you skip OpenFeature? Small teams committed to a single vendor where fast onboarding trumps everything. Otherwise, in almost every greenfield project in 2026, the default is to put OpenFeature at the entrance and plug the provider in behind it.

3. LaunchDarkly — The Enterprise Leader

**LaunchDarkly** launched in 2014, effectively defined the managed feature flag SaaS category, and remains the enterprise default. In 2026, a large share of the Fortune 500 uses it, and it covers the enterprise checklist (SOC 2, HIPAA, BAA, FedRAMP, SAML SSO, SCIM) more completely than most competitors.

LaunchDarkly's strengths:

- **The widest SDK matrix** — Node, TS, Go, Java, Python, Ruby, .NET, PHP, Rust, Erlang, Elixir, Swift, Kotlin, Flutter, React, Vue, Angular, iOS, Android, edge workers.

- **Client-side safety** — server-side evaluation, edge relay proxy, "private attributes" so PII never leaks to the console.

- **Permissions and approval workflows** — who can flip which flag in which environment, RBAC, approvals, change requests.

- **Audit log and compliance** — every change records who, when, and why.

- **Experimentation** — a built-in stats engine (Bayesian plus frequentist), CUPED, and sequential testing.

- **Enterprise integrations** — Datadog, Slack, Jira, ServiceNow, Terraform provider.

A typical LaunchDarkly call:

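```tsx
import * as LDClient from '@launchdarkly/node-server-sdk'

// Create one client per process and wait for the first flag payload.
const ldClient = LDClient.init(process.env.LD_SDK_KEY!)
await ldClient.waitForInitialization()

// Evaluation context: the key identifies the user, the rest is used for targeting.
const context = {
  kind: 'user',
  key: userId,
  plan: user.plan,
  country: user.country,
}

// The third argument is the fallback returned when the flag cannot be evaluated.
const variant = await ldClient.variation('checkout-redesign', context, 'control')

switch (variant) {
  case 'treatment-a':
    return <CheckoutA />
  case 'treatment-b':
    return <CheckoutB />
  default:
    return <CheckoutControl />
}
```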

Downsides: pricing (MAU and flag-count based, scales up quickly), SaaS reliance (self-host exists but as a separate contract), and a feature set that overshoots small teams.

When LaunchDarkly: enterprise, regulated industries (finance, healthcare), many environments and SDKs, approval workflows, generous budget.

4. Statsig — Flags + Experiments + Analytics

**Statsig** was founded in 2021 by former Meta engineers. It started with the deliberate goal of **packaging feature flags, experimentation, and analytics into a single product**. By 2026, its well-known users include OpenAI, Notion, Atlassian, Figma, and Whatnot, mostly fast-growing startups and scale-ups.

What sets Statsig apart:

- First-class primitives — **gate (boolean flag) + experiment (A/B) + dynamic config (runtime configuration) + holdout**.

- A homegrown **stats engine** — frequentist (t-test, delta method) by default, sequential testing, CUPED, automatic sample ratio mismatch (SRM) detection.

- A **built-in event pipeline** — flag evaluation, exposure logging, and metric calculation all live in one system.

- A **generous free tier** — up to 1M events per month, friendly to startups.

- **Pulse** — automatic detection of interesting metric movements in production with alerts.

A typical Statsig call:

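```typescript
import Statsig from 'statsig-node'

await Statsig.initialize(process.env.STATSIG_SERVER_KEY!)

// `user` is the application's own user record. The Statsig user gets its own name
// so the object does not reference itself while it is being declared.
const statsigUser = { userID: userId, email: user.email, country: user.country }

if (await Statsig.checkGate(statsigUser, 'new_checkout_enabled')) {
  // gradual rollout
}

const experiment = await Statsig.getExperiment(statsigUser, 'checkout_redesign_v2')
const layout = experiment.get('layout', 'vertical')
const buttonColor = experiment.get('button_color', 'blue')

// Log the outcome metric so the experiment can be analyzed.
await Statsig.logEvent(statsigUser, 'checkout_completed', orderTotal, { variant: layout })
```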

**Holdout** is the feature Statsig is most often praised for. You permanently exclude a user group from all experimental changes, then each quarter compare metrics between the holdout and everyone else to see the cumulative effect (or regression). A single experiment may show stat-sig lift; only the holdout reveals the cumulative system effect.

When Statsig: a fast-growing product team running many experiments weekly, where having flags, metrics, and analytics in one tool matters. The internal flow most closely resembles Meta's internal tooling.

5. GrowthBook (Open Source) — Series Seed

**GrowthBook** launched as open source in 2020, is MIT licensed, went through Y Combinator, and raised Series Seed funding. It has solidified the position of **"the experimentation platform of the open source camp"** more firmly than any competitor.

GrowthBook's posture:

- **Data warehouse-backed experiments** — doesn't collect metrics itself. Pulls them directly via SQL from BigQuery, Snowflake, Redshift, ClickHouse, Postgres, MySQL, or Databricks.

- **Bayesian + frequentist** stats engines — both supported, Bayesian by default.

- **Self-hosting first** — `docker compose up` and you're running. SaaS is the add-on.

- **Both flags and experiments** — from simple toggles to full experiment analysis.

- **Visualization** — uplift, confidence intervals, time-series, and breakdowns by dimension look polished.

A typical GrowthBook JS snippet:

```typescript
import { GrowthBook } from '@growthbook/growthbook'

const gb = new GrowthBook({
  apiHost: 'https://gb.example.com',
  clientKey: process.env.GB_CLIENT_KEY!,
  attributes: { id: userId, country: user.country },
  // Register the callback up front so the first evaluation is tracked too.
  trackingCallback: (experiment, result) => {
    analytics.track('experiment_viewed', {
      experimentId: experiment.key,
      variationId: result.variationId,
    })
  },
})

await gb.loadFeatures()

const showNewCheckout = gb.isOn('new-checkout')

// evalFeature returns a feature result; its value is the assigned variant.
const feature = gb.evalFeature('checkout-redesign')
const variant = feature.value // 'control' | 'treatment-a' | 'treatment-b'
```

The **data warehouse-backed posture** is the crux. Statsig owns the event pipeline; GrowthBook says, "We don't collect metrics — we analyze the events already in your warehouse." That keeps **data governance clean** and lets you reuse **metrics already defined in dbt or Airflow** directly.

When GrowthBook: you already have a warehouse and analytics pipeline and want to layer experiments on top. OSS, self-hosting preferred. Collaboration between data and experiment teams is central.

6. Unleash — The Norwegian Open Source

**Unleash** started in Norway in 2014 as an open source feature flag platform under Apache 2.0. In 2026, in the OSS camp it stands alongside GrowthBook, but the posture is different — Unleash is positioned as **"enterprise-friendly OSS."**

Unleash's posture:

- **Self-hosting first** — Docker, Helm, Terraform are all official.

- **Enterprise features released into OSS** — environment separation, project-level permissions, audit log, change requests (request + approval workflow) all in OSS.

- **Strategies** — beyond simple booleans, gradual rollouts, userId-based and custom strategies are configured in the console.

- **Edge proxy** — Unleash Edge brings client evaluations close to data centers or edge nodes.

- **International adoption** — Norwegian government, Finnish government, BMW, Audi, Lufthansa — many EU enterprise use cases.

A typical Unleash call:

```typescript
import { initialize } from 'unleash-client'

const unleash = initialize({
  url: 'https://unleash.example.com/api/',
  appName: 'web-checkout',
  customHeaders: { Authorization: process.env.UNLEASH_API_TOKEN! },
})

// Evaluate only after the client has synced its flag definitions.
unleash.on('ready', () => {
  const enabled = unleash.isEnabled('new-checkout', {
    userId: userId,
    properties: { plan: user.plan, country: user.country },
  })

  const variant = unleash.getVariant('checkout-experiment', { userId })
  // variant.name: 'control' | 'a' | 'b' | 'disabled'
})
```

The line between SaaS and OSS is **deliberately blurry** here. Unleash Enterprise exists as SaaS, but enough features have been released into OSS that "self-host + OSS" can take you a long way. EU GDPR and data-sovereignty-conscious government, finance, and manufacturing domains adopt it especially often.

When Unleash: you need enterprise-friendly features in OSS, EU or regulated industry, mandatory self-host. If GrowthBook is "the OSS strong on experiments," Unleash is "the OSS strong on flag operations."

7. ConfigCat / Flagsmith / Split — The Others

**ConfigCat** — A managed SaaS from a Hungarian team. Their posture is **"feature flags are just toggles"** — experimentation is deliberately omitted, and prices stay low. The free tier is generous (10M evaluations per month), and the basics — targeting, gradual rollout, environment separation — are all there. SDKs cover 20+ languages. Attractive for teams that say, "Run experiments somewhere else; we only need flags."

**Flagsmith** — OSS (BSD-3) plus SaaS from a UK team. Similar posture to Unleash and GrowthBook, but **API/headless-first** is the differentiator. REST + GraphQL + SDKs + a Terraform provider are all first-class. Self-host is one `docker compose` away. Integrations with PostHog, Segment, Datadog, and Slack are clean.

**Split** — Founded in 2015, an early managed SaaS that bundled experiments and flags. It was **acquired by Harness in 2024** and folded into **Harness Feature Management & Experimentation**. In 2026, the natural framing is no longer "Split standalone" but "a module inside the Harness platform." The strongest argument is that flags now sit in the same platform as CI/CD and deploy verification.

**AB Tasty** — A French managed SaaS that is friendly to marketing and product teams. Its ownership reportedly changed in 2024: **Insight Partners took a majority stake**, and **digital agency DEPT acquired AB Tasty in September 2024**. In 2026, its posture is closer to marketing experimentation; teams that want engineers as the first-class user are better served by LaunchDarkly or Statsig.

**Optimizely Rollouts** — Optimizely's (the former web experimentation leader) **free feature flag product**. Experiments live in the paid tier; simple flagging lives in Rollouts. In 2026, this combination shows up in large organizations that want marketing-friendly experimentation alongside engineer-friendly flagging.

8. PostHog — All-in-One (Analytics + Flags + Replay)

**PostHog** is an OSS (MIT core) product analytics platform that started in 2020. Its identity is **packing everything into one product** — analytics, funnels, retention, session replay, feature flags, experiments, surveys, data warehouse, LLM observability. By 2026 it sits among YC's top companies, and both managed SaaS and self-host are first-class.

PostHog from a feature flag perspective:

- **Toggles on the same plane as analytics** — flipping a flag immediately ties it to cohorts, funnels, retention, and replay, with no glue code in between.

- The full **flag to experiment to metric to decision** loop in one product.

- **Session replay coupling** — see the actual click patterns of users who got the new feature.

- **LLM observability** — added in 2025–2026, tracking prompt-level latency, cost, and outcomes on the same plane.

- A **very generous free tier** — up to 1M events and 1M flag calls per month.

A typical PostHog call:

```typescript
import { PostHog } from 'posthog-node'

const client = new PostHog(process.env.POSTHOG_KEY!, { host: 'https://app.posthog.com' })

const enabled = await client.isFeatureEnabled('new-checkout', userId, {
  groups: { organization: orgId },
  personProperties: { plan: user.plan, country: user.country },
})

const variant = await client.getFeatureFlag('checkout-redesign', userId)
// variant: 'control' | 'treatment-a' | 'treatment-b'

client.capture({
  distinctId: userId,
  event: 'checkout_completed',
  properties: {
    variant,
    orderTotal,
    // Attaching the flag value lets the event be broken down by variant.
    '$feature/checkout-redesign': variant,
  },
})
```

PostHog's killer pitch is **"one product for all of it."** A small team can start with PostHog alone instead of buying Mixpanel + LaunchDarkly + Optimizely + Hotjar + a Datadog LLM add-on separately. The downside: in any single area — analytics, replay, flags — PostHog may not match the depth of a specialist (Amplitude, Mixpanel, LaunchDarkly, FullStory). The posture is "spread out and reasonably deep."

When PostHog: startups and scale-ups that want analytics, replay, and flags in one product from day one, with OSS and self-host options available. One of the **most common defaults for new SaaS startups in 2026**.

9. DevCycle / Hypertune (TS) / Eppo / Bucketeer

This is the follower group, where each product differentiates by posture.

**DevCycle** — Rebranded from Taplytics in 2023. **Edge evaluation is first-class** — flag evaluations finish in milliseconds at Cloudflare Workers, AWS Lambda@Edge, or CloudFront Functions. The SDKs auto-sync flag definitions so evaluation happens 100% locally (no network round trip). One of the most-cited managed SaaS options for mobile, edge, and serverless environments.

**Hypertune** — The differentiator is **TypeScript type safety**. Flag definitions are codegen'd into TS types, so writing `flags.checkout.layout` gives you IDE autocomplete and instant type checking. Built by a team with Vercel roots, it's especially attractive to Next.js + TS full-stack teams. The posture goes beyond simple boolean toggles to **structured config (objects, nested enums) handled with type safety**.

```typescript
// Hypertune generates types like this via codegen
const hypertune = createHypertune(/* ... */)

const layout = hypertune.checkout({ user }).layout()
// 'vertical' | 'horizontal' (enum type)

const buttonColor = hypertune.checkout({ user }).buttonColor()
// 'blue' | 'green' | 'red'
```
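The type-safety idea can be approximated without any codegen, which helps show what the generated types buy you. The sketch below is not Hypertune's API: `flagSchema`, `VariantOf`, and `getFlag` are hypothetical names, and the schema is written by hand where a codegen tool would produce it.

```typescript
// Hand-written stand-in for a generated flag schema (hypothetical names).
const flagSchema = {
  checkoutLayout: { variants: ['vertical', 'horizontal'], fallback: 'vertical' },
  buttonColor: { variants: ['blue', 'green', 'red'], fallback: 'blue' },
} as const

type FlagSchema = typeof flagSchema
type FlagKey = keyof FlagSchema
type VariantOf<K extends FlagKey> = FlagSchema[K]['variants'][number]

// Reads a raw, untyped payload and returns a value narrowed to the flag's variants.
// Unknown or malformed values fall back to the declared default.
function getFlag<K extends FlagKey>(key: K, raw: Record<string, unknown>): VariantOf<K> {
  const def = flagSchema[key]
  const value = raw[key]
  const variants: readonly string[] = def.variants
  return typeof value === 'string' && variants.includes(value)
    ? (value as VariantOf<K>)
    : (def.fallback as VariantOf<K>)
}

// 'purple' is not a declared variant, so buttonColor falls back to 'blue'.
const rawPayload = { checkoutLayout: 'horizontal', buttonColor: 'purple' }

const typedLayout = getFlag('checkoutLayout', rawPayload) // type: 'vertical' | 'horizontal'
const typedColor = getFlag('buttonColor', rawPayload) // type: 'blue' | 'green' | 'red'

console.log(typedLayout, typedColor)
```

A typo such as `getFlag('checkoutLayuot', rawPayload)` fails at compile time, which is the property that makes typed flags attractive to TS-heavy teams.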

**Eppo** — An **experimentation-first** managed SaaS. Similar warehouse-on-top posture to GrowthBook but SaaS. Advanced stats — CUPED, sequential testing, SRM detection, heterogeneous treatment effects — are first-class. Built by Airbnb and Stitch Fix alumni; the posture is "data scientists doing deep experimentation."
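CUPED is the most approachable of those techniques, so here is a minimal sketch of the adjustment itself. It assumes one pre-experiment value and one in-experiment value per user, and computes theta on a single array; production engines typically estimate theta on pooled data across arms. The function names are illustrative.

```typescript
// CUPED sketch: adjusted_i = post_i - theta * (pre_i - mean(pre)),
// with theta = cov(pre, post) / var(pre). Names are illustrative.
function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length
}

// Sample covariance (n - 1 denominator); covariance(xs, xs) is the sample variance.
function covariance(xs: number[], ys: number[]): number {
  const mx = mean(xs)
  const my = mean(ys)
  let sum = 0
  for (let i = 0; i < xs.length; i++) {
    sum += (xs[i]! - mx) * (ys[i]! - my)
  }
  return sum / (xs.length - 1)
}

function cupedAdjust(pre: number[], post: number[]): { theta: number; adjusted: number[] } {
  if (pre.length !== post.length || pre.length < 2) {
    throw new Error('pre and post must have the same length (at least 2)')
  }
  const theta = covariance(pre, post) / covariance(pre, pre)
  const preMean = mean(pre)
  const adjusted = post.map((y, i) => y - theta * (pre[i]! - preMean))
  return { theta, adjusted }
}

// Made-up example: spend per user before and during the experiment.
const preSpend = [10, 20, 30, 40]
const postSpend = [12, 19, 33, 41]

const cuped = cupedAdjust(preSpend, postSpend)
const varianceBefore = covariance(postSpend, postSpend)
const varianceAfter = covariance(cuped.adjusted, cuped.adjusted)

console.log({ theta: cuped.theta, varianceBefore, varianceAfter })
```

The adjusted metric keeps the same mean but has lower variance whenever the pre-period value is correlated with the outcome, so the same experiment reaches significance with fewer users.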

**Bucketeer** — Built as an internal tool at CyberAgent (Japan), open-sourced under Apache 2.0 in 2022. Self-hostable OSS. Positioned as **"Japan's LaunchDarkly"**, with adoption in Japanese enterprise and gaming. Ships an official OpenFeature provider.

10. Vercel Edge Flags / Cloudflare Workers

**Vercel** chose an **adapter model** rather than building its own backend. A thin library called `@vercel/flags` plus Edge Config (a globally distributed, read-optimized KV store) lets you talk to any flag vendor (LaunchDarkly, Statsig, Hypertune, Optimizely, etc.) through a Vercel adapter, with **millisecond-level evaluation inside Edge Functions**. It stabilized in 2025 and is becoming the default pattern for Next.js full-stack teams in 2026.

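```tsx
// app/page.tsx
// Import path matches the @vercel/flags package named above; newer releases
// publish the same API as `flags/next`.
import { flag } from '@vercel/flags/next'

const showNewCheckout = flag({
  key: 'new-checkout',
  decide: async () => {
    // getLaunchDarklyClient() and context are app-level helpers, not part of the SDK.
    const ld = await getLaunchDarklyClient()
    return ld.variation('new-checkout', context, false)
  },
})

export default async function Page() {
  const enabled = await showNewCheckout()
  return enabled ? <NewCheckout /> : <LegacyCheckout />
}
```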

**Cloudflare Workers** — Cloudflare itself does not ship a managed flag product (as of May 2026). Instead, the pattern is to build it yourself by combining **Workers KV + D1 + Durable Objects**, with evaluation via Workers adapters from Statsig, LaunchDarkly, DevCycle, or Flagsmith as standard. **Hyperdrive + Workers Analytics Engine** is starting to show up as the analytics layer for experimentation.

The core idea: **evaluation at the edge must be in milliseconds.** Request to edge to flag eval to response branching all happens in one region. To do that, SDKs default to "sync flag definitions ahead of time, then evaluate 100% locally." External API calls in the evaluation path are off-limits.
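A minimal sketch of that "sync ahead, evaluate locally" split might look like the following. The `LocalFlagStore` class, the `Rule` shape, and the country-based rule are assumptions made up for illustration; real SDKs carry much richer rule formats.

```typescript
// Sketch: flag definitions are replaced by a background sync, while evaluation
// is a synchronous in-memory lookup. All names here are hypothetical.
type Rule = { flagKey: string; enabled: boolean; allowedCountries?: string[] }

class LocalFlagStore {
  private version = 0
  private rules = new Map<string, Rule>()

  // Called by a background sync loop, never from the request path.
  replace(version: number, rules: Rule[]): void {
    if (version <= this.version) return // ignore stale or duplicate snapshots
    this.version = version
    this.rules = new Map(rules.map((r): [string, Rule] => [r.flagKey, r]))
  }

  currentVersion(): number {
    return this.version
  }

  // Synchronous on purpose: no await and no network call in the hot path.
  isEnabled(flagKey: string, ctx: { country?: string }, fallback = false): boolean {
    const rule = this.rules.get(flagKey)
    if (!rule) return fallback
    if (!rule.enabled) return false
    if (!rule.allowedCountries) return true
    return ctx.country !== undefined && rule.allowedCountries.includes(ctx.country)
  }
}

const store = new LocalFlagStore()
store.replace(3, [{ flagKey: 'new-checkout', enabled: true, allowedCountries: ['KR', 'JP'] }])
store.replace(2, []) // an older snapshot arriving late is ignored

const enabledInKorea = store.isEnabled('new-checkout', { country: 'KR' })
const enabledInUs = store.isEnabled('new-checkout', { country: 'US' })
const unknownFlag = store.isEnabled('not-synced-yet', {}, true)

console.log({ enabledInKorea, enabledInUs, unknownFlag })
```

In an edge runtime the `replace` call would be fed from a replicated store or a streaming connection; the request handler only ever touches `isEnabled`.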

11. Patterns — Kill Switch / Gradual Rollout / A/B / Multi-Armed Bandit / Holdout

Once you've picked a tool, the next decision is **patterns**. These are the five most common in 2026 practice.

1) Kill switch (operational safety net)

**The simplest and most important pattern.** Turn the new feature on, watch the monitors, and **kill it immediately** if anything goes wrong. This isn't experimentation — it's incident response.

- Operational rule: every new feature should sit behind a kill switch for the first one to four weeks.

- Guards: SRE / on-call can flip it off without engineering or PM approval.

- Monitoring: error rate, latency, and conversion of flagged-on users should auto-alarm.
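As a rough sketch of those rules, the code below keeps kill switch state in memory, treats unknown flags as off, and records who flipped what and why. `KillSwitchBoard` and `guarded` are hypothetical names; a real setup would back this with the flag platform and its audit log.

```typescript
// Kill switch sketch with a minimal audit trail. Names are illustrative.
type AuditEntry = { flagKey: string; enabled: boolean; actor: string; reason: string; at: number }

class KillSwitchBoard {
  private readonly state = new Map<string, boolean>()
  readonly audit: AuditEntry[] = []

  // Every flip is recorded, even when no approval step is required.
  set(flagKey: string, enabled: boolean, actor: string, reason: string): void {
    this.state.set(flagKey, enabled)
    this.audit.push({ flagKey, enabled, actor, reason, at: Date.now() })
  }

  // Unknown flags read as off, so a missing definition never enables new code.
  isOn(flagKey: string): boolean {
    return this.state.get(flagKey) ?? false
  }
}

// Routes to the new code path only while the switch is on.
function guarded<T>(
  board: KillSwitchBoard,
  flagKey: string,
  newPath: () => T,
  legacyPath: () => T
): T {
  return board.isOn(flagKey) ? newPath() : legacyPath()
}

const board = new KillSwitchBoard()
board.set('new-checkout', true, 'alice', 'start rollout')
const duringRollout = guarded(board, 'new-checkout', () => 'new', () => 'legacy')

// On-call sees an error spike and kills the feature without a deploy.
board.set('new-checkout', false, 'oncall-bob', 'checkout error rate spike')
const afterKill = guarded(board, 'new-checkout', () => 'new', () => 'legacy')

console.log({ duringRollout, afterKill, flips: board.audit.length })
```

The important property is that turning the feature off is a data change, not a code rollback.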

2) Gradual rollout (progressive delivery)

**Increase traffic slowly by percent.** 1% to 5% to 10% to 25% to 50% to 100%. Each step is followed by a **canary period** to confirm metrics look healthy.

- Deterministic hashing on userId, sessionId, or deviceId — the same user always lands in the same bucket.

- Common framing isn't "5% of everyone" but "5% of new signups" or some other segmentation.

- Targeting rules narrow by country, plan, platform, or app version.
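Deterministic bucketing is simple enough to sketch in a few lines. The version below assumes FNV-1a as the hash and 10,000 buckets (0.01% granularity); vendors use their own hash functions and bucket counts, so treat the specifics as illustrative.

```typescript
// 32-bit FNV-1a over UTF-16 code units. Chosen here only because it is short.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

// Salting with the flag key gives each flag an independent bucket order,
// so the same users are not always the first ones exposed.
function bucketOf(flagKey: string, unitId: string): number {
  return fnv1a(`${flagKey}:${unitId}`) % 10000 // 0..9999
}

// percent is 0..100. A unit is in the rollout when its bucket is below the cutoff.
function inRollout(flagKey: string, unitId: string, percent: number): boolean {
  return bucketOf(flagKey, unitId) < Math.round(percent * 100)
}

const userIds = Array.from({ length: 1000 }, (_, i) => `user-${i}`)
const at5 = userIds.filter((id) => inRollout('new-checkout', id, 5))
const at25 = userIds.filter((id) => inRollout('new-checkout', id, 25))

console.log(`5%: ${at5.length} users, 25%: ${at25.length} users`)
```

Because the cutoff only moves upward, everyone exposed at 5% is still exposed at 25%, which keeps the experience stable as the rollout widens.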

3) A/B test (hypothesis validation)

Measures the **causal effect of two (or more) variants on a metric**. The essential difference from rollouts: A/B is an attempt to draw a **statistical conclusion**.

- Pre-design: hypothesis, primary metric, secondary metrics, minimum detectable effect (MDE), sample size, duration.

- Mid-run hygiene: automatic SRM detection, no peeking (sequential testing helps).

- Decision: lift + confidence interval + business impact as a triplet.
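SRM detection is a chi-square goodness-of-fit test on assignment counts. The sketch below assumes a significance level of 0.001 and supports two to four arms through a small table of critical values; the function names are illustrative.

```typescript
// Chi-square critical values at alpha = 0.001, keyed by degrees of freedom.
const CRITICAL_AT_0_001: Record<number, number> = { 1: 10.828, 2: 13.816, 3: 16.266 }

// Goodness-of-fit statistic: sum over arms of (observed - expected)^2 / expected.
function chiSquare(observed: number[], expectedRatios: number[]): number {
  const total = observed.reduce((a, b) => a + b, 0)
  const ratioSum = expectedRatios.reduce((a, b) => a + b, 0)
  let stat = 0
  observed.forEach((obs, i) => {
    const expected = (total * (expectedRatios[i] ?? 0)) / ratioSum
    stat += (obs - expected) ** 2 / expected
  })
  return stat
}

// Returns true when the observed split is implausible under the intended split.
function hasSrm(observed: number[], expectedRatios: number[]): boolean {
  if (observed.length !== expectedRatios.length) {
    throw new Error('observed and expectedRatios must have the same length')
  }
  const critical = CRITICAL_AT_0_001[observed.length - 1]
  if (critical === undefined) {
    throw new Error('this sketch supports 2 to 4 arms only')
  }
  return chiSquare(observed, expectedRatios) > critical
}

const balanced = hasSrm([5050, 4950], [1, 1]) // small noise around 50:50
const skewed = hasSrm([6000, 4000], [1, 1]) // intended 50:50, observed 60:40

console.log({ balanced, skewed })
```

When the check fires, the usual next step is to debug assignment and exposure logging before reading any metric results.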

4) Multi-armed bandit (adaptive exploration)

Different posture from A/B. Instead of fixed splits running to the end, **traffic increasingly shifts to whichever arm is winning**. Thompson sampling is the most common algorithm.

- Good when: short-term decisions (headlines, thumbnails, push notifications), a single clear metric, fast feedback.

- Bad when: changes whose effects unfold over time (pricing, onboarding flow), or when you need heterogeneous treatment effects.

- Tools: Statsig, Eppo, LaunchDarkly, and GrowthBook all support bandit mode.
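For intuition, here is a small Thompson sampling simulation with Beta posteriors over binary conversions. It is a sketch under simplifying assumptions: a seeded toy random generator, uniform Beta(1, 1) priors, and made-up conversion rates. Production bandits add batching, minimum exploration, and guardrail metrics.

```typescript
// Small seeded PRNG (mulberry32) so the simulation is repeatable.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0
  return () => {
    a = (a + 0x6d2b79f5) >>> 0
    let t = a
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Gamma(k, 1) for integer k >= 1, as a sum of k exponential draws.
function sampleGammaInt(k: number, rng: () => number): number {
  let sum = 0
  for (let i = 0; i < k; i++) sum -= Math.log(1 - rng()) // 1 - rng() lies in (0, 1]
  return sum
}

// Beta(a, b) for integer parameters via two gamma draws.
function sampleBeta(a: number, b: number, rng: () => number): number {
  const x = sampleGammaInt(a, rng)
  const y = sampleGammaInt(b, rng)
  const total = x + y
  return total === 0 ? 0.5 : x / total
}

type Arm = { successes: number; failures: number }

// Thompson sampling: draw once from each arm's posterior and play the best draw.
function chooseArm(arms: Arm[], rng: () => number): number {
  let best = 0
  let bestDraw = -1
  arms.forEach((arm, i) => {
    const draw = sampleBeta(arm.successes + 1, arm.failures + 1, rng)
    if (draw > bestDraw) {
      bestDraw = draw
      best = i
    }
  })
  return best
}

function simulate(trueRates: number[], rounds: number, seed: number): number[] {
  const rng = mulberry32(seed)
  const arms: Arm[] = trueRates.map(() => ({ successes: 0, failures: 0 }))
  const pulls = trueRates.map(() => 0)
  for (let r = 0; r < rounds; r++) {
    const i = chooseArm(arms, rng)
    const arm = arms[i]!
    pulls[i] = (pulls[i] ?? 0) + 1
    if (rng() < (trueRates[i] ?? 0)) {
      arm.successes++
    } else {
      arm.failures++
    }
  }
  return pulls
}

// Arm 1 converts at 15%, arm 0 at 5% (made-up rates).
const pulls = simulate([0.05, 0.15], 3000, 42)
console.log(pulls)
```

Traffic drifts toward the stronger arm as evidence accumulates, which is exactly why a bandit is a poor fit when you need an unbiased, fixed-split estimate of the effect.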

5) Holdout (cumulative effect measurement)

**Permanently exclude a fraction of users from all experimental changes.** Each quarter, compare metric movement between that holdout group and everyone else to see "the cumulative effect of every change we shipped last quarter."

- Why: individual experiments each showed +0.5% lift, but quarterly retention dropped. System-level effects don't show up in per-experiment validation.

- Operation: randomly select 1–5% of users at the start of a quarter and exclude them from every experiment.

- Tools: Statsig, LaunchDarkly, and Eppo support holdouts as first-class.
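One way to picture the operation: the holdout check runs before any experiment assignment and uses its own salt, so the holdout group stays the same across every experiment in the quarter. The sketch below assumes FNV-1a bucketing and a hypothetical salt such as `holdout-2026Q1`; it is not any vendor's implementation.

```typescript
// 32-bit FNV-1a, used only as a compact deterministic hash for this sketch.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

function bucket(salt: string, userId: string): number {
  return fnv1a(`${salt}:${userId}`) % 10000 // 0..9999
}

type Assignment = { inHoldout: true } | { inHoldout: false; variant: string }

function assign(
  userId: string,
  experimentKey: string,
  variants: string[],
  holdoutSalt: string,
  holdoutPercent: number
): Assignment {
  if (variants.length === 0) throw new Error('at least one variant is required')
  // Holdout first: these users see the baseline experience for every experiment.
  if (bucket(holdoutSalt, userId) < Math.round(holdoutPercent * 100)) {
    return { inHoldout: true }
  }
  // Everyone else is bucketed per experiment, independently of the holdout salt.
  const index = bucket(experimentKey, userId) % variants.length
  return { inHoldout: false, variant: variants[index]! }
}

const variants = ['control', 'treatment-a', 'treatment-b']
const sampleUsers = Array.from({ length: 2000 }, (_, i) => `user-${i}`)
const assignments = sampleUsers.map((id) =>
  assign(id, 'checkout-redesign', variants, 'holdout-2026Q1', 2)
)
const heldOut = assignments.filter((a) => a.inHoldout).length

console.log(`${heldOut} of ${sampleUsers.length} users are held out`)
```

At the end of the quarter the held-out users form the comparison group for the cumulative effect of everything that shipped.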

12. Korea / Japan — Toss, Kakao, Mercari

Looking only at overseas SaaS gives you half the picture. Here is how Korean and Japanese big tech actually use flags.

Korea — Toss

**Toss (Viva Republica)** is known to have built its feature flag system in-house. Per Toss SLASH conference talks and the Toss tech blog:

- A homegrown traffic-routing system runs the rollout and A/B routing decisions.

- **Financial-domain safety** requirements — a wrong route directly causes incidents, so permissions, approvals, and audit are strict.

- Tightly coupled with the in-house data infrastructure, so flag evaluation, exposure, and metrics flow through one pipeline.

Toss's posture summarizes as **"a regulated industry's in-house build, deeply integrated with the data team."** Built internally rather than buying external SaaS.

Korea — Kakao

**Kakao** is large and not uniform across business units, but the if/kakao conference and the Kakao tech blog show the following currents:

- An **in-house feature flag gateway pattern** — flag evaluation is unified at a layer in front of the microservices.

- The **A/B infrastructure** behind KakaoTalk, KakaoPay, and KakaoMobility has been presented internally.

- Some business units **mix internal builds with external SaaS like PostHog or LaunchDarkly**.

In a large organization, per-business-unit tooling differs, and an internal gateway layers over the top.

Japan — Mercari

**Mercari (メルカリ)** has been relatively open in its engineering blog about how it runs feature flags:

- **LaunchDarkly was used at one point**, with some domains later moving to internal builds.

- In a microservices environment, the pattern preserves **flag-evaluation consistency across services** — one request hitting the same flag in multiple services produces the same result.

- A confluence of **data governance + in-app experimentation + microservices** where building it yourself feels increasingly natural.
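One common way to get that cross-service consistency is to evaluate flags once at the entry point and pass the result downstream with the request. The sketch below illustrates that general technique only; it is not Mercari's implementation, and the `x-flag-snapshot` header name and helper functions are hypothetical.

```typescript
// Sketch: evaluate once at the edge, propagate the decision in a header.
type FlagValue = string | boolean
type FlagSnapshot = Record<string, FlagValue>

const SNAPSHOT_HEADER = 'x-flag-snapshot' // hypothetical header name

function encodeSnapshot(snapshot: FlagSnapshot): string {
  return encodeURIComponent(JSON.stringify(snapshot))
}

// Tolerant decoder: anything malformed yields an empty snapshot.
function decodeSnapshot(headerValue: string | undefined): FlagSnapshot {
  if (!headerValue) return {}
  try {
    const parsed: unknown = JSON.parse(decodeURIComponent(headerValue))
    if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) return {}
    const snapshot: FlagSnapshot = {}
    for (const [key, value] of Object.entries(parsed)) {
      if (typeof value === 'string' || typeof value === 'boolean') snapshot[key] = value
    }
    return snapshot
  } catch {
    return {}
  }
}

// Downstream services prefer the propagated decision and only evaluate
// locally when the request carries none.
function resolveFlag(
  flagKey: string,
  headers: Record<string, string | undefined>,
  evaluateLocally: (key: string) => FlagValue
): FlagValue {
  const snapshot = decodeSnapshot(headers[SNAPSHOT_HEADER])
  return snapshot[flagKey] ?? evaluateLocally(flagKey)
}

// Gateway evaluates once and attaches the snapshot to the outgoing request.
const outgoingHeaders = {
  [SNAPSHOT_HEADER]: encodeSnapshot({ 'checkout-redesign': 'treatment-a' }),
}

// A downstream service whose local rules are out of date would say 'control'.
const staleLocalEvaluator = (): FlagValue => 'control'
const downstreamValue = resolveFlag('checkout-redesign', outgoingHeaders, staleLocalEvaluator)
const withoutHeader = resolveFlag('checkout-redesign', {}, staleLocalEvaluator)

console.log({ downstreamValue, withoutHeader })
```

The trade-off is header size and trust: the snapshot should carry only the flags a request path needs, and services should accept it only from internal callers.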

**Bucketeer (CyberAgent)** showing up often at Japanese conferences is also a feature of the Japanese market — a Japanese OSS adopted by Japanese enterprise as an OpenFeature provider.

**Common pattern**: Korean and Japanese big tech **don't use only external SaaS.** They mix in-house builds and self-hosted OSS, then customize around data governance, approval workflows, and domain specifics (finance, e-commerce, telecom). **The OpenFeature standard shines brightest in these mixed environments** — external SaaS, in-house builds, and OSS can sit behind one SDK.

13. Who Should Pick What — Small Team / Growth / Enterprise / Self-Hosted

The tool comparison is done. What remains is the answer that fits your team.

Small team (5–20 people) — Single product, fast start

- **First pick: PostHog** — analytics, flags, and replay in one. The free tier is generous, and a small team can't run five separate tools.

- **OpenFeature should be wired in** — small teams are more likely to switch vendors later.

- **If you don't worry about lock-in: Statsig** — generous free tier, posture matches well when you're running experiments weekly.

- **Only need simple toggles: ConfigCat** — very cheap, flags without experimentation.

Growth stage (20–200 people) — Experimentation culture

- **Many experiments per week: Statsig** — flags + experiments + analytics in one tool.

- **Warehouse-first: GrowthBook or Eppo** — when metrics already live in BigQuery / Snowflake.

- **Next.js + TS full stack: Hypertune + Vercel Edge Flags** — type safety and edge evaluation are the core.

- **Lean even harder on OpenFeature** — the stage where tools are most likely to change.

Enterprise (200+, regulated industry) — Permissions, approvals, audit

- **First pick: LaunchDarkly** — best at ticking the enterprise checklist (SOC 2, HIPAA, BAA, FedRAMP, SSO, SCIM).

- **EU / GDPR-first: Unleash Enterprise** — EU-based, rich self-host options.

- **Tied to CI/CD: Harness Feature Management (formerly Split)** — deploy + flag + verification in one platform.

- **OpenFeature is mandatory** — multiple vendors coexisting inside the company is the norm.

Self-hosted (legal or technical requirement) — Data sovereignty

- **OSS-first by default — GrowthBook, Unleash, Flagsmith, or Bucketeer depending on fit**:

  - GrowthBook — strong experimentation, warehouse-integrated.

  - Unleash — strong flag operations and approval workflows.

  - Flagsmith — API / headless-first, easy to integrate with other tools.

  - Bucketeer — Japanese OSS, official OpenFeature provider.

- **PostHog self-host** — when you want analytics and replay self-hosted too.

Four axes of the decision

| Axis | Side A | Side B |
| --- | --- | --- |
| Hosting | SaaS (LaunchDarkly, Statsig, ConfigCat) | self-host (Unleash, GrowthBook, Flagsmith, PostHog) |
| Scope | flags-first (LaunchDarkly, ConfigCat) | platform (Statsig, PostHog, GrowthBook) |
| Data model | own event pipeline (Statsig, PostHog) | warehouse first (GrowthBook, Eppo) |
| Affinity | marketing / PM (AB Tasty, Optimizely) | engineering / data (LaunchDarkly, Statsig, GrowthBook) |

Put OpenFeature in the middle and every axis above becomes a **swappable decision** rather than a one-way door. **The 2026 default is "vendor plus OpenFeature."**

Epilogue — Flags Are an Operational Safety Net; Experimentation Is a Learning System

The one-line summary:

> **Feature flags are an operational safety net that separates deploy from release; experimentation is a learning system that validates hypotheses with data. They often live in one tool, but the priorities differ — every team needs flags; only teams with an experimentation culture need experimentation.**

In 2026 the market sits as:

- **LaunchDarkly** — enterprise default.

- **Statsig** — experiments + flags + analytics unified, for fast-growing teams.

- **GrowthBook / Unleash / Flagsmith / Bucketeer** — OSS camp, self-hosting and data sovereignty.

- **PostHog** — all-in-one analytics + flags + replay.

- **ConfigCat / Split (Harness) / DevCycle / Hypertune / Eppo / Optimizely Rollouts / AB Tasty** — each with their own posture.

- **OpenFeature** — the CNCF standard that binds all of the above.

Ten common traps:

1. Creating flags and never deleting them — 1,000 zombie flags six months later.

2. Shipping new features without a kill switch — the only recourse is a code rollback.

3. No permission separation — anyone can flip a production flag, mistakes become incidents.

4. Ignoring SRM — what you thought was 50:50 was actually 60:40.

5. Peeking — checking results daily without sequential testing.

6. Making permanent decisions from a single experiment without holdout verification.

7. External API calls in the flag-evaluation path — latency explodes.

8. PII entering the flag evaluation context — leaks straight to the console.

9. Strongly locked into one vendor — every code path has to be rewritten on the switch (no OpenFeature).

10. Starting without pre-defined hypotheses and metrics — results drive hypotheses (HARKing).
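Trap 1 is the easiest to automate away. The sketch below flags cleanup candidates from flag metadata, assuming you can export creation time, last evaluation time, and rollout percentage from your platform. The `FlagRecord` shape and the 90-day threshold are assumptions, not a vendor format.

```typescript
// Stale flag detection sketch. FlagRecord is a hypothetical export format.
type FlagRecord = {
  key: string
  createdAt: Date
  lastEvaluatedAt: Date | null
  rolloutPercent: number
  permanent: boolean // long-lived kill switches and ops toggles are exempt
}

const DAY_MS = 24 * 60 * 60 * 1000

function findStaleFlags(flags: FlagRecord[], now: Date, maxAgeDays = 90): string[] {
  const cutoff = now.getTime() - maxAgeDays * DAY_MS
  return flags
    .filter((f) => {
      if (f.permanent) return false
      // Fully on or fully off for a long time: the decision is made, remove the branch.
      const decided = f.rolloutPercent === 0 || f.rolloutPercent === 100
      const oldAndDecided = decided && f.createdAt.getTime() < cutoff
      // Not evaluated recently (or ever): the code path is probably dead.
      const lastSeen = f.lastEvaluatedAt?.getTime() ?? f.createdAt.getTime()
      const unused = lastSeen < cutoff
      return oldAndDecided || unused
    })
    .map((f) => f.key)
}

const now = new Date('2026-06-01T00:00:00Z')
const inventory: FlagRecord[] = [
  { key: 'new-checkout', createdAt: new Date('2026-01-01T00:00:00Z'), lastEvaluatedAt: new Date('2026-05-31T00:00:00Z'), rolloutPercent: 100, permanent: false },
  { key: 'checkout-redesign', createdAt: new Date('2026-05-01T00:00:00Z'), lastEvaluatedAt: new Date('2026-05-31T00:00:00Z'), rolloutPercent: 25, permanent: false },
  { key: 'payments-kill-switch', createdAt: new Date('2025-01-01T00:00:00Z'), lastEvaluatedAt: new Date('2026-05-31T00:00:00Z'), rolloutPercent: 100, permanent: true },
  { key: 'old-banner', createdAt: new Date('2025-06-01T00:00:00Z'), lastEvaluatedAt: new Date('2025-09-01T00:00:00Z'), rolloutPercent: 50, permanent: false },
]

const staleFlags = findStaleFlags(inventory, now)
console.log(staleFlags)
```

Running a report like this on a schedule, and opening a cleanup ticket per stale flag, keeps the flag count proportional to the work actually in flight.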

Coming next

Candidate next posts: **OpenFeature deep dive — writing a provider yourself**, **experimentation stats deep dive — CUPED, sequential, SRM**, **tying kill switches into canary deploys and the SRE on-call**.

> "Deploy is moving code; release is turning it on for users. Feature flagging is the job of shrinking the distance between the two while keeping it safe."

— Feature Flags & Experimentation Platforms 2026, end.

References

- [OpenFeature — CNCF](https://openfeature.dev/)

- [OpenFeature GitHub — open-feature/spec](https://github.com/open-feature/spec)

- [OpenFeature CNCF Incubating announcement](https://www.cncf.io/projects/openfeature/)

- [LaunchDarkly](https://launchdarkly.com/)

- [LaunchDarkly Documentation](https://docs.launchdarkly.com/)

- [Statsig](https://statsig.com/)

- [Statsig Documentation](https://docs.statsig.com/)

- [Statsig — Holdouts](https://docs.statsig.com/experiments-plus/holdouts/)

- [GrowthBook](https://www.growthbook.io/)

- [GrowthBook GitHub — growthbook/growthbook](https://github.com/growthbook/growthbook)

- [Unleash](https://www.getunleash.io/)

- [Unleash GitHub — Unleash/unleash](https://github.com/Unleash/unleash)

- [ConfigCat](https://configcat.com/)

- [Flagsmith](https://flagsmith.com/)

- [Flagsmith GitHub — Flagsmith/flagsmith](https://github.com/Flagsmith/flagsmith)

- [PostHog — Feature Flags](https://posthog.com/feature-flags)

- [PostHog GitHub — PostHog/posthog](https://github.com/PostHog/posthog)

- [Split — Now Harness Feature Management](https://www.harness.io/products/feature-flags)

- [DevCycle](https://devcycle.com/)

- [Hypertune](https://www.hypertune.com/)

- [Eppo](https://www.geteppo.com/)

- [Bucketeer GitHub — bucketeer-io/bucketeer](https://github.com/bucketeer-io/bucketeer)

- [Bucketeer — CyberAgent OSS](https://bucketeer.io/)

- [Vercel Flags SDK](https://vercel.com/docs/feature-flags)

- [Vercel Edge Config](https://vercel.com/docs/storage/edge-config)

- [Cloudflare Workers — KV](https://developers.cloudflare.com/kv/)

- [AB Tasty](https://www.abtasty.com/)

- [Optimizely Rollouts](https://www.optimizely.com/products/intelligence/rollouts/)

- [Toss SLASH Conference](https://toss.tech/slash)

- [Kakao Tech Blog](https://tech.kakao.com/)

- [Mercari Engineering Blog](https://engineering.mercari.com/en/)

- [Microsoft — Progressive Experimentation](https://learn.microsoft.com/en-us/azure/architecture/example-scenario/data/feature-flags)

- [Martin Fowler — Feature Toggles](https://martinfowler.com/articles/feature-toggles.html)

- [Statsig — CUPED](https://docs.statsig.com/stats-engine/methodologies/cuped)

- [Eppo — Sequential Testing](https://www.geteppo.com/sequential-testing)

- [Thompson Sampling — Multi-armed bandit](https://en.wikipedia.org/wiki/Thompson_sampling)