
Workflow Engines in 2026 — Temporal, Inngest, Trigger.dev, Hatchet, Restate, DBOS, Airflow Deep Dive (Mapping the Durable Execution Era)


Prologue — Why workflow engines are tier-1 infrastructure again

Back in 2018, workflow engines were "that thing the data team uses instead of cron." Backend engineers did not look at the category every day. By May 2026, the same category has suddenly climbed back into tier-1 infrastructure for every backend org. The reason is brutally simple — AI agents made long-running operations the default, not the exception.

The days when one LLM call took 5 seconds are over. An agent makes six tool calls, waits for user approval, and polls another system before moving to the next step. One cycle takes 30 seconds, 5 minutes, sometimes two hours. At the same time retries, idempotency, and replay become non-negotiable: model calls fail roughly 1% of the time, OpenAI and Anthropic occasionally return 429, and users close their tabs mid-run. Anything that solves long-running plus resumable plus idempotent in one library is, by definition, a workflow engine, and that is why the category exploded back into relevance in 2026.

This article maps all twelve major workflow engines as of May 2026 — Temporal, Inngest, Trigger.dev, Hatchet, Restate, DBOS, Cadence, Airflow, Prefect, Dagster, Argo, Conductor. What is the same and what is different, who to pick for AI agents vs payments vs ETL vs onboarding, and what production teams in Korea and Japan are actually running.


Chapter 1 · The 2026 map of workflow engines

One-line summary first. Workflow engines have split into two camps. One is backend business logic — Temporal, Inngest, Trigger.dev, Hatchet, Restate, DBOS, Cadence, Conductor. The other is data engineering — Airflow, Prefect, Dagster, Argo. Both camps share the "DAG execution" idea, but their primary users, interfaces, and operational models diverge.

Coordinates as of May 2026.

| Engine | Camp | Core Model | Backend | Primary SDKs | License |
| --- | --- | --- | --- | --- | --- |
| Temporal | Business logic | Workflow as code, event sourcing | Cassandra, Postgres, MySQL, SQLite | Go, Java, TS, Python, .NET, PHP, Ruby | MIT |
| Inngest | Business logic | Event-driven, serverless-friendly | Self-hosted | TypeScript, Python, Go | Source Available |
| Trigger.dev v3 | Business logic | Task as code, Vercel-style DX | Postgres, self-hosted | TypeScript | Apache 2.0 |
| Hatchet | Business logic | Postgres-backed queue plus workflows | Postgres | TypeScript, Python, Go | MIT |
| Restate | Business logic | Virtual Objects, deterministic exec | Self (Rust) | TypeScript, Java, Kotlin, Go, Rust, Python | BSL transitioning to Apache 2.0 (4 years) |
| DBOS | Business logic | DB-resident control plane | Postgres | TypeScript, Python | MIT |
| Cadence | Business logic | Temporal ancestor | Cassandra, MySQL | Go, Java | Apache 2.0 |
| Conductor (Orkes) | Business logic | JSON DSL, microservice orchestration | Redis, Cassandra, Postgres | Polyglot via HTTP | Apache 2.0 |
| Airflow 3 | Data engineering | Python DAG, scheduler decoupled | Postgres, MySQL | Python | Apache 2.0 |
| Prefect 3 | Data engineering | Pythonic flows, dynamic DAG | Postgres, SQLite | Python | Apache 2.0 |
| Dagster 1.10 | Data engineering | Asset-centric, not DAG-centric | Postgres, SQLite | Python | Apache 2.0 |
| Argo Workflows | Data engineering | k8s-native, YAML/CRDs | etcd via k8s | YAML, Python | Apache 2.0 |

Three takeaways.

First, the business-logic camp is where the explosion is. Long-running backend logic — AI agents, payments, onboarding, subscriptions — used to be cobbled together with BullMQ or SQS plus Lambda. Between 2024 and 2026 the industry crystallized around the view that this pattern is too fragile to keep rewriting. So Temporal, Inngest, Trigger.dev, Hatchet, Restate, and DBOS all surged at once.

Second, the data-engineering camp is in steady state. Airflow 3.0 (GA April 2025) finished the scheduler-executor split, Prefect 3.0 (GA September 2024) nailed dynamic DAGs, and Dagster finished its asset-centric differentiation. New entrants still appear, but positions within the category are largely settled.

Third, "workflow as code" is now the default design pattern. Many workflows in 2018 were defined in JSON, YAML, or visual editors (Step Functions, Conductor JSON). Workflows in 2026 are defined in Go, TS, or Python. IDE support, type safety, testability, readable diffs, and code review all just work. Even Conductor, which still ships a JSON DSL, now bundles SDK layers.


Chapter 2 · What is durable execution — restart, retry, idempotency, replay

Before comparing engines, let us unpack the single word that ties this category together — durable execution. The short definition: "an execution model in which all workflow progress is persisted to external storage, so that even if the process dies and is restarted, execution resumes exactly where it stopped." Four properties follow.

  1. Crash resumption. If a worker OOMs or its host reboots, a fresh worker reads the durable event log and continues the same workflow.
  2. Retries. Each activity or task inside a workflow is automatically retried on failure. Backoff, max attempts, and timeout are declared, not hand-rolled.
  3. Idempotency. For a given workflow ID, repeated execution of the same activity yields the same result. Workflow ID plus step ID usually serves as the idempotency key.
  4. Replay. Workflow code must be deterministic. Replaying the same event log must produce the same decisions. So instead of calling Math.random() or new Date() directly, you use deterministic versions provided by the SDK.
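
The four properties above can be sketched in a few lines. This is a toy model, not any engine's actual implementation: `Journal`, `run_step`, and `WorkerCrash` are hypothetical names, and an in-memory dict stands in for durable storage. It shows a workflow that crashes midway, is re-run by a "fresh worker," and skips the steps that already completed, including a random value that must come back unchanged on replay.

```python
import random
from typing import Any, Callable


class Journal:
    """Record of completed step results (stands in for durable storage)."""

    def __init__(self) -> None:
        self.entries: dict[str, Any] = {}


class WorkerCrash(Exception):
    """Simulates the worker process dying mid-workflow."""


def run_step(journal: Journal, step_id: str, fn: Callable[[], Any]) -> Any:
    # Replay: a step that already completed returns its recorded result
    # instead of executing again.
    if step_id in journal.entries:
        return journal.entries[step_id]
    result = fn()
    journal.entries[step_id] = result  # persist before moving on
    return result


calls: list[str] = []  # records which side effects actually ran


def onboarding(journal: Journal, user_id: str, crash_before_sync: bool) -> str:
    run_step(journal, "email", lambda: calls.append(f"email:{user_id}"))
    # Non-deterministic values are captured as steps so replay sees the same value.
    token = run_step(journal, "token", lambda: random.randint(0, 10**9))
    if crash_before_sync:
        raise WorkerCrash()
    run_step(journal, "sync", lambda: calls.append(f"sync:{user_id}"))
    return f"{user_id}:{token}"


journal = Journal()
try:
    onboarding(journal, "u1", crash_before_sync=True)  # first worker dies
except WorkerCrash:
    pass
first_token = journal.entries["token"]
result = onboarding(journal, "u1", crash_before_sync=False)  # fresh worker replays
```

The second call re-executes the function from the top, but the email is not sent again and the token is the one recorded before the crash. Real engines persist the journal to a database and enforce determinism far more strictly.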

Why does this matter? Picture the usual queue-plus-worker setup (BullMQ, Sidekiq, SQS plus Lambda, RabbitMQ). For a four-step onboarding workflow — "user signed up, send welcome email, create Stripe customer, sync Salesforce" — practically every backend team has built this pattern by hand. Four queues, each worker pushing to the next, failed messages dropped into a DLQ, an operator picking from the DLQ and reprocessing. It works, but the same bugs reappear every time:

  • "Stripe customer got created, then the worker died right before Salesforce sync — where do we resume?"
  • "An email got sent twice — how do we prevent duplicates?"
  • "How do we visualize progress per workflow ID?"
  • "Onboarding grew to five steps, but how do we finish the in-flight workflows that started under the old definition?"

Durable execution engines answer all of this at the library or platform layer. That is why they exploded into payments, onboarding, subscriptions, and AI agents.
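
To make the duplicate-email bullet concrete, here is a minimal sketch of how retries and idempotency keys interact. Everything in it is an assumption for illustration: `FakeEmailProvider` is a hypothetical provider that deduplicates on a key, and the key format (workflow ID plus step ID, hashed) is one common convention rather than any specific engine's scheme.

```python
import hashlib
from typing import Callable


def idempotency_key(workflow_id: str, step_id: str) -> str:
    """Stable key: the same workflow step always maps to the same key."""
    return hashlib.sha256(f"{workflow_id}/{step_id}".encode()).hexdigest()[:16]


def backoff_schedule(initial: float, coefficient: float, max_attempts: int) -> list[float]:
    """Delays in seconds between attempts: initial * coefficient**n."""
    return [initial * coefficient**n for n in range(max_attempts - 1)]


class FakeEmailProvider:
    """Hypothetical provider that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self.sent: dict[str, str] = {}
        self.fail_next_ack = True

    def send(self, key: str, to: str) -> None:
        self.sent.setdefault(key, to)  # a repeated key is ignored
        if self.fail_next_ack:
            # The email went out, but the acknowledgement was lost.
            self.fail_next_ack = False
            raise TimeoutError("ack lost")


def with_retries(fn: Callable[[], None], max_attempts: int) -> int:
    """Calls fn until it succeeds; returns the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            fn()
            return attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # A real worker would sleep for the matching backoff delay here.
    raise RuntimeError("max_attempts must be at least 1")


provider = FakeEmailProvider()
key = idempotency_key("onboarding-u1", "send-welcome-email")
attempts = with_retries(lambda: provider.send(key, "u1@example.com"), max_attempts=5)
delays = backoff_schedule(initial=1.0, coefficient=2.0, max_attempts=5)
```

The first attempt delivers the email but times out, so the retry is unavoidable. Because both attempts carry the same key, the provider records a single send.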


Chapter 3 · Temporal — the category standard

Start with Temporal. Founded in 2019 by Maxim Fateev and Samar Abbas — the people who built Cadence at Uber — it is effectively Cadence Generation 2. As of May 2026 it is the de facto standard of the category, running in production at Snap, DoorDash, Datadog, Stripe, Coinbase, Box, parts of Netflix, and HubSpot.

Core model: workflow functions equal deterministic code. Activities equal code with side effects. Every decision inside a workflow function — branches, sleeps, signal waits — is recorded as an event log, and workflow code can replay that log to reconstruct current state. So workflow functions must not make any external calls directly; everything that touches the outside world is delegated to activities.

Minimal Go example.

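package onboarding

import (
    "context"
    "time"

    "go.temporal.io/sdk/temporal"
    "go.temporal.io/sdk/workflow"
)

// Activity stubs. Real implementations would call the email, Stripe, and
// Salesforce APIs; side effects belong here, not in the workflow function.
func SendWelcomeEmail(ctx context.Context, userID string) error     { return nil }
func CreateStripeCustomer(ctx context.Context, userID string) error { return nil }
func SyncSalesforce(ctx context.Context, userID string) error       { return nil }

func OnboardingWorkflow(ctx workflow.Context, userID string) error {
    // Timeout and retry behavior are declared once and apply to every activity below.
    ao := workflow.ActivityOptions{
        StartToCloseTimeout: 10 * time.Second,
        RetryPolicy: &temporal.RetryPolicy{
            MaximumAttempts:    5,
            InitialInterval:    time.Second,
            BackoffCoefficient: 2.0,
        },
    }
    ctx = workflow.WithActivityOptions(ctx, ao)

    if err := workflow.ExecuteActivity(ctx, SendWelcomeEmail, userID).Get(ctx, nil); err != nil {
        return err
    }
    if err := workflow.ExecuteActivity(ctx, CreateStripeCustomer, userID).Get(ctx, nil); err != nil {
        return err
    }
    if err := workflow.ExecuteActivity(ctx, SyncSalesforce, userID).Get(ctx, nil); err != nil {
        return err
    }
    return workflow.Sleep(ctx, 24*time.Hour) // deterministic sleep: resumes 24h later
}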

What matters here is that workflow.Sleep(ctx, 24*time.Hour) really does wait 24 hours. The worker process can die and come back, and infrastructure can be redeployed in the meantime. When the timer fires 24 hours later, the SDK replays the event log and resumes from this exact line. That is the essence of durable execution.
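
One way to picture a sleep that survives restarts: the deadline is persisted once, and every later check compares the clock against the stored deadline. The sketch below is a toy model under that assumption, not Temporal's implementation. `TimerStore` and `durable_sleep` are hypothetical names, and the clock is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class TimerStore:
    """Stands in for the engine's durable storage of timers."""

    wake_at: dict[str, float] = field(default_factory=dict)


def durable_sleep(store: TimerStore, timer_id: str, seconds: float, now: float) -> bool:
    """Returns True once the timer has fired, False while the workflow must wait.

    The wake-up time is computed once and persisted. Later calls, including
    calls from a restarted worker, reuse the stored deadline instead of
    restarting the countdown.
    """
    deadline = store.wake_at.setdefault(timer_id, now + seconds)
    return now >= deadline


store = TimerStore()
DAY = 24 * 60 * 60
TIMER = "onboarding-u1/sleep-1"

started = durable_sleep(store, TIMER, DAY, now=0.0)
# The worker restarts 10 hours in: the deadline is unchanged, 14 hours remain.
after_restart = durable_sleep(store, TIMER, DAY, now=10 * 3600.0)
fired = durable_sleep(store, TIMER, DAY, now=DAY + 1.0)
```

A process-local `time.sleep` would lose its countdown on restart. Persisting the deadline is what lets any worker pick the timer up.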

Temporal's strengths — broad polyglot SDK coverage (Go, Java, TS, Python, .NET, PHP, Ruby all GA), seven years of production references, the largest community in the category, and a solid SLA on Temporal Cloud. Weaknesses — needs its own backend (Cassandra, Postgres, MySQL, or SQLite), and the determinism constraints are not intuitive at first, producing a steep learning curve. Pricing is also non-trivial — Temporal Cloud charges per action and per event, so heavy workloads see steep invoices.

When to pick Temporal: long-running, mission-critical workflows; polyglot environments; payments, billing, and reconciliation domains where correctness is absolute; teams that already have ops headcount and accept owning a stateful backend.


Chapter 4 · Inngest — TypeScript-first, serverless-friendly

If Temporal sits at the "polyglot plus own infrastructure" pole, Inngest sits at the opposite one: TypeScript first, serverless friendly. Founded in 2021, headquartered in San Francisco. CEO Tony Holdstock-Brown is ex-GraphCMS and SoundCloud. Core insight: "Backends running on Vercel, Cloudflare, and AWS Lambda have short per-invocation runtimes. Workflows must be redesigned for that reality."

Inngest's model is event-driven step functions. You write workflows as JS functions, but each step (step.run, step.sleep, step.waitForEvent) is invoked separately by the Inngest backend. The code looks like this.

import { Inngest } from "inngest";

export const inngest = new Inngest({ id: "my-app" });

export const onboarding = inngest.createFunction(
  { id: "user-onboarding" },
  { event: "user/signed-up" },
  async ({ event, step }) => {
    await step.run("send-welcome-email", async () => {
      await sendWelcomeEmail(event.data.userID);
    });

    await step.run("create-stripe-customer", async () => {
      await createStripeCustomer(event.data.userID);
    });

    await step.sleep("wait-1-day", "1d"); // works even on serverless

    await step.run("send-day-1-followup", async () => {
      await sendDayOneFollowup(event.data.userID);
    });
  }
);

How does this work? When step.run fires, Inngest executes the callback and persists the result in its backend. When the function reaches the next step, the function returns; Inngest later makes a fresh HTTP call to the same function for the same workflow instance. On that second call, the SDK sees the cached result of step 1 and only runs step 2. Each invocation stays short, so it fits Vercel Edge Functions and Cloudflare Workers cleanly.
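
The re-invocation loop described above can be modeled roughly as follows. This is a simplified sketch, not Inngest's wire protocol: `StepTools`, `StepPending`, and `orchestrate` are hypothetical names, an exception stands in for "the function returns," and a loop stands in for the fresh HTTP calls.

```python
from typing import Any, Callable


class StepPending(Exception):
    """Raised to end the current invocation after one new step has run."""


class StepTools:
    def __init__(self, memo: dict[str, Any]) -> None:
        self.memo = memo  # step results persisted by the orchestrator

    def run(self, step_id: str, fn: Callable[[], Any]) -> Any:
        if step_id in self.memo:
            return self.memo[step_id]  # cached from an earlier invocation
        self.memo[step_id] = fn()
        raise StepPending(step_id)  # hand control back to the orchestrator


executed: list[str] = []  # which callbacks actually ran


def onboarding(step: StepTools) -> str:
    step.run("send-welcome-email", lambda: executed.append("email"))
    customer = step.run(
        "create-stripe-customer", lambda: executed.append("stripe") or "cus_1"
    )
    return f"done:{customer}"


def orchestrate(fn: Callable[[StepTools], Any]) -> tuple[Any, int]:
    """Re-invokes fn until it finishes; returns (result, invocation count)."""
    memo: dict[str, Any] = {}
    invocations = 0
    while True:
        invocations += 1
        try:
            return fn(StepTools(memo)), invocations
        except StepPending:
            continue  # the real system would issue a new HTTP request here


result, invocations = orchestrate(onboarding)
```

Two steps take three short invocations: one per new step, plus a final pass that finds everything cached and returns. Each callback still runs exactly once.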

Strengths — TS first, plays nicely with Vercel and Cloudflare, a polished local dev UI, generous free tier (50k steps per month). Weaknesses — polyglot story is thin (TS, Python, Go only), determinism constraints are softer than Temporal but bad step segmentation kills performance, and the company is younger (2021) so enterprise trust still trails Temporal.

When to pick Inngest: TypeScript-only SaaS, serverless hosting (Vercel, Cloudflare, Netlify, Render), event-triggered workflows, AI agents that run for "a bit long but not extremely long."


Chapter 5 · Trigger.dev v3 — Vercel-style DX

Founded in 2022 in London. The original product looked like Inngest, but a full rewrite for v3 in 2024 shifted the model significantly. v3 is "task as code plus their own worker containers" — you write task code and Trigger.dev runs it in isolated containers on their infrastructure (or self-hosted). That removes the typical serverless time limits — default 1 hour, up to 24 hours per task. The 60-second or 5-minute walls of Vercel Functions are gone.

The code looks like this.

import { task, wait } from "@trigger.dev/sdk/v3";

export const onboardingTask = task({
  id: "user-onboarding",
  maxDuration: 3600, // 1 hour
  run: async (payload: { userID: string }) => {
    await sendWelcomeEmail(payload.userID);

    const stripeCustomer = await createStripeCustomer(payload.userID);

    await wait.for({ seconds: 86400 }); // sleep 1 day

    await sendDayOneFollowup(payload.userID);

    return { stripeCustomer };
  },
});

The DX move is "push code Vercel-style and the platform auto-builds and registers triggers." No worker process to babysit if you use Trigger.dev Cloud. Another big v3 addition is the Realtime API — clients can subscribe live to workflow progress. That is decisive for AI-agent UIs where ChatGPT-style streaming progress matters.

Strengths — the smoothest DX of the category (the Vercel CLI vibe), the ability to run a 1-hour task without contortions, the Realtime API, and Apache 2.0 self-hosting. Weaknesses — TS only at scale (Python and Go SDKs entered beta in 2025 but production share is small), and the v2-to-v3 migration was almost a rewrite, which burned v2 users.

When to pick Trigger.dev: TS full-stack SaaS, indie devs, AI-agent side projects, teams that want Vercel-grade DX, single-task runtimes between 5 minutes and 1 hour, and products that need to stream task progress to a client UI (AI agents, long analyses, code generators).


Chapter 6 · Hatchet — the Postgres-backed newcomer

Hatchet is a Y Combinator W24 (Winter 2024) alum. Founders Alexander Belanger and Gabriel Ruttner came from Porter. Their thesis is direct: "Postgres already does queues, message brokers, KV, and transactions well. Don't drag in new infrastructure (Kafka, Redis, Cassandra); use Postgres as the control plane."

Hatchet has two layers: a task queue (Hatchet Queue) and a workflow engine (Hatchet Workflows). The queue is a distributed job dispatcher built on SELECT ... FOR UPDATE SKIP LOCKED in Postgres, and workflows are a DAG engine layered on top.
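
For readers who have not used SKIP LOCKED, the sketch below models its claim semantics in memory so it runs without a database. It is an illustration of the general Postgres pattern, not Hatchet's actual schema or code: the `tasks` table, its columns, and the `TaskTable` class are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# The SQL pattern being modeled (table and column names are hypothetical):
CLAIM_SQL = """
SELECT id FROM tasks
WHERE status = 'queued'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
"""


@dataclass
class Task:
    id: int
    status: str = "queued"
    locked_by: Optional[str] = None


class TaskTable:
    """In-memory stand-in for a Postgres table with row locks."""

    def __init__(self, n: int) -> None:
        self.rows = [Task(id=i) for i in range(1, n + 1)]

    def claim(self, worker: str) -> Optional[Task]:
        """Returns the first queued row that no other worker has locked."""
        for row in self.rows:
            if row.status == "queued" and row.locked_by is None:
                row.locked_by = worker  # row lock held until commit
                return row
        return None  # nothing claimable right now

    def commit(self, task: Task) -> None:
        task.status = "done"
        task.locked_by = None  # lock released at commit


table = TaskTable(3)
a = table.claim("worker-a")
b = table.claim("worker-b")  # skips row 1, which worker-a still holds
table.commit(a)
c = table.claim("worker-a")  # row 1 is done and row 2 is locked, so row 3
```

The point of SKIP LOCKED is that worker-b neither blocks behind worker-a's lock nor receives the same row. It simply moves on to the next available task.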

TypeScript example (a simplified sketch; exact method names differ between Hatchet SDK versions).

import { Hatchet } from "@hatchet-dev/typescript-sdk";

const hatchet = Hatchet.init();

const onboarding = hatchet.workflow({
  name: "user-onboarding",
  on: { event: "user:signed-up" },
});

onboarding
  .step({ name: "send-email" }, async (ctx) => {
    await sendWelcomeEmail(ctx.input.userID);
    return { sent: true };
  })
  .step({ name: "create-stripe", parents: ["send-email"] }, async (ctx) => {
    return createStripeCustomer(ctx.input.userID);
  })
  .step({ name: "sync-salesforce", parents: ["create-stripe"] }, async (ctx) => {
    return syncSalesforce(ctx.input.userID);
  });
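
The `parents` lists above form a dependency graph, and a DAG engine's core job is to turn that graph into an execution order. Here is a minimal sketch using Python's standard-library `graphlib`. It is not Hatchet's scheduler, and the `notify-slack` step is a hypothetical addition to show two steps becoming runnable at once.

```python
from graphlib import TopologicalSorter

# Step name -> names of the steps it depends on.
parents = {
    "send-email": [],
    "create-stripe": ["send-email"],
    "sync-salesforce": ["create-stripe"],
    "notify-slack": ["send-email"],  # hypothetical extra step to show fan-out
}


def execution_waves(graph: dict[str, list[str]]) -> list[list[str]]:
    """Groups steps into waves; steps in the same wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all parents have completed
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(parents)
```

A production engine dispatches each ready step to a worker and marks it done only after the step succeeds. The readiness rule is the same.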

Strengths — self-hosting is trivial if you already run Postgres. No RabbitMQ, no Kafka, no Redis. Throughput is good — 10k+ tasks per second on a single node as of May 2026 — and pricing is sensible (a generous free tier on Hatchet Cloud). Weaknesses — the company is young (2024), so production track record is short. SDKs cover TS, Python, and Go but no Java or .NET. And it stops short of Temporal-grade deterministic replay — task idempotency is guaranteed, but code-level determinism is on the user.

When to pick Hatchet: you already run Postgres and refuse to add another infrastructure component; self-host first; AI agents, payments, onboarding — anything backend; you trust a YC newcomer to scale with you.


Chapter 7 · Restate — Rust core, Virtual Objects

Restate launched in 2023 from Berlin, co-founded by Stephan Ewen, a former Apache Flink PMC member. The core is written in Rust, and the design philosophy is — "unify durable execution and the actor model in one runtime." The key abstraction is the Virtual Object — a distributed stateful entity that guarantees serialized execution per key, while providing durable state and deterministic behavior in one bundle.

The code looks like this.

import * as restate from "@restatedev/restate-sdk";

const cart = restate.object({
  name: "Cart",
  handlers: {
    addItem: async (ctx: restate.ObjectContext, item: string) => {
      const items = (await ctx.get<string[]>("items")) ?? [];
      ctx.set("items", [...items, item]);
    },
    checkout: async (ctx: restate.ObjectContext) => {
      const items = (await ctx.get<string[]>("items")) ?? [];
      const orderId = ctx.rand.uuidv4();
      await ctx.run("charge", () => stripeCharge(items));
      await ctx.run("ship", () => shipItems(items, orderId));
      ctx.clear("items");
      return orderId;
    },
  },
});

restate.endpoint().bind(cart).listen();

Here ctx.run("charge", ...) is the unit of deterministic, idempotent execution. ctx.rand.uuidv4() returns the same value on replay. ctx.get and ctx.set persist durable state keyed by the object key.
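
The "serialized execution per key" guarantee is the part that is easiest to underestimate. The toy runtime below illustrates it with one asyncio lock per object key. This is a sketch of the idea only: `VirtualObjectRuntime` is a hypothetical name, state lives in memory rather than in a durable store, and Restate's real runtime works differently under the hood.

```python
import asyncio
from collections import defaultdict


class VirtualObjectRuntime:
    """Toy runtime: one lock and one state dict per object key."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)
        self.state: dict[str, dict] = defaultdict(dict)

    async def call(self, key: str, handler, *args):
        async with self._locks[key]:  # handlers for the same key never overlap
            return await handler(self.state[key], *args)


async def add_item(state: dict, item: str) -> int:
    items = state.get("items", [])
    await asyncio.sleep(0)  # yield control, inviting interleaving
    state["items"] = [*items, item]  # read-modify-write
    return len(state["items"])


async def main() -> dict:
    rt = VirtualObjectRuntime()
    await asyncio.gather(
        *(rt.call("cart-1", add_item, f"sku-{i}") for i in range(5)),
        rt.call("cart-2", add_item, "sku-x"),
    )
    return {key: value["items"] for key, value in rt.state.items()}


carts = asyncio.run(main())
```

Without the per-key lock, the five concurrent `add_item` calls on `cart-1` would all read the same empty list and overwrite each other. With it, no update is lost, while `cart-2` proceeds independently.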

Strengths — the actor model feels natural for state-heavy domains (IoT, gaming, session management), Rust core keeps latency low (p99 under 5ms in their 2026 benchmarks), and self-host operations are simple (single binary plus RocksDB). Weaknesses — you have to be comfortable with the actor paradigm. SDK coverage has grown to TS, Java, Kotlin, Go, Rust, and Python, but the community is smaller than Temporal or Inngest. And the BSL license — Apache 2.0 after four years — bothers strict OSS purists.

When to pick Restate: stateful workflows (carts, sessions, game matchmakers, IoT device state machines), low-latency workflows, teams wanting single-binary self-hosting, and Rust-friendly infra organizations.


Chapter 8 · DBOS — Postgres as the control plane

DBOS was co-founded in 2022 by DB researchers from MIT, Stanford, and CMU (Mike Stonebraker, Matei Zaharia, and Joe Hellerstein advise). Their thesis is deliberately provocative — "a database can replace the traditional OS. Use Postgres as the control plane and you get workflows, messaging, KV, and transactions in one shot."

Code is decorator-based.

from dbos import DBOS

# Steps wrap side effects; their results are checkpointed in Postgres.
@DBOS.step()
def send_welcome_email(user_id: str) -> None:
    smtp_send(user_id)  # smtp_send is an application helper, not part of DBOS

# create_stripe_customer and send_day_one_followup are assumed to be
# @DBOS.step() functions defined the same way.

@DBOS.workflow()
def onboarding(user_id: str) -> None:
    send_welcome_email(user_id)
    customer = create_stripe_customer(user_id)
    DBOS.sleep(86400)  # durable sleep, 1 day
    send_day_one_followup(user_id, customer.id)

Under the hood @DBOS.workflow() wraps execution in Postgres transactions plus a durable event log. If the function dies, another worker resumes the workflow ID from where it stopped. All progress lives in Postgres. There is no separate workflow backend — your Postgres is the workflow engine.
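
A rough picture of "your database is the workflow engine," using the standard-library `sqlite3` module as a stand-in for Postgres so the example runs anywhere. The table layout, status values, and function names are assumptions for illustration, not DBOS's actual schema. The sketch checkpoints each step's output in a transaction, then shows a restarted worker scanning for unfinished workflows and resuming them.

```python
import json
import sqlite3
from typing import Optional

db = sqlite3.connect(":memory:")  # stand-in for Postgres
db.executescript("""
CREATE TABLE workflow_status (workflow_id TEXT PRIMARY KEY, status TEXT);
CREATE TABLE step_outputs (
    workflow_id TEXT, step_no INTEGER, output TEXT,
    PRIMARY KEY (workflow_id, step_no)
);
""")

STEPS = [lambda: "email-sent", lambda: "cus_123", lambda: "followup-sent"]
executions: list[int] = []  # which step numbers actually ran


def run_workflow(workflow_id: str, crash_at: Optional[int] = None) -> list[str]:
    with db:
        db.execute("INSERT OR IGNORE INTO workflow_status VALUES (?, 'PENDING')",
                   (workflow_id,))
    outputs = []
    for step_no, step in enumerate(STEPS):
        row = db.execute(
            "SELECT output FROM step_outputs WHERE workflow_id = ? AND step_no = ?",
            (workflow_id, step_no),
        ).fetchone()
        if row is not None:
            outputs.append(json.loads(row[0]))  # already checkpointed, skip
            continue
        if crash_at is not None and step_no >= crash_at:
            raise RuntimeError("worker died")  # simulated crash
        output = step()
        executions.append(step_no)
        with db:  # checkpoint the step output in one transaction
            db.execute("INSERT INTO step_outputs VALUES (?, ?, ?)",
                       (workflow_id, step_no, json.dumps(output)))
        outputs.append(output)
    with db:
        db.execute("UPDATE workflow_status SET status = 'SUCCESS' WHERE workflow_id = ?",
                   (workflow_id,))
    return outputs


def recover_pending() -> dict[str, list[str]]:
    """What a restarted worker does: find unfinished workflows and resume them."""
    pending = db.execute(
        "SELECT workflow_id FROM workflow_status WHERE status = 'PENDING'"
    ).fetchall()
    return {wf_id: run_workflow(wf_id) for (wf_id,) in pending}


try:
    run_workflow("onboarding-u1", crash_at=2)
except RuntimeError:
    pass
recovered = recover_pending()
status = db.execute("SELECT status FROM workflow_status").fetchone()[0]
```

Steps 0 and 1 run once, before the crash. Recovery reads their outputs back from the table and executes only step 2.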

Strengths — just Postgres (zero added infrastructure), lightweight (the workflow engine is essentially a library), strong academic provenance. Weaknesses — the company and product are still young, throughput is bound to a single Postgres instance (1,000 to 3,000 workflows per second on a single node as of 2026; partitioning lifts that, but at operational cost), and polyglot coverage is thin (TS and Python only).

When to pick DBOS: you want every workload on one Postgres; research, experiments, and small SaaS where you want as few infrastructure components as possible; TS or Python single-stack teams.


Chapter 9 · Cadence — Uber's original, Temporal's parent

Cadence was built at Uber and open-sourced in 2017. Maxim and Samar left to start Temporal in 2019, and Cadence development continued inside Uber. As of 2026 it still runs billions of workflow executions per day inside Uber. The API and design closely resemble Temporal, which is natural since the same people built both.

The differences — Cadence is more conservative. SDK breadth is narrower (Go and Java dominate), new features land slower. But it has seven-plus years of in-house validation at Uber. External adopters have largely migrated to Temporal, although Uber, parts of Coinbase, and pieces of HashiCorp still run Cadence.

When to pick Cadence: you run an Uber-scale ops team and prefer self-management — in practice nearly all new adoption goes to Temporal.


Chapter 10 · Airflow, Prefect, Dagster — the data-engineering camp

Enough of the business-logic camp. Shift to data engineering, where the coordinates are Airflow, Prefect, and Dagster.

Airflow 3.0 (GA April 2025). Created at Airbnb in 2014 and later donated to Apache. As of May 2026 it is the de facto data-engineering standard. Airflow 3.0's biggest changes: scheduler, DAG processor, and executor are fully decoupled, making multi-tenant operations easier, and the Task SDK was split out, making workers lighter. Managed options include Astro, MWAA on AWS, and Cloud Composer on GCP.

from airflow.sdk import dag, task  # Airflow 3 Task SDK import path
from datetime import datetime

@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def daily_etl():
    @task
    def extract():
        return fetch_from_source()

    @task
    def transform(data):
        return clean_and_aggregate(data)

    @task
    def load(transformed):
        write_to_warehouse(transformed)

    load(transform(extract()))

daily_etl()

Prefect 3.0 (GA September 2024). Built by PrefectHQ as a Pythonic alternative to Airflow's static DAGs. Workflows can branch dynamically at runtime, not just at compile time. 3.0's biggest changes — a simpler self-host story with Postgres or SQLite backends, plus stronger transactional guarantees.

Dagster 1.10 (released early 2025). Built by Dagster Labs (formerly Elementl) with an asset-centric model. Where Airflow and Prefect ask "which task runs when," Dagster asks "which asset (table, file, model) gets refreshed how." Data lineage and observability are first-class. In 1.10 the Software-Defined Asset model deepened, and dbt, Snowflake, and Databricks integrations became first-class.
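
The asset-centric idea is easier to see in code than in prose. The sketch below is loosely modeled on the pattern, not on Dagster's API: this `asset` decorator, the `ASSETS` registry, and `materialize` are all hypothetical. Each function is an asset, its parameter names declare its upstream assets, and you ask for an asset rather than scheduling tasks.

```python
import inspect
from typing import Any, Callable

ASSETS: dict[str, Callable[..., Any]] = {}


def asset(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Registers fn as an asset; its parameter names are its upstream assets."""
    ASSETS[fn.__name__] = fn
    return fn


@asset
def raw_orders() -> list[int]:
    return [120, 80, 200]


@asset
def cleaned_orders(raw_orders: list[int]) -> list[int]:
    return [order for order in raw_orders if order >= 100]


@asset
def revenue_report(cleaned_orders: list[int]) -> int:
    return sum(cleaned_orders)


def materialize(name: str, cache: dict[str, Any]) -> Any:
    """Builds the named asset, materializing upstream assets first."""
    if name not in cache:
        fn = ASSETS[name]
        upstream = inspect.signature(fn).parameters
        cache[name] = fn(**{dep: materialize(dep, cache) for dep in upstream})
    return cache[name]


cache: dict[str, Any] = {}
report = materialize("revenue_report", cache)
lineage = list(cache)  # the order in which assets were built
```

Nobody wrote "run extract, then transform, then load." The order falls out of what each asset needs, and the same dependency information doubles as lineage.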

Where the three diverge.

  • Airflow — biggest community, biggest ecosystem, most integrations. Downsides: dynamic DAGs are awkward, operations are heavy, scheduler scalability used to bite (improved in 3.0).
  • Prefect — Pythonic, dynamic, gentle learning curve. Downsides: fewer integrations than Airflow.
  • Dagster — asset model plus lineage plus observability. Downsides: steep learning curve (the asset abstraction takes a beat to click), overkill for simple cron replacements.

When to pick which. Existing data team with Airflow expertise → stay with Airflow. New data team, value Pythonic DX → Prefect. Care about data lineage, observability, and asset-centric thinking → Dagster.


Chapter 11 · Argo Workflows — Kubernetes-native

Argo Workflows came out of Intuit and was donated to CNCF. Every workflow step is a Kubernetes Pod, and the workflow definition itself is a CRD. Heavy use in ML pipelines, CI/CD, and data processing (distinct from Argo CD, but same family).

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: ml-training-
spec:
  entrypoint: pipeline
  templates:
    # Each "- -" entry is its own sequential stage.
    - name: pipeline
      steps:
        - - name: preprocess
            template: preprocess-data
        - - name: train
            template: train-model
        - - name: evaluate
            template: evaluate-model
    - name: preprocess-data
      container:
        image: my-org/preprocess:latest
        command: [python, preprocess.py]
    - name: train-model
      container:
        image: my-org/train:v2
        command: [python, train.py]
        resources:
          limits:
            nvidia.com/gpu: 1
    # Every template referenced in steps must be defined; image name is illustrative.
    - name: evaluate-model
      container:
        image: my-org/evaluate:latest
        command: [python, evaluate.py]

Strengths — k8s-native, each step is a container so any language or runtime is fine, ML and CI/CD friendly. Weaknesses — YAML grows quickly, debugging is harder than code-based workflows, and Pod startup overhead per step is rough on workflows with many short tasks.

When to pick Argo: you already run Kubernetes and want infrastructure unified there; ML and CI and data pipelines with independent containers per step; simple DAGs that read naturally in YAML.


Chapter 12 · Conductor — Netflix to Orkes

Conductor was created at Netflix in 2016 and open-sourced as a microservice orchestration engine. The original creators founded Orkes in 2022 to ship the enterprise edition and Cloud. The model is JSON DSL — workflows live in JSON, and each step is fetched by workers polling over HTTP.

{
  "name": "user_onboarding",
  "version": 1,
  "tasks": [
    {
      "name": "send_welcome_email",
      "taskReferenceName": "send_email_ref",
      "type": "SIMPLE"
    },
    {
      "name": "create_stripe_customer",
      "taskReferenceName": "stripe_ref",
      "type": "SIMPLE"
    },
    {
      "name": "wait_one_day",
      "taskReferenceName": "wait_ref",
      "type": "WAIT",
      "inputParameters": {
        "duration": "1d"
      }
    }
  ]
}

Task types include SIMPLE (worker-polled), HTTP (direct REST), FORK_JOIN (parallel), SWITCH (branch; it replaces the older DECISION type), and more. Inside Netflix it powered content-encoding pipelines, payments, and push-notification systems at massive scale.
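
To show what "the workflow lives in JSON and workers are just named handlers" means in practice, here is a tiny interpreter for the definition above. It is a sketch, not Conductor's engine: it handles only SIMPLE and WAIT, calls workers in-process instead of having them poll over HTTP, simulates the wait instead of sleeping, and assumes a single-unit duration such as "1d". The worker outputs are made up.

```python
import json
from typing import Callable

DEFINITION = json.loads("""
{
  "name": "user_onboarding",
  "version": 1,
  "tasks": [
    {"name": "send_welcome_email", "taskReferenceName": "send_email_ref", "type": "SIMPLE"},
    {"name": "create_stripe_customer", "taskReferenceName": "stripe_ref", "type": "SIMPLE"},
    {"name": "wait_one_day", "taskReferenceName": "wait_ref", "type": "WAIT",
     "inputParameters": {"duration": "1d"}}
  ]
}
""")

# Workers register by task name; in the real system they would poll over HTTP.
WORKERS: dict[str, Callable[[dict], dict]] = {
    "send_welcome_email": lambda inp: {"sent": True},
    "create_stripe_customer": lambda inp: {"customerId": "cus_demo"},
}

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def run(definition: dict, workflow_input: dict) -> dict[str, dict]:
    """Walks the task list in order; returns outputs keyed by taskReferenceName."""
    outputs: dict[str, dict] = {}
    for task in definition["tasks"]:
        ref = task["taskReferenceName"]
        if task["type"] == "SIMPLE":
            outputs[ref] = WORKERS[task["name"]](workflow_input)
        elif task["type"] == "WAIT":
            duration = task["inputParameters"]["duration"]  # e.g. "1d"
            seconds = int(duration[:-1]) * UNIT_SECONDS[duration[-1]]
            outputs[ref] = {"waitedSeconds": seconds}  # simulated, no real sleep
        else:
            raise NotImplementedError(task["type"])
    return outputs


outputs = run(DEFINITION, {"userId": "u1"})
```

Because the definition is data, the workers can be written in any language, which is the language-independence argument in a nutshell. The cost is that control flow is expressed in JSON rather than code.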

Strengths — language independence (JSON DSL makes any language a viable worker), a visual editor, and Netflix-scale validation. Weaknesses — JSON becomes unwieldy as workflows grow, and it cuts against the workflow-as-code current. Orkes addresses this by shipping code SDKs, but the base model is still JSON.

When to pick Conductor: polyglot microservice environments where language-independent orchestration matters; operations teams that need a visual editor; Netflix-style massive async pipelines.


Chapter 13 · Which engine to pick — a workload-by-workload decision tree

We have toured all twelve. Time to map which engine fits which workload.

AI agents and LLM pipelines

The hottest workload. One cycle ranges from 5 seconds to 30 minutes, heavy external API dependencies (OpenAI, Anthropic, Replicate, tool calls), retries and idempotency mandatory, and the client UI must reflect progress.

  • First pick — Trigger.dev v3 or Inngest. TS-first, Realtime API (Trigger), step model (Inngest) — both fit LLM step segmentation naturally.
  • Second — Temporal. Python and Go SDKs wrap LLM calls cleanly, and replay-based debugging is unbeatable.
  • Third — Hatchet. If single-infrastructure Postgres is attractive.

Payments, billing, reconciliation

Domains where correctness, auditability, and replayability are absolute. Retry policies, deduplication, and event logs are core.

  • First — Temporal. Seven years of production validation, deterministic replay, polyglot SDKs. Stripe, Coinbase, and DoorDash run it in production exactly here.
  • Second — Cadence. Same model since Temporal forked from it; Uber's internal payments still ride Cadence.
  • Third — Restate. Virtual Objects fit account-or-wallet stateful payments well.

ETL and data engineering

Batch jobs, schedules, dependency management, plus dbt, Snowflake, and Databricks integration.

  • First — Airflow. Standard, biggest integration library. If a data team exists, the choice almost makes itself.
  • Second — Dagster. Asset-centric model, lineage, observability.
  • Third — Prefect. Small Python teams with dynamic-DAG needs.

User onboarding, subscriptions, notifications

Sequential steps, 1- to 30-day sleeps, event triggers.

  • First — Inngest. Event-driven model fits cleanly.
  • Second — Trigger.dev. If you are all-in on TS.
  • Third — Temporal. If you already run it.

ML training and inference pipelines

GPU nodes, container isolation, k8s-native.

  • First — Argo Workflows. k8s-native, container steps, GPU resource declarations.
  • Second — Dagster plus a GPU backend like Modal.
  • Third — Temporal. Less for training itself, more for the meta-orchestration around it.

Microservice orchestration with polyglot stacks

Many languages, JSON DSL natural, visual editor required.

  • First — Conductor (Orkes). JSON DSL plus polyglot workers.
  • Second — Temporal. If SDK breadth solves the language problem.

Small SaaS and minimal infrastructure footprint

Indie or small teams unwilling to add components beyond Postgres.

  • First — Hatchet or DBOS. Both treat Postgres as the control plane.
  • Second — Inngest Cloud. Zero infrastructure to run.
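
For teams that want to keep this chapter as a reference, the recommendations compress into a lookup table. The workload keys and the `pick` helper are hypothetical names chosen for this sketch; the rankings are copied from the lists above.

```python
RECOMMENDATIONS: dict[str, list[str]] = {
    "ai-agents": ["Trigger.dev v3 or Inngest", "Temporal", "Hatchet"],
    "payments": ["Temporal", "Cadence", "Restate"],
    "etl": ["Airflow", "Dagster", "Prefect"],
    "onboarding": ["Inngest", "Trigger.dev", "Temporal"],
    "ml-pipelines": ["Argo Workflows", "Dagster plus a GPU backend", "Temporal"],
    "polyglot-microservices": ["Conductor (Orkes)", "Temporal"],
    "minimal-infra": ["Hatchet or DBOS", "Inngest Cloud"],
}


def pick(workload: str, rank: int = 1) -> str:
    """Returns the engine at the given rank (1 = first pick) for a workload."""
    try:
        options = RECOMMENDATIONS[workload]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"unknown workload {workload!r}; expected one of: {known}")
    if not 1 <= rank <= len(options):
        raise ValueError(f"{workload!r} has only {len(options)} ranked options")
    return options[rank - 1]


first_for_payments = pick("payments")
third_for_etl = pick("etl", rank=3)
```

Treat the table as a starting point. A team that already runs one of these engines in production has a strong reason to rank it first.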


Chapter 14 · Production adoption in Korea and Japan

Enough theory. Who runs what in production — collected from Korea and Japan.

Toss (Korea)

Toss expanded Temporal adoption across 2023 and 2024 inside payments, transfers, and brokerage — domains where correctness is non-negotiable. Public talks (SLASH 2024) covered Toss Securities migrating parts of its order, execution, and settlement pipelines to Temporal and turning retries and idempotency into a shared library. As of May 2026, multiple teams across Toss Group run Temporal Cloud or self-hosted Temporal. Data engineering is on Airflow separately.

Uber Eats Japan

Uber HQ's Cadence backbone extends into Uber Eats Japan. Parts of ordering, dispatch, and payments run on Cadence. Late-2024 internal Uber talks mentioned an in-progress migration of certain domains to Temporal.

LINE and LINE Yahoo

LINE has long run its own workflow stack, but since 2024 LINE Engineering posts have described adopting Argo Workflows and Airflow for some domains. Messaging and push notifications still ride a homegrown async queue (LINE Async), but new payment and reconciliation projects are reportedly evaluating Temporal.

Mercari (Japan)

Mercari's standard for data engineering is Airflow — heavily documented on Mercari Engineering. Backend business logic is gRPC-microservice centric, so Temporal-grade adoption is limited, but specific domains (C2C payments, search indexing) combine Apache Beam, Argo, and homegrown workflow libraries.

ZOZO (Japan)

ZOZO runs Argo Workflows across search and recommendation indexing pipelines (covered on ZOZO Tech Blog). A rare case where all data and ML pipelines are unified on top of Kubernetes.

CyberAgent (Japan)

CyberAgent's many subsidiaries (gaming, media, advertising) mean tool diversity is high. Temporal in ad bidding, Cadence in some game backends, Airflow and Dagster mixed for analytics. One of the most published Japanese stacks in this space.


Chapter 15 · Wrap-up — five lenses on workflow engines in 2026

  1. AI agents dragged the category back into tier-1. Once long-running, resumable, idempotent work — from 5 seconds to 30 minutes — became daily life for every backend, hand-rolled queues and workers stopped being a real answer.

  2. Workflow as code is the standard. JSON and YAML DSLs persist where visual editors are non-negotiable (Conductor, Argo), but elsewhere Go, TS, and Python code are first-class citizens. IDEs, types, tests, diffs, and reviews all work — the conclusion is mechanical.

  3. The two camps blur. Business logic and data engineering remain separate, but AI and ML pipelines have created a gray zone in between. Teams build ML orchestration on Temporal; teams write backend business workflows on Dagster.

  4. Minimizing infrastructure components is a strong signal. Hatchet and DBOS rise on the thesis that "Postgres is enough." The pull is to avoid running Kafka, Redis, and Cassandra separately. Restate's single-binary self-host fits the same arc.

  5. The pick is, in the end, about your team's shape. Temporal is often the "most correct" answer, but Inngest or Trigger.dev fits TS-only SaaS naturally, Argo fits k8s shops, and Airflow remains the standard inside data teams. "The right engine for your team right now" beats "the best engine in the abstract."

In 2018 we did everything with cron, queues, and workers. In 2026 we have added a durable execution layer on top. Before writing the next line — if this workflow needs to keep running a year from now, that is the workflow engine's job, not yours.


References