From Issue to Deploy — Building an Automation Pipeline Where AI Reviews and CI/CD Verifies and Ships (2025)
By Youngju Kim (@fjvbn20031)
Prologue — a loop, not a line
In the previous post we covered the parts of AI development automation — which agents exist, how they integrate with GitHub, and how you hand them tickets.
This post assembles those parts into a single pipeline. The goal is this:
A human files one Issue. They step away. A short while later, a verified deployment is live in production. The only thing the human did was press the merge button — the decision.
And more importantly: this is a loop, not a line. Deployment is not the end. When monitoring detects an anomaly, it rolls back automatically, and that incident becomes a new Issue that re-enters the pipeline at its entrance. The system fixes itself.
This post splits that loop into 7 stages, designs the gate for each stage, and implements all of it with GitHub Actions. Everything rests on a single principle:
"Automate the work, gate the decisions."
Let AI write, review, and deploy code — but for irreversible decisions (production merge, schema changes, security logic), always put a human or a strong automated check in the way.
Chapter 0 · Bird's-eye view of the whole pipeline
The big picture first. The 7 stages and who owns each one.
┌────────────────────────────────────────────────────────────────┐
│ │
│ [S1] Issue ──ai-ready label──▶ AI agent ──▶ Draft PR │
│ ▲ │ │
│ │ ▼ │
│ │ [S2] AI code review │
│ │ (generate≠review) │
│ │ │ │
│ │ ▼ │
│ │ [S3] CI verification │
│ │ lint·type·test·build·E2E │
│ │ │ │
│ │ ┌── fail ──────┤ │
│ │ ▼ ▼ pass │
│ │ AI self-heal [S4] Human Gate │
│ │ (read logs, repush) merge decision │
│ │ │ │
│ │ ▼ │
│ │ [S5] CD: Preview │
│ │ → Canary → Production │
│ │ │ │
│ │ ▼ │
│ │ [S6] Auto-verification │
│ │ smoke·E2E·SLO gate │
│ │ │ │
│ │ ┌── fail ──────┤ │
│ │ ▼ ▼ pass │
│ └──── auto Issue creation ◀ [S7] Monitoring & feedback │
│ (auto rollback + incident) post-deploy observe, │
│ rollback trigger │
│ │
└────────────────────────────────────────────────────────────────┘
| Stage | Name | Owner | Output |
|---|---|---|---|
| S1 | Issue → PR | AI agent | Draft PR |
| S2 | AI code review | AI reviewer (model different from generator) | Inline comments + APPROVE/REQUEST_CHANGES |
| S3 | CI verification gate | CI system | Green light / red light |
| S4 | Human Gate | Human | Merge decision |
| S5 | CD deployment | CD system | Preview → Canary → Production |
| S6 | Auto-verification | Verification system | Promote or roll back |
| S7 | Monitoring & feedback | Observability system | Healthy or auto Issue |
The rest of this post digs deep into each stage, one at a time.
Chapter 1 · The 5 principles of pipeline design
Nail down the principles before implementing. If these wobble, the pipeline becomes dangerous.
Principle 1 — Every stage is fail-closed
When a stage fails or its judgment is uncertain, it stops. It does not let things through. AI review is confused → REQUEST_CHANGES. CI is flaky → red light. Verification is ambiguous → roll back. When in doubt, do not proceed.
Principle 2 — Every stage must be reversible
A PR can be closed, a merge can be reverted, a deployment can be rolled back. If a stage has no "cancel button," that stage must not be automated.
Principle 3 — Every stage must be observable
Logs record who did what, when, and why. Especially what AI did. Automation you cannot debug is a time bomb.
Principle 4 — Idempotency
If the same Issue is triggered twice, you must not get two PRs. If the same commit is deployed twice, it must be safe. Retries must be safe for automation to be safe.
Principle 5 — The trust ladder
Do not automate everything from the start. Automate low-risk, high-frequency work first, and as success-rate data accumulates, widen the automation scope. Auto-merge for doc fixes → auto-merge for dependency bumps → … one rung at a time.
Chapter 2 · S1 — Issue → PR (the generation stage)
This was covered in the previous post, so just the essentials. The ai-ready label is the trigger; the AI agent creates a branch, implements, and opens a Draft PR.
The real goal of this stage is not "writing code" but "producing a reviewable PR." Every stage downstream depends on PR quality.
The conditions for a PR to be "born reviewable"
- Created as a Draft — not mergeable until a human promotes it to "Ready for review."
- Issue link — `Closes #123` in the PR body. The Issue closes automatically on merge.
- Work summary — the agent records "what and why" in the body. The reviewer's starting point.
- Consistent PR title — like `[AI] fix: ...`. Downstream stages branch on the title.
- `ai-generated` label — for tracking.
Idempotency — preventing duplicate PRs
```yaml
# If a PR already exists for the same issue, don't create a new one
- name: Check existing PR
  id: check
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} # gh needs a token inside Actions
  run: |
    EXISTING=$(gh pr list --search "in:title #${{ github.event.issue.number }}" --json number --jq 'length')
    echo "exists=$EXISTING" >> "$GITHUB_OUTPUT"

- name: Run agent
  if: steps.check.outputs.exists == '0'
  uses: anthropics/claude-code-action@v1
  # ...
```
Chapter 3 · S2 — AI code review automation (the core stage)
This is the heart of this pipeline. Insert a layer of AI review before human review.
Why AI review before human review
- Noise filter — catch obvious mistakes (missing error handling, type mismatches, convention violations) before a human sees them.
- Saves human review time — humans focus on judgment like "is this the right approach." AI handles the touch-ups.
- 24/7 — even if a PR lands at 3 a.m., it gets reviewed immediately.
Iron rule: generate ≠ review (different model, different agent)
Do not let the agent that created the PR review its own PR. It cannot see the blind spots in its own code. Where possible, review with a different model. If Claude wrote it, have a different tool review it, or at minimum a separate session with a separate prompt.
Generation agent (Claude/Copilot) ──▶ PR
│
Review agent (different model/tool) ──▶ review comments
│
Human ─────────────────────────────▶ final decision
Tool landscape
| Tool | Form | Characteristics |
|---|---|---|
| CodeRabbit | GitHub App | Auto-reviews every PR, configured via .coderabbit.yaml, summary + inline |
| Greptile | GitHub App | Review based on full-codebase context |
| GitHub Copilot review | Built-in | Copilot can be designated as a PR reviewer |
| Claude Code Action | Actions | Review via @claude or a workflow, strongly customizable |
What AI review catches well / poorly
| Catches well | Catches poorly |
|---|---|
| Missing null / error handling | "Is this the right architecture" |
| Convention / naming violations | Whether business requirements are met |
| Obvious security patterns (SQLi, hardcoded secrets) | Subtle domain-specific bugs |
| Test coverage gaps | Subtle performance trade-offs |
| Common bug patterns | The team's implicit context |
→ This is exactly why the human gate (S4) is needed. AI review does not replace humans — it reduces the human's burden.
Implementation — review with Claude Code Action
```yaml
name: AI Review
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]

jobs:
  ai-review:
    # only when not a draft — don't review work-in-progress PRs
    if: github.event.pull_request.draft == false
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Review this PR. Focus on:
            - Correctness: logic bugs, edge cases, missing error handling
            - Security: input validation, secrets, permissions
            - Tests: is coverage sufficient for the change
            - Conventions: does it follow CLAUDE.md
            Leave issues as inline comments on the relevant lines.
            At the end, state APPROVE or REQUEST_CHANGES with a summary.
            Mark trivial style points with nit: so a human can ignore them.
```
Making AI review a "required check"
If review only leaves comments, it gets ignored. Make it a status check and put it in the branch protection rules — then a merge is blocked outright when it is REQUEST_CHANGES. Configure the review action to report pass/fail through the GitHub Checks API.
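If you'd rather not wire the Checks API directly, a simpler pattern is to let the review job's own exit status be the check: branch protection lists the job, and the job fails on REQUEST_CHANGES. A minimal sketch, assuming the review step writes its verdict to `review-verdict.txt` (a hypothetical file name — adapt to however your review action exposes its result):

```yaml
# Hypothetical: the review step is assumed to leave its final verdict in
# review-verdict.txt. Failing this step turns the whole "ai-review" job red,
# which branch protection then treats as a blocked merge.
- name: Gate on review verdict
  run: |
    if grep -q "REQUEST_CHANGES" review-verdict.txt; then
      echo "AI review requested changes — failing the required check."
      exit 1
    fi
```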
Chapter 4 · S3 — The CI verification gate
The stage where a machine verifies the code AI wrote. If CI is weak, this whole pipeline is weak.
Layered checks
fast ──────────────────────────────────────────────────────▶ slow

  lint  →  typecheck  →  unit test  →  build  →  integration  →  E2E
  secs       secs        sec~min       min         min           min~tens of min
Put the fast checks up front so it fails fast. There's no need to go all the way to E2E to learn something will break at lint.
Making CI "readable by AI"
This is the key part, and it's often missed. If CI failure messages are friendly, AI fixes them itself.
- `Error: test failed` (bad) → `Error: expected status 200, got 401 at auth.test.ts:42` (good)
- Print the failed command, the file and line, the expected and actual values.
- The AI agent reads this log and self-heals in the next commit.
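A sketch of what this looks like in a workflow — run the tests with a verbose reporter and keep the raw log as an artifact the agent can read (the reporter flag shown is Vitest's; substitute your runner's equivalent):

```yaml
# Surface file:line plus expected/actual values in the failure output,
# and keep the full log so the self-heal step has something to read.
- name: Unit tests (AI-readable failures)
  run: |
    set -o pipefail # don't let tee swallow the test runner's exit code
    pnpm test -- --reporter=verbose 2>&1 | tee test.log
- uses: actions/upload-artifact@v4
  if: failure()
  with:
    name: test-log
    path: test.log
```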
The self-heal loop
PR push ──▶ CI runs ──▶ fail
│
▼
agent reads the CI log
│
▼
identify cause → push a fix commit
│
▼
CI re-runs ──▶ pass
Put a ceiling on this loop — if it's still red after 3 attempts, call a human. Infinite self-healing is a cost bomb.
```yaml
# On CI failure, give the agent the logs and let it attempt a fix —
# but cap the number of attempts with a label
- name: Self-heal on CI failure
  if: failure() && !contains(github.event.pull_request.labels.*.name, 'ai-heal-exhausted')
  uses: anthropics/claude-code-action@v1
  with:
    anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
    prompt: |
      CI failed. Read the logs below, fix the cause, and commit.
      Don't guess — only fix what the logs point to.
```
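The `ai-heal-exhausted` guard above needs something to set it. One sketch, assuming self-heal commits follow an `ai-heal:` message convention (an assumption — use whatever marker your agent emits), and that the job checked out with full history:

```yaml
# Counts prior self-heal commits on this branch; after 3, applies the label
# that the guard above checks, handing the PR to a human.
- name: Cap self-heal attempts
  if: failure()
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    ATTEMPTS=$(git log origin/main..HEAD --oneline --grep='ai-heal:' | wc -l)
    if [ "$ATTEMPTS" -ge 3 ]; then
      gh pr edit "${{ github.event.pull_request.number }}" \
        --add-label "ai-heal-exhausted,needs-human"
    fi
```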
Branch protection — enforce the gate
Settings → Branches → main protection rules:
✅ Require status checks: ci/lint, ci/test, ci/build, ai-review
✅ Require branches to be up to date before merging
✅ Require a pull request before merging
✅ Require approvals: 1+ (S4 human gate)
✅ Dismiss stale approvals when new commits pushed
✅ Do not allow bypassing the above
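Clicking through Settings works, but the same rules can be codified with the REST API (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`) so they're reviewable and repeatable. A sketch — replace OWNER/REPO and tune the fields to your repo:

```bash
# Codify the main-branch protection rules via gh — the same settings as the
# checklist above, expressed as the REST endpoint's JSON fields.
gh api -X PUT repos/OWNER/REPO/branches/main/protection --input - <<'JSON'
{
  "required_status_checks": {
    "strict": true,
    "contexts": ["ci/lint", "ci/test", "ci/build", "ai-review"]
  },
  "required_pull_request_reviews": {
    "required_approving_review_count": 1,
    "dismiss_stale_reviews": true
  },
  "enforce_admins": true,
  "restrictions": null
}
JSON
```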
Chapter 5 · S4 — The Human Gate (the gate a human guards)
The core of automation is, paradoxically, clearly defining "where the human stands guard."
What the human must decide
- The merge decision itself — the responsibility for "this goes into main" is the human's.
- Architecture direction — "is this approach right" is something AI cannot judge.
- Whether business requirements are met — did this genuinely satisfy the Issue's intent.
- Security and data-sensitive changes — areas where the cost of a mistake is high.
Things that make human review fast
For S4 not to become a bottleneck, when a human receives a PR, it should already have:
- ✅ AI review done (all touch-ups handled)
- ✅ CI green (machine verification passed)
- ✅ The PR is small (a reviewable size)
- ✅ A good description (what and why)
- ✅ A Preview deployment URL attached (you can actually click through it)
Then human review only needs to do "judgment." It's done in 5 minutes.
Auto-merge — when is it OK
GitHub's auto-merge means "merge automatically once all required checks pass and required approvals are complete." It's risky, but for certain change types it's safe.
| Auto-merge OK | Auto-merge forbidden |
|---|---|
| Doc / comment fixes | Application logic |
| Dependency patch bumps (when CI passes) | Schema migrations |
| Lint / format auto-fixes | Security / auth logic |
| Generated type / SDK updates | Infrastructure / IaC |
```yaml
# If it's a dependency PR from Dependabot, enable auto-merge —
# --auto only merges once all required checks (CI + AI review) pass
# and required approvals are in place
- name: Enable auto-merge for safe deps
  if: |
    contains(github.event.pull_request.labels.*.name, 'dependencies') &&
    github.event.pull_request.user.login == 'dependabot[bot]'
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} # gh needs a token inside Actions
  run: gh pr merge --auto --squash "${{ github.event.pull_request.number }}"
```
Escalation path
When AI (whether generation or review) is uncertain, it tags a human. It automatically leaves a comment like "this change touches auth logic — needs review by a security owner" and attaches the needs-human label.
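A sketch of the escalation step — the trigger condition is illustrative (here, a hypothetical `uncertain` output from the review step); the mechanics are just a comment plus a label:

```yaml
- name: Escalate to a human
  if: steps.review.outputs.uncertain == 'true' # hypothetical output name
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    gh pr comment "${{ github.event.pull_request.number }}" \
      --body "This change touches auth logic — needs review by a security owner."
    gh pr edit "${{ github.event.pull_request.number }}" --add-label "needs-human"
```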
Chapter 6 · S5 — CD: Preview → Canary → Production
It's merged. Now, deployment. The key is not deploying everything at once.
Preview deployment — per PR
When a PR opens, deploy to an isolated real environment and give it a unique URL. Vercel, Netlify, and Cloudflare provide this out of the box; for your own infrastructure, implement it with a per-PR namespace.
→ The human reviewer in S4 sees a working artifact, not code. The AI reviewer can also run E2E against the Preview URL.
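On your own infrastructure, the last step of the per-PR deploy can post the URL back to the PR so both reviewers find it — a sketch with an illustrative URL scheme:

```yaml
# After deploying pr-<N> to its namespace, hand the reviewer the link.
- name: Comment preview URL on the PR
  env:
    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  run: |
    gh pr comment "${{ github.event.pull_request.number }}" \
      --body "Preview: https://pr-${{ github.event.pull_request.number }}.preview.example.com"
```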
After merge — progressive delivery
merge to main
│
▼
Canary deploy (5% of traffic)
│
▼ [S6 auto-verification]
smoke test + SLO observation (5~15 min)
│
├── fail ──▶ auto rollback (remove Canary)
│
▼ pass
Production promotion (100% of traffic)
- Canary — only a slice of traffic (5~10%) goes to the new version. Even if a problem hits, the blast radius is small.
- Feature Flag — deploy the code but keep the feature off. Turning it on is a separate decision.
- Environment promotion — dev → staging → prod. Each step is a gate.
Implementing the gate with GitHub Environments
```yaml
name: Deploy
on:
  pull_request:
    types: [closed]
    branches: [main]

jobs:
  deploy-canary:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    environment: canary # protection rules: none (automatic)
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh canary

  promote-production:
    needs: [deploy-canary, verify-canary] # verify-canary is defined in Chapter 7
    runs-on: ubuntu-latest
    environment: production # protection rules: required reviewers = human gate
    steps:
      - run: ./scripts/deploy.sh production
```
Put required reviewers on the `production` environment, and human approval is enforced for every production promotion — another Human Gate baked right into the workflow.
Chapter 7 · S6 — Auto-verification
Deploying isn't the end. A machine confirms that what was deployed actually works.
Post-deploy verification layers
| Verification | What | When |
|---|---|---|
| Smoke test | A few core paths (login, health check) | Immediately after deploy |
| E2E | Major user scenarios | Canary stage |
| Synthetic monitoring | Periodic pings from outside (Checkly, Datadog) | Continuous |
| SLO observation | Error rate, latency, availability | During the Canary window |
| Visual regression | UI pixel diffs | Preview/Canary |
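For reference, the `smoke-test.sh` referenced throughout can be minimal — a sketch that curls a few core paths and fails closed on the first non-2xx (the paths are illustrative):

```bash
#!/usr/bin/env bash
# Minimal smoke test: hit the core paths, fail on the first non-2xx.
set -euo pipefail
BASE="$1" # e.g. https://canary.example.com
for path in /healthz /login /api/status; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE$path")
  echo "$path -> $code"
  [[ "$code" == 2* ]] || exit 1
done
```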
The verification gate — promote the Canary or roll back
```yaml
verify-canary:
  needs: deploy-canary
  runs-on: ubuntu-latest
  outputs:
    verdict: ${{ steps.gate.outputs.verdict }}
  steps:
    - name: Smoke test
      run: ./scripts/smoke-test.sh https://canary.example.com
    - name: Observe SLO for 10 minutes
      id: gate
      run: |
        sleep 600
        ERROR_RATE=$(curl -s "$METRICS_API/error-rate?env=canary&window=10m")
        P99=$(curl -s "$METRICS_API/latency-p99?env=canary&window=10m")
        # fail if error rate exceeds 1% or p99 exceeds 500ms
        if (( $(echo "$ERROR_RATE > 0.01" | bc -l) )) || (( $(echo "$P99 > 500" | bc -l) )); then
          echo "verdict=rollback" >> "$GITHUB_OUTPUT"; exit 1
        fi
        echo "verdict=promote" >> "$GITHUB_OUTPUT"

rollback-canary:
  needs: verify-canary
  if: failure()
  runs-on: ubuntu-latest
  steps:
    - run: ./scripts/rollback.sh canary
```
If verification fails → the Canary is removed automatically. Production traffic was only ever 5% affected, and even that is reclaimed immediately.
Chapter 8 · S7 — Monitoring & the feedback loop (closing the loop)
This is where the pipeline goes from a line to a loop. Even after deployment, the system keeps watching.
Auto-rollback triggers
Observation continues even after production promotion. Break the SLO and it rolls back automatically.
- Error budget burn rate — if it burns down fast, roll back.
- Latency spike — p99 exceeds the threshold.
- Core business metric drop — payment success rate, signup conversion rate, etc.
The self-healing loop — an incident becomes an Issue again
This is the core of the loop. When an auto-rollback happens, the incident is turned into a new Issue and sent back to the pipeline's entrance (S1).
```yaml
name: Monitor & Auto-Rollback
on:
  schedule:
    - cron: '*/5 * * * *' # check SLO every 5 minutes
  workflow_run:
    workflows: ["Deploy"]
    types: [completed]

jobs:
  slo-check:
    runs-on: ubuntu-latest
    permissions:
      issues: write # gh issue create needs this
    steps:
      - name: Check production SLO
        id: slo
        run: |
          BURN=$(curl -s "$METRICS_API/error-budget-burn?env=prod&window=1h")
          echo "burn=$BURN" >> "$GITHUB_OUTPUT"
      - name: Rollback + file issue if breaching
        if: ${{ steps.slo.outputs.burn > 2.0 }}
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} # gh needs a token inside Actions
        run: |
          ./scripts/rollback.sh production
          gh issue create \
            --title "Auto-rollback: error budget burn rate ${{ steps.slo.outputs.burn }}" \
            --label "ai-ready,incident,priority-high" \
            --body "Auto-rolled back due to a production SLO violation.
          Last deploy: ${{ github.sha }}
          burn rate: ${{ steps.slo.outputs.burn }}
          The agent should analyze the diff of the rolled-back commit, diagnose the cause,
          and submit a fix PR."
```
This Issue has the ai-ready label attached, so → S1 triggers again. AI analyzes the rolled-back commit, creates a fix PR, AI review → CI → human gate → redeploy. The loop is closed.
What a closed loop means
Issue ──▶ PR ──▶ review ──▶ CI ──▶ merge ──▶ deploy ──▶ verify ──▶ monitor
▲ │
└──────────── incident becomes a new Issue ◀───────────────────────┘
The system takes the problem it created as its own input and fixes it. The human still guards the merge gate, but is freed from the repetitive labor of detection, diagnosis, kicking off the fix, and redeploying.
Chapter 9 · The full implementation — wiring it together with GitHub Actions
The stages so far, as actual files. 5 workflows connected by events.
Workflow map
| File | Trigger | Role |
|---|---|---|
| ai-resolve.yml | issues: labeled | S1: Issue → PR |
| ai-review.yml | pull_request: opened/synchronize | S2: AI review |
| ci.yml | pull_request: opened/synchronize | S3: verification gate |
| deploy.yml | pull_request: closed (merged) | S5: Canary → Production |
| monitor.yml | schedule + workflow_run | S6/S7: verify, rollback, feedback |
How they connect via events
GitHub Actions has no central orchestrator. Events are the connecting wires.
issue.labeled('ai-ready') ──▶ ai-resolve.yml ──▶ (PR created)
│
pull_request.opened ◀──────────────────────────────────┘
├──▶ ai-review.yml (AI review → report check)
└──▶ ci.yml (verification → report check)
│
[branch protection: all checks + human approval awaited]
▼
pull_request.closed(merged) ──▶ deploy.yml ──▶ (Canary → verify → Production)
│
workflow_run('Deploy' completed) ──▶ monitor.yml │
schedule('*/5') ──▶ monitor.yml ◀─────┘
└──▶ on SLO violation: rollback + issue.create('ai-ready')
│
└──▶ (back to ai-resolve.yml — loop complete)
ai-resolve.yml — S1
```yaml
name: AI Resolve
on:
  issues:
    types: [labeled]

jobs:
  resolve:
    if: github.event.label.name == 'ai-ready'
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write
      issues: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Skip if PR already exists
        id: dedup
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          N=$(gh pr list --search "#${{ github.event.issue.number }} in:body" --json number --jq 'length')
          echo "exists=$N" >> "$GITHUB_OUTPUT"
      - name: Run agent
        if: steps.dedup.outputs.exists == '0'
        uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: |
            Issue #${{ github.event.issue.number }}: ${{ github.event.issue.title }}
            ${{ github.event.issue.body }}
            Implement this issue. Create a branch, make the changes, get the tests passing,
            and open a Draft PR that includes "Closes #${{ github.event.issue.number }}".
            Follow the CLAUDE.md conventions, and summarize what and why in the PR body.
```
ci.yml — S3
```yaml
name: CI
on:
  pull_request:
    types: [opened, synchronize, ready_for_review]

concurrency:
  group: ci-${{ github.event.pull_request.number }}
  cancel-in-progress: true # cancel the previous run on a new push — saves cost

jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4 # pnpm must exist before setup-node can use its cache
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      - run: pnpm lint # fast things first
      - run: pnpm typecheck
      - run: pnpm test
      - run: pnpm build
```
deploy.yml — S5 (see Chapter 6, essentials only)
```yaml
name: Deploy
on:
  pull_request:
    types: [closed]
    branches: [main]

jobs:
  canary:
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    environment: canary
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh canary

  verify:
    needs: canary
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4 # the scripts live in the repo
      - run: ./scripts/smoke-test.sh https://canary.example.com
      - run: ./scripts/observe-slo.sh canary 600 # observe for 10 min

  production:
    needs: verify
    runs-on: ubuntu-latest
    environment: production # required reviewers = Human Gate
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production

  rollback:
    needs: [canary, verify]
    if: failure()
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/rollback.sh canary
```
Repo setup checklist
- Secrets: `ANTHROPIC_API_KEY`, deployment credentials.
- Environments: `canary` (automatic), `production` (required reviewers designated).
- Branch protection (main): include `ci/verify` and `ai-review` in required checks. Required approvals 1+.
- Labels: `ai-ready`, `ai-generated`, `needs-human`, `incident`, `dependencies`.
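A one-time setup sketch for the labels (colors and descriptions omitted; `--force` updates a label if it already exists):

```bash
for label in ai-ready ai-generated needs-human incident dependencies; do
  gh label create "$label" --force
done
```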
Chapter 10 · The decision-gate matrix — what to automate and what the human handles
The real design of a pipeline is "how far do you automate." You set different gates per change type.
| Change type | AI generation | AI review | CI | Human gate | Auto-merge | Deploy |
|---|---|---|---|---|---|---|
| Docs / comments | ✅ | ✅ | ✅ | optional | ✅ possible | automatic |
| Dependency patch bump | ✅ | ✅ | ✅ required | optional | ✅ conditional | Canary automatic |
| Bug fix (narrow scope) | ✅ | ✅ | ✅ | ✅ 1 person | ❌ | Canary → auto-promote |
| Feature addition | ✅ | ✅ | ✅ | ✅ 1+ people | ❌ | Canary → human promote |
| Refactoring | ✅ | ✅ | ✅ + coverage | ✅ 1 person | ❌ | Canary → auto-promote |
| DB migration | ⚠️ AI assists | ✅ | ✅ | ✅ required | ❌ | human-triggered |
| Security / auth logic | ⚠️ AI assists | ✅ | ✅ | ✅ 2 people | ❌ | human-triggered |
| Infrastructure / IaC | ⚠️ AI assists | ✅ | ✅ plan diff | ✅ required | ❌ | human-triggered |
Legend: ✅ automatic / ⚠️ AI assists only, human leads / ❌ not done
How to climb the trust ladder
Start with every cell conservative — all auto-merge off, all deployments human-triggered. Then raise one cell at a time with data:
- Turn on auto-merge for doc PRs for a month → 0 incidents → keep it.
- Turn on auto-merge for dependency bumps → watch the CI pass rate and rollback rate → if stable, keep it.
- Turn on Canary auto-promotion for bug fixes → watch the escape rate (missed bugs).
- …
Do not widen the automation scope without success-rate data. Trust is accrued, not declared.
Chapter 11 · Safety mechanisms & failure modes
The faster the automation, the faster the accidents. Put a defense on each failure mode.
The "AI approves AI" problem
The most dangerous anti-pattern. If the generation agent reviews and approves its own PR, verification is zero. Defenses:
- Generation and review are different tools/models.
- An AI review's APPROVE cannot replace human approval — the "required approval" in branch protection must be a human account.
- Configure bot account approvals to be excluded from the required approval count.
Prompt injection through the pipeline
Issue bodies, PR comments, code, CI logs — all of it is input to the agent. An attacker can plant commands there.
- No auto-triggering from external users' Issues/PRs — only the `ai-ready` label applied by a member triggers.
- Minimal-privilege agent tokens — no access to production or secrets.
- Detect workflow file changes — if a PR touches `.github/workflows/`, it always requires mandatory human review.
Cost runaway
- Step ceilings, a self-heal attempt ceiling (3 attempts).
- Eliminate duplicate runs with `concurrency` + `cancel-in-progress`.
- Limit the number of concurrently running agents.
- Daily/weekly cost alarms.
Every stage is fail-closed + rollback
| Stage | On failure | Rollback means |
|---|---|---|
| S1 generation | No PR created, comment on the issue | — |
| S2 review | REQUEST_CHANGES, merge blocked | — |
| S3 CI | Red light, merge blocked | — |
| S4 human | Not merged | — |
| S5 deploy | Halt at the Canary stage | rollback.sh canary |
| S6 verification | Not promoted | Remove the Canary |
| S7 monitor | Auto rollback + filed as an issue | rollback.sh production |
Auditing — who, when, what
Every stage's actions must be recorded as logs. What AI did must be traceable via the commit trailer (Co-Authored-By:), the PR label (ai-generated), and the workflow run history. If you can't answer "why did this get deployed," it's better to turn that automation off.
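As a concrete example of that traceability — Claude Code signs its commits with a `Co-Authored-By` trailer, so one `git log` filter answers "what did the AI touch" (the exact trailer text is an assumption; match whatever your agent writes):

```bash
# Everything the agent committed, assuming a Co-Authored-By: Claude trailer.
git log --grep='Co-Authored-By: Claude' --oneline
# Cross-check against the PRs carrying the ai-generated label.
gh pr list --label ai-generated --state merged
```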
Chapter 12 · Operations — metrics and gradual trust
Turning the pipeline on isn't the end. You measure and tune.
Metrics to track
| Metric | Meaning | Healthy direction |
|---|---|---|
| Lead time (Issue→deploy) | Time for one lap | ↓ |
| % auto-merged | Share merged without a human | ↑ (but watch the escape rate) |
| AI review precision | Share of AI review flags that are actually valid | ↑ |
| Escape rate | Share of bugs that got through the pipeline | ↓ (most important) |
| Rollback rate | Share of deployments rolled back | low and stable |
| Self-heal success rate | Share of CI failures AI fixed | ↑ |
| Human review wait time | S4 queue wait | ↓ |
Applying DORA metrics to this pipeline
The traditional 4 DORA metrics apply directly — Deployment Frequency, Lead Time, Change Failure Rate, MTTR. The goal of an AI pipeline is to raise Throughput without breaking Stability. If the Change Failure Rate rises, shrink the automation scope.
The bottleneck is almost always S4
Even if 10 agents create 10 PRs in 5 minutes, if human review handles 5 a day, throughput is 5 a day. The real ceiling is review capacity, not generation speed. So the focus of tuning is:
- Strengthen AI review (S2) to reduce the human review burden.
- Split PRs smaller to make review faster.
- Increase auto-merge for low-risk changes (while watching the data) to route around S4.
Gradual trust — climb the ladder with data
Look at the metrics every week. If the escape rate is low and stable → climb one rung of the trust ladder (Chapter 10). If the escape rate rises or the rollback rate spikes → step down one rung. The automation scope is not a fixed value but a dial you adjust with data.
Epilogue — the pipeline is the shape of the team
Once you implement this post's pipeline fully, the way the team works changes.
- Developers type less code and write Issues well and review PRs well.
- Touch-up reviews go to AI, judgment reviews go to humans.
- The repetitive labor of detection, diagnosis, kicking off the fix, and redeploying goes to the system.
- Humans guard the gate of irreversible decisions.
Summed up in three core insights.
1. It's a loop, not a line. Deployment is not the end. When monitoring catches an incident and turns it back into an Issue, the system takes its own problem as input and fixes it.
2. "Automate the work, gate the decisions." Let AI write, review, and deploy code — but for irreversible decisions like merge, schema, and security, put a human or a strong automated check in the way. Fail-closed is the default.
3. Trust is accrued. Don't automate everything from the start; start with low-risk work and climb one rung at a time, watching the metrics (especially the escape rate). The automation scope is a dial you adjust with data.
Paradoxically, the ultimate purpose of all this automation is not to take humans out of the work, but to focus humans on the most important place — judgment and decisions. When the pipeline absorbs the repetitive labor, the team's thinking rises from "how do we build it" to "what and why do we build."
A 14-item checklist
- Have you identified all 7 stages and assigned an owner to each?
- Is every stage fail-closed?
- Does every stage have a rollback/cancel means?
- Are the generation agent and review agent separated (different models)?
- Is AI review wired in as a required status check?
- Are CI failure messages friendly enough for AI to read?
- Does the self-heal loop have an attempt ceiling?
- Does branch protection enforce required checks + human approval?
- Is human approval counted only from human accounts, not bots?
- Are Preview, Canary, and Production separated in stages?
- Does the `production` environment have required reviewers?
- Does auto-verification (smoke, SLO) gate the Canary promotion?
- On an SLO violation, do auto-rollback + auto Issue creation happen (the loop)?
- Do you track the escape rate and adjust the automation scope with that data?
10 anti-patterns
- The generation agent reviews and approves its own PR.
- An AI review APPROVE counted as a human approval.
- Auto-merging application logic.
- 100% production deploy the moment a merge happens (no Canary).
- No post-deploy verification (no smoke / SLO gate).
- Auto-rollback exists but the incident doesn't come back as an Issue (the loop isn't closed).
- CI failure messages are unfriendly, so AI self-healing is impossible.
- No ceiling on the self-heal loop → cost bomb.
- External users' Issues auto-trigger the pipeline.
- Widening the automation scope without measuring the escape rate.
Next post preview
Candidates for the next post:
- Building your own AI code reviewer — a custom review bot that learns the team's conventions
- Progressive Delivery deep dive — SLO-based auto-promotion with Argo Rollouts and Flagger
- Agent orchestration — turning a multi-stage pipeline into a state machine with LangGraph
"The best pipeline doesn't replace humans. It absorbs all the rest of the repetition so that humans only make decisions."
— From Issue to Deploy, building an automation pipeline, done.