Modern Code Review and Merge Pipelines — PR, Merge Queue, Stacked PRs, Monorepo, AI Review, Trunk-Based, Husky, Semgrep Deep Dive (2025)
Author: Youngju Kim (@fjvbn20031)
Code review is half the job — why do we barely talk about it?
Assume an engineer reviews 3–5 PRs per day. At five days a week and fifty weeks a year, that is roughly 750 to 1,250 PRs annually, about 1,000 on average. Time spent reading, judging, and suggesting changes to other people's code may exceed time spent writing new features. Yet we say remarkably little about it, even though the same complaints come up every day: "I balloon PRs because I dread reviews," "I sat unapproved for two weeks," "a reviewer missed the bug that caused the outage," "AI review is spam."
2025 is reshaping the landscape. Cursor, Copilot Review, CodeRabbit, and Greptile ushered in the era where the first-pass review is done by a machine. Graphite, Sapling, and Jujutsu pushed Stacked PRs into mainstream workflows. Merge Queue became a native GitHub feature. Monorepo tools (Nx, Turborepo, Moon, Bazel, Buck2) went through a generational change. Trunk-based Development moved from "ideal" to "default."
This post dissects 2025 code review and merge pipelines.
This post continues the Platform Engineering and Observability posts. Platform is "self-service delivery"; code review is the "quality gate for code change."
Part 1. PR Sociology — How Not to Be a Blocker
1.1 Why reviews hurt
- Too big — nobody reviews a 1,000-line PR properly
- Missing context — the body never explains why the change matters
- Emotional load — "don't do it like this" reads as attack
- Async ping-pong — 24h per round, 1 week elapsed
1.2 Four traits of a good PR
- Small — under 400 lines (research shows peak defect detection)
- Single concern — don't mix refactor with feature
- Context in the body — what, why, how tested
- Self-review first — author comments on their own diff first
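The 400-line guideline can be checked mechanically before a PR is opened. The following is a minimal sketch, assuming the input is the text printed by `git diff --numstat` (one `added<TAB>deleted<TAB>path` record per line). The names `parse_numstat` and `pr_too_large` are hypothetical helpers for illustration, not part of any tool.

```python
from typing import NamedTuple

MAX_CHANGED_LINES = 400  # guideline from the list above


class DiffStat(NamedTuple):
    added: int
    deleted: int
    path: str


def parse_numstat(numstat: str) -> list[DiffStat]:
    """Parse `git diff --numstat` text: 'added<TAB>deleted<TAB>path' per line."""
    stats = []
    for line in numstat.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported as '-'; count them as zero changed lines.
        stats.append(DiffStat(
            0 if added == "-" else int(added),
            0 if deleted == "-" else int(deleted),
            path,
        ))
    return stats


def pr_too_large(numstat: str, limit: int = MAX_CHANGED_LINES) -> bool:
    """True when added + deleted lines across all files exceed the limit."""
    total = sum(s.added + s.deleted for s in parse_numstat(numstat))
    return total > limit
```

A check like this could run in a pre-push hook or CI job and warn rather than fail, since generated files and lockfiles often inflate the count.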
1.3 Four reviewer principles (Google Code Review Guide)
- Understand author intent — "better than before" beats perfect
- Principle-based comments — not "I prefer" but "codebase convention"
- Start with questions — "was this intentional?" not assertions
- Blocking vs Nit: prefix each comment with nit:, question:, or blocking:
1.4 Conventional Comments
Prefixes separate blocking intent from emotion.
praise: thorough test cases
nitpick: clearer name - userId -> userIdentifier
suggestion: extracting this into util would help reuse
issue: race condition here - TOCTOU
thought: A/B test needed later?
question: any reason retry count is 3?
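Because the label is machine-readable, a bot can tell blocking comments from the rest. This is a rough sketch under one assumption that the text does not state: only `issue` is treated as merge-blocking. The helper names `parse_comment` and `blocks_merge` are hypothetical.

```python
import re

# Labels taken from the examples above.
KNOWN_LABELS = {"praise", "nitpick", "suggestion", "issue", "thought", "question"}
# Assumption for this sketch: only "issue" blocks a merge.
BLOCKING_LABELS = {"issue"}

_COMMENT_RE = re.compile(r"^(?P<label>[a-z]+):\s*(?P<body>.+)$", re.DOTALL)


def parse_comment(text: str) -> tuple[str, str]:
    """Return (label, body); comments without a known label are 'unlabeled'."""
    match = _COMMENT_RE.match(text.strip())
    if match and match.group("label") in KNOWN_LABELS:
        return match.group("label"), match.group("body").strip()
    return "unlabeled", text.strip()


def blocks_merge(comments: list[str]) -> bool:
    """True when at least one comment carries a blocking label."""
    return any(parse_comment(c)[0] in BLOCKING_LABELS for c in comments)
```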
Part 2. Code Owners and Reviewer Assignment
2.1 CODEOWNERS file
# Order matters: the last matching pattern wins, so general rules go first
*.md @org/docs
# /auth/** requires security team review
/auth/** @org/security
/packages/payments/** @org/payments-team
/infrastructure/terraform/** @org/platform
- GitHub and GitLab native
- Combine with branch protection to auto-request required approvers
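CODEOWNERS resolution follows a "last matching pattern wins" rule, which is easy to get wrong when a broad pattern such as `*.md` sits below a directory rule. The sketch below illustrates that rule only. It supports a deliberately small subset of the pattern syntax (root-anchored `/dir/**` and bare globs like `*.md`) and is not a reimplementation of GitHub's matcher; `parse_codeowners` and `owners_for` are hypothetical names.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def parse_codeowners(text: str) -> list[tuple[str, list[str]]]:
    """Return (pattern, owners) pairs in file order, skipping comments and blanks."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def _matches(pattern: str, path: str) -> bool:
    # Simplified subset: root-anchored patterns and bare filename globs.
    if pattern.startswith("/"):
        anchored = pattern[1:]
        if anchored.endswith("/**"):
            return path.startswith(anchored[:-2])  # "auth/**" -> prefix "auth/"
        return fnmatchcase(path, anchored)
    if "/" not in pattern:
        return fnmatchcase(basename(path), pattern)
    return fnmatchcase(path, pattern)


def owners_for(path: str, rules: list[tuple[str, list[str]]]) -> list[str]:
    """Owners of the last rule that matches; an empty list means no owner."""
    owners: list[str] = []
    for pattern, rule_owners in rules:
        if _matches(pattern, path):
            owners = rule_owners  # a later match overrides an earlier one
    return owners
```

With `*.md` listed before `/auth/**`, a file like `auth/README.md` resolves to the security team. With the order reversed, it would resolve to the docs team alone.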
2.2 Auto reviewer selection
- Pullrequest rotation — GitHub team round-robin
- ReviewBot, Toast — custom algorithms (experience, load)
- Graphite Merge — reviewer recommendation
- Internal tools: suggest "recent committers of changed files" (Facebook mention_bot origin)
2.3 Review load balancing
- Reviews piling on one senior -> that senior becomes a bottleneck
- Distribute: pair reviewer mandate, 1 junior + 1 senior
- Dashboard visibility for "open review count"
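The "1 junior + 1 senior" pairing combined with the open-review count can be sketched as a small selection function. This is an illustration under assumed inputs (each reviewer's seniority and current open-review count are already known), not the algorithm of any tool named above. `Reviewer` and `pick_pair` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reviewer:
    name: str
    senior: bool
    open_reviews: int


def pick_pair(candidates: list[Reviewer], author: str) -> tuple[Reviewer, Reviewer]:
    """Pick the least-loaded senior and the least-loaded junior, excluding the author."""
    pool = [r for r in candidates if r.name != author]
    seniors = [r for r in pool if r.senior]
    juniors = [r for r in pool if not r.senior]
    if not seniors or not juniors:
        raise ValueError("need at least one senior and one junior besides the author")

    def by_load(reviewer: Reviewer) -> tuple[int, str]:
        # The name breaks ties so the result is deterministic.
        return (reviewer.open_reviews, reviewer.name)

    return min(seniors, key=by_load), min(juniors, key=by_load)
```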
Part 3. Merge Queue — 2024–2025 Default
3.1 The problem
- PR A and B both pass CI against main
- A merges first
- B may actually break on the new main (silent semantic conflict)
- -> CI failure discovered only after merge
3.2 What Merge Queue does
- Queue the merge request
- At queue head, simulate build on "current main + this PR"
- On pass, perform the actual merge
- On fail, send back to author
Google and Facebook have run this internally for 10+ years. GitHub native since 2023, default-recommended in 2024.
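The queue behavior described in 3.2 can be modeled in a few lines. This is a toy simulation, not how GitHub or any vendor implements it: `main` is a list of merged PR ids, and `passes_ci` is a hypothetical callback standing in for a real CI run on "current main + this PR".

```python
from collections import deque
from typing import Callable, Iterable


def run_merge_queue(
    main: list[str],
    queue: Iterable[str],
    passes_ci: Callable[[list[str]], bool],
) -> tuple[list[str], list[str]]:
    """Process PRs in order; return (new main, PRs sent back to their authors)."""
    pending = deque(queue)
    rejected = []
    while pending:
        pr = pending.popleft()
        candidate = main + [pr]  # build "current main + this PR"
        if passes_ci(candidate):
            main = candidate     # on pass, perform the actual merge
        else:
            rejected.append(pr)  # on fail, send back to the author
    return main, rejected
```

In the semantic-conflict case from 3.1, A and B each pass alone but fail together. The queue merges A, then rejects B because B is tested against a main that already contains A.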
3.3 Tools
- GitHub Merge Queue (GA 2023) — default choice
- Mergify — Python rule-based, complex policies
- Aviator — stacked PR + merge queue
- Graphite — integrated (Stacked PR + Merge + Code Review)
- Bors NG / Trainium — OSS alternatives
3.4 Batched Merge
Large monorepos batch multiple PRs per merge. On failure, bisect to find the culprit PR. Needed only at Meta/Google scale.
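The bisect step can be sketched as a binary search over prefixes of the batch. The sketch assumes a single culprit and monotonic failure (once a prefix fails, every longer prefix fails too). `passes_ci` is a hypothetical stand-in for a CI run on main plus the given PRs.

```python
from typing import Callable


def find_culprit(batch: list[str], passes_ci: Callable[[list[str]], bool]) -> str:
    """Return the first PR whose inclusion makes the batch prefix fail.

    Assumes the empty prefix passes and the full batch fails.
    """
    low, high = 0, len(batch)  # invariant: batch[:low] passes, batch[:high] fails
    while high - low > 1:
        mid = (low + high) // 2
        if passes_ci(batch[:mid]):
            low = mid
        else:
            high = mid
    return batch[high - 1]
```

For a batch of n PRs this needs about log2(n) extra CI runs instead of n, which is where batching pays off when CI is expensive.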
Part 4. Stacked PRs — Splitting Big Changes
4.1 Problem
- Big features accumulate 2,000 lines at once
- Reviewer cannot see it -> rubber-stamp -> bugs
4.2 Solution
Stack changes into multiple PRs, each small. When the leading PR merges, the next auto-rebases onto main.
main <- PR1 (schema) <- PR2 (API) <- PR3 (UI)
Review each PR independently. Manual stack management becomes rebase hell.
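The bookkeeping that stacking tools automate can be shown with a plain mapping from each PR to its base branch. This is a simplified model, not the behavior of any specific tool; `merge_bottom` is a hypothetical name, and the real work (rebasing commits) is left out.

```python
def merge_bottom(stack: dict[str, str], merged: str, trunk: str = "main") -> dict[str, str]:
    """Return the stack after `merged` lands on trunk.

    `stack` maps PR name -> base branch. PRs that were based on the merged PR
    are retargeted to trunk; everything above them keeps its base.
    """
    if stack.get(merged) != trunk:
        raise ValueError(f"{merged} is not at the bottom of the stack")
    return {
        pr: (trunk if base == merged else base)
        for pr, base in stack.items()
        if pr != merged
    }
```

Starting from `{"PR1": "main", "PR2": "PR1", "PR3": "PR2"}`, merging PR1 leaves PR2 based on main and PR3 still based on PR2.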
4.3 Tools
- Graphite (gt) — dominant in industry, TypeScript/Python ecosystems
- Sapling (Meta 2022 OSS) — Mercurial-based, Meta internal tool
- Jujutsu (jj) (Google 2023 OSS) — Git-compatible, next-gen candidate
- Spr (legacy Facebook tool) — CLI
- ghstack (PyTorch team)
- git-branchless — personal use
4.4 Why Jujutsu matters
- Layers on top of Git repos transparently
- "First-class conflict" — merge conflicts live as commit state
- Powerful revset queries (jj log -r 'ancestors(@)')
- Operation log enables perfect undo
- 2025 Google internal primary plan, external interest exploding
Part 5. Monorepo vs Polyrepo — 2025 Verdict
5.1 Monorepo wins when
- Multiple services/libraries with strong internal dependencies
- Simultaneous changes (schema + API + client) are common
- Tool standardization matters at scale
5.2 Polyrepo wins when
- Services are genuinely independent
- Teams are fully separate
- Build tooling unification cost is too high
5.3 Pragmatic consensus
Google and Meta run monorepos shared by tens of thousands of engineers. Startups often "start small -> merge into monorepo as they grow." 2024–2025 trend: "services in monorepo, OSS libraries in separate repos."
5.4 Monorepo essentials
- Fast build cache (Remote Cache)
- Affected Detection — only build/test changed projects
- Merge Queue — parallel merge of heavy PRs
- Auto Code Owner routing
- Scale-grade Git — partial clone, LFS, VFS
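Affected Detection comes down to a reverse walk of the project dependency graph: anything that depends, directly or transitively, on a changed project must be rebuilt and retested. The sketch below assumes the graph is already available as a mapping. `affected_projects` and the project names in the usage note are hypothetical.

```python
from collections import deque


def affected_projects(deps: dict[str, set[str]], changed: set[str]) -> set[str]:
    """Return the changed projects plus every project that transitively depends on them.

    `deps` maps project -> the projects it depends on.
    """
    # Invert the graph: dependency -> direct dependents.
    dependents: dict[str, set[str]] = {}
    for project, requires in deps.items():
        for dep in requires:
            dependents.setdefault(dep, set()).add(project)

    affected = set(changed)
    queue = deque(changed)
    while queue:  # breadth-first walk over dependents
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

If `schema` changes in a repo where `api` and `api-client` depend on it and `web` depends on `api-client`, all four are affected, while an unrelated `docs` project is skipped.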
Part 6. Monorepo Build Tools — Nx, Turborepo, Moon, Bazel, Buck2, Pants, Lerna's End
6.1 JavaScript/TypeScript-centric
- Nx — Nx Cloud, graph UI, rich plugins
- Turborepo (Vercel) — fast, simple, pnpm + Next.js team focus
- Moon (Rust) — language-neutral, competes with Turbo
- Lerna (declared unmaintained in 2022, then taken over and now maintained by the Nx team)
6.2 Language-neutral
- Bazel — Google, top performance, top learning cost
- Buck2 (Meta 2023 OSS, Rust) — faster than Bazel
- Pants (Twitter) — strong for Python
- Please — small teams
- Earthly — Dockerfile-like DSL for reproducible builds
6.3 Selection tree
- Pure JS/TS monorepo, <=50 packages -> Turborepo
- JS/TS + UI team + plugins -> Nx
- Multi-language (JS/Go/Rust), mid-size -> Moon
- Mega-scale, build engineering investment -> Bazel or Buck2
- Python-heavy -> Pants
6.4 Remote Cache
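Remote Cache is the common battleground. Turborepo and Nx use Vercel and Nx Cloud, Bazel uses BuildBuddy or another Remote Build Execution backend, and Buck2 talks to backends that implement the same Bazel-style Remote Execution API. Teams that went from 5 min CI to 30 sec usually did it via Remote Cache.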
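The idea behind a remote cache is content addressing: hash every input of a task, and reuse the stored output whenever the hash has been seen before. The sketch below is a conceptual model only. It does not reflect the key format of Turborepo, Nx, Bazel, or Buck2, and `cache_key` and `RemoteCache` are hypothetical names with an in-memory dict standing in for the remote service.

```python
import hashlib
from typing import Callable


def cache_key(task: str, input_files: dict[str, bytes], env: dict[str, str]) -> str:
    """Derive a deterministic key from the task name, file contents, and environment."""
    digest = hashlib.sha256()
    digest.update(task.encode())
    for path in sorted(input_files):  # sort so dict ordering cannot change the key
        digest.update(b"\0" + path.encode() + b"\0")
        digest.update(hashlib.sha256(input_files[path]).digest())
    for name in sorted(env):
        digest.update(f"\0{name}={env[name]}".encode())
    return digest.hexdigest()


class RemoteCache:
    """In-memory stand-in for a remote cache service."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.hits = 0

    def run(self, key: str, build: Callable[[], str]) -> str:
        """Return the cached output for `key`, running `build` only on a miss."""
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self._store[key] = build()
        return self._store[key]
```

Anything that influences the output but is missing from the key (a compiler version, an undeclared environment variable) produces stale hits, which is why real tools are strict about declared inputs.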
Part 7. AI Code Review — 2024–2025 Explosion
7.1 What AI does well
- Null safety, off-by-one, unused variables (static analysis range)
- "This function name is ambiguous" style
- Pattern recognition ("this codebase uses X but you used Y")
- Test case suggestions
- Unit test coverage estimation
7.2 What AI cannot do
- Architecture-level judgment
- Business context ("does this matter to customers?")
- Team tacit conventions
- Security context (trust boundary of external input)
7.3 Tools
- CodeRabbit — auto PR summary + comments, free OSS mode
- Greptile — whole-codebase RAG-based context-aware review
- Ellipsis — particularly strong at "auto-generate PR description"
- Cursor BugBot — for Cursor users
- Copilot Review (GitHub 2024+) — Copilot Enterprise
- Qodo (Codium) — Python/JS test generation strength
- Sourcery — Python refactoring
- Graphite AI — reviews stacked PRs
7.4 AI review adoption tips
- Auto-comment only for a week -> team measures valuable-comment rate
- Over 5% spam ratio -> tune (false positives poison review culture)
- Do NOT give AI approval authority — human review mandatory
- Many orgs start with PR summary only (safest entry point)
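The trial-week measurement is simple arithmetic once each AI comment has been triaged by a human as valuable or spam. A minimal sketch, with `spam_ratio` and `needs_tuning` as hypothetical names and the 5% threshold taken from the list above:

```python
def spam_ratio(valuable: int, spam: int) -> float:
    """Share of triaged AI comments that the team marked as spam."""
    total = valuable + spam
    if total == 0:
        raise ValueError("no AI comments were triaged")
    return spam / total


def needs_tuning(valuable: int, spam: int, threshold: float = 0.05) -> bool:
    """True when the spam share is above the threshold."""
    return spam_ratio(valuable, spam) > threshold
```

For example, 90 valuable and 10 spam comments give a 10% spam ratio, which calls for tuning; 97 and 3 give 3%, which does not.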
Part 8. Trunk-Based Development
8.1 Definition
- Everyone works close to main (branches live for hours, not weeks)
- No long-lived feature branches
- Hide features behind flags (LaunchDarkly/Statsig)
- Main is always deployable
8.2 Why it wins
- Fewer merge conflicts
- Shorter deploy cycles -> lower MTTR
- Continuous Integration becomes meaningful
- Required for elite DORA metrics
8.3 Death of Git Flow
- nvie/gitflow (2010) effectively deprecated by its author in 2020
- GitLab Flow, GitHub Flow are simpler alternatives
- Enterprise finance still keeps release branches
8.4 Feature flag development
- Deploy != Release separation
- A/B test, gradual rollout, kill switch unified
- Untended flags become flag debt — 6-month+ flags need cleanup rituals
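Gradual rollout and kill switch can share one evaluation path. The sketch below shows the usual hash-bucket approach in generic form. It is not the LaunchDarkly or Statsig API, and `Flag`, `bucket`, and `is_enabled` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    name: str
    rollout_percent: int       # 0 to 100
    kill_switch: bool = False  # when True, the feature is off for everyone


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    if flag.kill_switch:
        return False
    # A user is enabled once the rollout percentage passes their bucket,
    # so raising the percentage never turns the feature off for anyone.
    return bucket(flag.name, user_id) < flag.rollout_percent
```

Hashing the flag name together with the user id keeps each user's experience stable across requests while giving different flags independent cohorts.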
Part 9. Git Techniques — Rebase, Squash, Linear History
9.1 Merge vs Rebase debate
- Merge commit — preserves history, chronological
- Rebase — linear, bisect-friendly
- Squash and Merge — one PR = one commit
9.2 Team policies
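- Linus Torvalds / Linux kernel: contributors rebase patch series before sending them, but maintainers integrate through merge commits and avoid rebasing published history, so the result is not linear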
- Typical GitHub team: Squash and Merge (simplest; the merge button itself defaults to a merge commit)
- Google: linear history, rebase assumed
- Meta: Mercurial -> Sapling, rebase native
9.3 No force-push to shared branches
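Running git rebase followed by git push --force on a shared branch is a disaster, because it silently discards commits other people pushed. If a force-push is unavoidable on your own branch, use --force-with-lease, which refuses to overwrite remote commits you have not fetched. GitHub blocks force-pushes to protected branches by default.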
9.4 Conventional Commits
feat(auth): add passkey support
fix(payments): handle stripe timeout
refactor(db): extract repository interface
chore: bump deps
docs: update README
- Auto-generated changelog
- semantic-release for version bump
- Team discipline × tool ecosystem
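The version bump follows from the commit types. The sketch below uses a simplified mapping (breaking change -> major, feat -> minor, fix -> patch, everything else -> no release). It is an illustration of the idea, not semantic-release's actual implementation or configuration, and `bump_for` is a hypothetical name.

```python
import re

_HEADER = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<subject>.+)$"
)
_LEVELS = ["none", "patch", "minor", "major"]


def bump_for(messages: list[str]) -> str:
    """Return 'major', 'minor', 'patch', or 'none' for a list of commit messages."""
    level = 0
    for message in messages:
        lines = message.splitlines()
        header = _HEADER.match(lines[0]) if lines else None
        if header is None:
            continue  # not a Conventional Commit; ignored in this sketch
        if header.group("breaking") or "BREAKING CHANGE:" in message:
            level = max(level, 3)
        elif header.group("type") == "feat":
            level = max(level, 2)
        elif header.group("type") == "fix":
            level = max(level, 1)
    return _LEVELS[level]
```

Applied to the five example messages above, the result is a minor bump, because `feat(auth)` outranks `fix(payments)` and the remaining types do not trigger a release.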
Part 10. Pre-commit / Pre-push Hooks — Local CI Speed
10.1 Hook managers
- Husky — JS ecosystem standard, Node-only
- lefthook — Go binary, language-neutral, fast
- pre-commit (Python) — language-neutral, global use
- Soft-serve, cog — new experiments
10.2 Essential hook set
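# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: lint
        name: eslint
        entry: pnpm lint --fix
        language: system
      - id: typecheck
        name: typecheck
        entry: pnpm typecheck
        language: system
        pass_filenames: false  # whole-project check; do not append staged file paths
      - id: test
        name: affected tests
        entry: pnpm test:affected
        language: system
        pass_filenames: false
      - id: secrets
        name: gitleaks
        entry: gitleaks protect --staged
        language: system
        pass_filenames: false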
10.3 Why hooks get hated
- Slow -> bypassed with --no-verify -> hooks must finish in 10 seconds
- Use pnpm test:affected for changed-file only
- Format staged files only via lint-staged
- Build/test on pre-push only; format/lint on pre-commit
Part 11. Static Analysis — Semgrep, SonarQube, CodeQL, ESLint
11.1 Semgrep
- Language-neutral pattern matching (YAML rules)
- Supply-chain, Secrets, Security rule sets
- Semgrep Cloud Platform (SaaS) — fast growth 2023–2024
11.2 SonarQube / SonarCloud
- Quality Gate, Technical Debt, Coverage
- Enterprise standard
- "Clean as You Code" is the recent recommendation
11.3 GitHub CodeQL
- Dataflow-based vulnerability analysis
- Free for OSS, paid for private orgs
- SQL-like query language
11.4 ESLint, Biome, Oxlint
- ESLint — JS/TS standard but slow
- Biome (2023 Rome fork) — Rust, linter + formatter unified, 10–50x faster
- Oxlint — Rust, Biome competitor, ESLint rule compat
- dprint — Rust, language-neutral formatter
11.5 Security-specialized
- gitleaks — pre-commit secret detection
- trufflehog — deep scan
- Syft, Grype — SBOM + vulns
- Trivy — container + IaC scanning
Part 12. CI Speed — Why PR Merge Must Be Under 10 Minutes
12.1 Compound effect
- 1 hour CI -> developer context-switches
- 10 min -> "wait before next task" is viable
- Under 5 min -> flow preserved
12.2 Strategies
- Remote Cache (Nx/Turborepo/Bazel)
- Parallel matrix (5-shard test split)
- Affected-only build
- Docker layer cache
- Warm runners (Depot, BuildJet, Namespace, Blacksmith, RunsOn) — GitHub Actions ARM/X86 + NVMe cache
- Earthly / Docker BuildKit multi-stage
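For the parallel matrix, how tests are split matters as much as the shard count: one slow shard sets the wall-clock time. A common heuristic is to assign the longest tests first, each to the currently lightest shard. The sketch below assumes historical durations are available. `shard_tests` is a hypothetical name, and CI vendors may split differently.

```python
import heapq


def shard_tests(durations: dict[str, float], shards: int) -> list[list[str]]:
    """Greedy split: longest test first, always onto the lightest shard so far."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    heap = [(0.0, index) for index in range(shards)]  # (total seconds, shard index)
    heapq.heapify(heap)
    result: list[list[str]] = [[] for _ in range(shards)]
    # Sort by descending duration; the name breaks ties deterministically.
    for test in sorted(durations, key=lambda name: (-durations[name], name)):
        load, index = heapq.heappop(heap)
        result[index].append(test)
        heapq.heappush(heap, (load + durations[test], index))
    return result
```

The greedy split is not optimal in general, but it is cheap and usually close enough that shards finish within a few seconds of each other.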
12.3 Flaky tests
- Worst productivity killer
- Retry mechanism + isolated tracking
- Trunk.io flaky test detection — 30-day fail-rate auto-skip
- Test retries are stopgap; root-cause tracking mandatory
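One workable definition for isolated tracking is that a test is flaky when the same commit produced both a pass and a fail. The sketch below applies that definition to a list of CI results. It is an illustration under that assumption, not how Trunk.io detects flakiness, and `find_flaky` is a hypothetical name.

```python
from collections import defaultdict


def find_flaky(runs: list[tuple[str, str, bool]]) -> set[str]:
    """Flag tests that both passed and failed on the same commit.

    Each run is (test_name, commit_sha, passed).
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test, sha, passed in runs:
        outcomes[(test, sha)].add(passed)
    return {test for (test, _sha), seen in outcomes.items() if seen == {True, False}}
```

A test that fails consistently on one commit is not flagged, since that is a real regression. Only mixed outcomes on identical code count as flaky.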
Part 13. Practice — Pipelines by Team Size
13.1 5-person team
- GitHub Flow + Squash and Merge
- Minimal CODEOWNERS, 1 required reviewer
- pre-commit hook + ESLint/Prettier
- CI: GitHub Actions + Turborepo
- AI review: CodeRabbit free
13.2 50-person team
- Merge Queue mandatory
- Paid AI review (CodeRabbit/Greptile)
- Internal runbook: "PR under 400 lines"
- Security scan: Semgrep CI + gitleaks
- If monorepo: Nx + Remote Cache
13.3 500+ engineers
- Graphite/Aviator to standardize stacked PRs
- Buck2/Bazel Remote Execution
- CodeQL + SonarQube enterprise
- Trunk-based + Feature Flag enforced
- Dedicated "Dev Productivity" team
Part 14. Checklist 12, Antipatterns 10
Checklist 12
- Average PR size under 400 lines?
- PR merge P50 under 24h?
- CODEOWNERS current and functional?
- Merge Queue blocking semantic conflicts?
- Stacked PRs a normal team workflow?
- Conventional Commits enforced?
- pre-commit/pre-push hooks fast (under 10s) and useful?
- Switched to fast linters like Biome/Oxlint?
- CI under 10 min average?
- Flaky-test detection/isolation system?
- AI reviewer spam rate under 5%?
- Trunk-based + Feature Flag standard?
Antipatterns 10
- Rubber-stamping a 1,000-line PR
- Empty PR title/body
- Long-lived feature branch over 1 month
- Habitual --no-verify
- Merging on AI approval alone
- Concurrent merges without Merge Queue -> silent conflict
- Papering over flaky tests with if (retryCount < 3)
- Unmaintained CODEOWNERS -> ghost accounts as reviewers
- Entire team's reviews funneling to one senior
- Rebase/Squash policy inconsistent -> history chaos
Next post — "The Engineering Blog Era: Technical Writing, RFC, ADR, Design Doc, Blog Operations, Communication"
If code review is one lever, technical writing is the next. RFC, ADR, Design Doc, internal wiki, external blog. Engineers who write well have 10x blast radius.
- Amazon 6-pager culture — why PowerPoint was banned
- Design Doc templates — Google/Stripe public templates
- RFC process — Rust vs Ember vs IETF
- ADR (Architecture Decision Record)
- Internal Wiki — Notion, Confluence, Outline, GitBook
- Engineering Blog ops — Stripe, Shopify, Uber, Airbnb styles
- Changelog and Release Notes
- Slack/Email async communication
- Writing in the LLM era — using AI as a tool while keeping your voice
- Tech influencer economics — when one post changes a career
Code survives between people. The next post looks at that survival strategy.