The 30x AI Engineer Is Defined by Taste, Not Speed

Introduction — Why This Essay Is Everywhere Right Now
The Core Argument — From 10x to 30x, and the Trap in the Number
When Code Becomes a Commodity — What Gets Expensive
Anatomy of Taste — Three Components
Concrete Training Methods for Building Taste
Spec Writing Is the New Coding
The Ability to Write Evals
Advice for Junior Developers — Balancing Fundamentals and Tools
Anti-patterns — What Not to Become
A Critical View — The Limits of the Taste Discourse
Case Study — Same Tool, Different Outcomes
Transplanting Taste into the Team — From Personal Skill to System
A 12-Week Taste Training Curriculum
Practical Application — What You Can Start This Week
Frequently Asked Questions
Closing
References

Introduction — Why This Essay Is Everywhere Right Now

One of the most widely shared essays in developer communities in the first half of 2026 is the Substack post "How to be a 30x AI engineer with taste". It hit the Hacker News front page and sparked lively discussion on GeekNews, the Korean tech aggregator.

The timing explains much of the resonance. In 2026, AI coding agents like Claude Code, Codex, and Copilot are no longer novelties — they are the default working environment. A frontier generation of models can work autonomously for hours at a stretch, OpenAI models and Codex are now available on AWS, and Microsoft has entered the coding-model race with MAI-Code-1.

The raw speed of producing code is now essentially free for everyone. Yet some engineers use AI tools to lift an entire team, while others use the exact same tools to mass-produce technical debt. Asked what makes the difference, the essay answers with a single word: taste.

In this post I will summarize the argument and then decompose the fuzzy word taste into concrete skills and training methods.

The Core Argument — From 10x to 30x, and the Trap in the Number

The essay makes four central claims.

1. AI has commoditized code production. The scarcity value of writing code quickly is collapsing.
2. The bottleneck has moved from production to evaluation. Deciding what to build, and judging whether what was built is good, is the new scarce resource.
3. The sum of these evaluative abilities is taste, and the gap between engineers with and without it widens as the tools get stronger.
4. Therefore 30x does not mean typing thirty times faster. It means the compounding value difference created by consistently good judgment.

If the 10x engineer discourse was about one person's coding throughput, the 30x discourse is about leverage. Given the enormous lever of an AI agent, the difference between knowing where to place the fulcrum and not knowing grows with the length of the lever.

  Then: production was the bottleneck     Now: evaluation is the bottleneck

  idea ──> [coding: slow] ──> result      idea ──> [coding: AI, fast] ──> result
               ^                                        ^
        this used to be expensive               this is now cheap
                                                        |
                                        [deciding what to build / is it good]
                                                        ^
                                              the new bottleneck

Of course, the number 30 should not be taken too literally. It is a rhetorical device, not a measurement. We will return to this in the criticism section.

When Code Becomes a Commodity — What Gets Expensive

There is an old pattern in economics: when a good becomes a commodity, the value of its complements rises. When photography became free, the eye for what to photograph became expensive. When writing tools became ubiquitous, editorial judgment about what to write became expensive.

Code is following the same path. Looking at what has actually become expensive in 2026 engineering work:

Commoditized	Rising in value
Writing boilerplate	Designing system boundaries
API integration code	Judging which API to use at all
Generating test code	Knowing what needs testing
Executing refactors	Judging whether a refactor is worth it
Drafting documentation	Selecting what readers actually need
Building prototypes	Designing the hypothesis a prototype tests

The left column is work you can delegate to an AI agent. The right column is work that causes accidents if you delegate it. What the right column has in common is evaluation and judgment — the territory of taste.

One important nuance: this does not mean the left column no longer matters. Without hands-on experience doing the left column, you lose the basis for the judgments in the right column. Commoditization does not destroy the value of knowing how to do the work — it destroys the value of the time spent doing it manually.

Anatomy of Taste — Three Components

Taste sounds elegant but vague. Broken down for practical use, it is a composite of at least three abilities.

1. Product Sense

The intuition for what creates value for users and the business. Concretely, it is the ability to answer these questions quickly and accurately.

If we build this feature, who uses it, when, and why?
Can this problem be solved without writing code at all?
Where is the 20 percent of scope that delivers 80 percent of the value?
What happens if we do not build this now? (If the answer is nothing, do not build it.)

An AI agent builds what it is told. It never asks whether the thing was worth telling. Put a powerful agent in the hands of someone without product sense, and things that never needed to exist get built at incredible speed.

2. Quality Intuition

The ability to look at generated code and smell something wrong within seconds. AI-generated code usually looks plausible, so you need intuition that pierces surface plausibility. Consider encountering this code.

def get_user_orders(user_id: int) -> list[dict]:
    orders = []
    all_orders = db.query("SELECT * FROM orders")  # loads whole table
    for order in all_orders:
        if order["user_id"] == user_id:            # app-level filtering
            user = db.query(
                f"SELECT * FROM users WHERE id = {order['user_id']}"
            )                                       # query in loop + injection
            order["user_name"] = user[0]["name"]
            orders.append(order)
    return orders

This code works. It may even pass tests. But a trained eye immediately sees three problems: the whole orders table is loaded and filtered in the application, a separate user query runs inside the loop (the N+1 pattern), and the query is built by string formatting, which invites SQL injection. A reviewer with taste would expect something like this instead.

def get_user_orders(user_id: int) -> list[dict]:
    return db.query(
        """
        SELECT o.*, u.name AS user_name
        FROM orders o
        JOIN users u ON u.id = o.user_id
        WHERE o.user_id = %s
        """,
        (user_id,),
    )

Quality intuition looks at cost structure, not just whether it runs. What happens to this code with a million rows? With a thousand concurrent requests? What will trip up the person modifying this six months from now?
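To try the corrected query without a real database, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The schema, the sample rows, and the `make_db` helper are assumptions made up for illustration; note that sqlite3 uses `?` as its placeholder where the snippet above uses `%s`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be converted to dicts
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            total_cents INTEGER NOT NULL
        );
        INSERT INTO users VALUES (1, 'Ada'), (2, 'Lin');
        INSERT INTO orders VALUES (10, 1, 500), (11, 2, 900), (12, 1, 250);
        """
    )
    return conn


def get_user_orders(conn: sqlite3.Connection, user_id: int) -> list[dict]:
    """One parameterized JOIN: the database filters, not the application."""
    rows = conn.execute(
        """
        SELECT o.id, o.total_cents, u.name AS user_name
        FROM orders o
        JOIN users u ON u.id = o.user_id
        WHERE o.user_id = ?
        ORDER BY o.id
        """,
        (user_id,),  # bound as data, never spliced into the SQL text
    ).fetchall()
    return [dict(row) for row in rows]
```

Calling `get_user_orders(make_db(), 1)` issues a single query regardless of how many orders the user has, which is the property the review was looking for.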

3. Tradeoff Judgment

Every engineering decision is a tradeoff. The third component of taste is knowing which tradeoff is right in this specific situation.

How much testing does this prototype deserve (answer: almost none)
How much testing does this payment module deserve (answer: relentless)
Introduce the abstraction now, or wait for the third duplication
Use the library, or avoid the dependency with 200 lines of your own code

AI tends to output the average of best practices. But best practices are only best on top of context, and context always lives on the human side.

Concrete Training Methods for Building Taste

Taste is not innate. It is built through deliberate exposure and repeated evaluation. Extending the suggestions from the original essay, here are four training routines.

Routine 1 — Read Excellent Code Closely

Just as people who read good prose recognize good sentences, people who read good code recognize good code. The recommended approach:

Pick one well-crafted open source project and read its core modules end to end. SQLite, Redis, Flask, and the Tailwind CSS codebase are frequent recommendations.
While reading, write down why they might have done it this way. The stranger a decision looks, the more likely there is a reason.
Compare the source of two libraries solving the same problem. The connection handling in requests versus httpx is an excellent textbook.

Two hours a week is enough. The key is not volume but the density of the question why.

Routine 2 — Treat Code Review as Evaluation Training

Code review is the best repeated drill for evaluative ability — but only structured review, not drive-by review. I recommend this checklist.

Review checklist (in priority order)
1. Does this change actually solve the problem it claims to solve? (product sense)
2. Could it have been solved smaller? Was any code deleted?
3. Failure paths: how does this code die? What happens when it dies?
4. What breaks at 10x the data and 10x the traffic?
5. What will surprise a colleague reading this in six months?
6. Style and naming (last, and delegate to tooling where possible)

When reviewing an AI-generated PR, add one more step: write down separately why this code looks plausible and why it is actually correct. Training that distinction is the core muscle of review in the AI era.

Routine 3 — Use Postmortems as Textbooks

Incident postmortems are compressed textbooks for tradeoff judgment, because they show the consequences of bad judgment more vividly than anything else.

When reading internal postmortems, trace which decision at which point in time planted the seed of the incident.
From public postmortem collections (the danluu/post-mortems repository on GitHub is famous), read one per week and ask yourself: would I have made the same call in their position?
It does not have to be an outage. Abandoned projects and rewritten architectures teach the same lessons.

Routine 4 — Watch Many Demos and Products

Product sense is built by observing good and bad products in volume.

Watch several new product demos each week. Hackathon demos, showcase videos, and Product Hunt launches are good sources.
Evaluate three things every time: what problem does it solve, why now, and what would I cut if I built it.
The last question matters most. Half of taste is the ability to cut.

Spec Writing Is the New Coding

In the age of AI agents, the practical programming interface is the natural language spec. In 2026, writing CLAUDE.md or AGENTS.md files to feed agents context has become standard practice, and the term prompt engineering has been displaced by context engineering and loop engineering.

Compare a bad spec and a good spec.

Bad spec:
"Build order cancellation"

Good spec:
## Goal
Users can cancel an order before shipping starts.

## Scope
- In: single-order cancel, full refund, optional cancellation reason
- Out: partial cancel, exchanges, cancel-after-shipping (next quarter)

## Invariants
- Refunds must be idempotent. Two identical cancel requests, one refund.
- Order state may reach CANCELLED only from PAID, directly or via CANCEL_PENDING. Never after SHIPPED.
- Cancel and refund are not one transaction boundary. On refund failure,
  set the order to CANCEL_PENDING and enqueue a retry.

## Done criteria
- Concurrent cancellation test passes
- Refund API timeout scenario test passes
- No schema change to the existing order query API

Notice that the good spec contains every hard decision in the system without a single line of code: idempotency, state transition rules, transaction boundaries, failure behavior. These are exactly the judgments that cannot be delegated to AI, which is why spec writing is the new coding.

Writing good specs is the same skill as understanding problems precisely, so it shares roots with taste training.
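To show how directly the invariants in the good spec translate into checkable behavior, here is a minimal single-process sketch. The `Order` dataclass, the `RefundError` exception, and the `refund` callable are all hypothetical names introduced for illustration; a real system would enforce the same rules with database constraints and an idempotency key rather than in-memory state.

```python
from dataclasses import dataclass, field
from enum import Enum


class OrderState(Enum):
    PAID = "PAID"
    CANCEL_PENDING = "CANCEL_PENDING"
    CANCELLED = "CANCELLED"
    SHIPPED = "SHIPPED"


class RefundError(Exception):
    """Raised by the (hypothetical) refund gateway on failure or timeout."""


@dataclass
class Order:
    order_id: str
    state: OrderState = OrderState.PAID
    refunds_issued: int = 0
    retry_queue: list[str] = field(default_factory=list)


def cancel_order(order: Order, refund) -> OrderState:
    """Apply the spec's invariants; `refund` is any callable taking an order id."""
    if order.state is OrderState.CANCELLED:
        return order.state  # idempotent: a repeated request does not refund again
    if order.state is OrderState.SHIPPED:
        raise ValueError("cannot cancel after shipping")
    try:
        refund(order.order_id)
    except RefundError:
        # Refund failed: park the order and enqueue one retry, per the spec.
        if order.state is not OrderState.CANCEL_PENDING:
            order.retry_queue.append(order.order_id)
        order.state = OrderState.CANCEL_PENDING
        return order.state
    order.refunds_issued += 1
    order.state = OrderState.CANCELLED
    return order.state
```

Each bullet under Invariants maps to one branch here, which is a useful smell test for a spec: if an invariant cannot be turned into a branch or a test, it is probably still too vague.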

The Ability to Write Evals

If specs encode judgment on the input side, evals are the technique of freezing output-side judgment into code. Instead of eyeballing AI output every time, you write down the definition of good in executable form.

The simplest form looks just like familiar tests.

# A simple eval validating an AI-generated SQL migration
def eval_migration(migration_sql: str) -> list[str]:
    violations = []
    lowered = migration_sql.lower()

    if "drop table" in lowered or "drop column" in lowered:
        violations.append("destructive change - requires manual approval")
    # CREATE INDEX is its own statement, so this check must not depend on ALTER TABLE
    if "create index" in lowered and "concurrently" not in lowered:
        violations.append("index creation without CONCURRENTLY - lock risk")
    if "not null" in lowered and "default" not in lowered:
        violations.append("NOT NULL without default - may fail on existing rows")

    return violations

Evals that grade LLM output itself usually separate the rubric from the grader.

RUBRIC = """
Grade the generated API error message from 1 to 5.
5: cause, fix, and doc link all present; no internal details leaked
3: cause present but the fix is vague
1: stack trace leaked or message is meaningless
"""

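def eval_error_messages(samples: list[str]) -> float:
    # grade_with_llm is a placeholder: your own call returning a 1-5 score
    if not samples:
        raise ValueError("no samples to grade")
    scores = [grade_with_llm(RUBRIC, s) for s in samples]
    return sum(scores) / len(scores)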

An interesting loop appears here. To write the rubric, you need an opinion about what a good error message is. In other words, eval writing is the codification of taste — without taste, you cannot write evals. This is exactly why the essay names evaluation as the new core competency.
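The rubric and grader separation can be exercised offline before any model is wired in. The sketch below is one possible shape, not the essay's method: `grade_with_rules` is a deterministic stand-in (an assumption) that uses crude keyword checks in place of an LLM call, and `run_eval` adds a pass threshold so the result can gate a CI job. Both function names and the threshold value are invented for illustration.

```python
from statistics import mean
from typing import Callable

Grader = Callable[[str, str], int]  # (rubric, sample) -> score from 1 to 5

RUBRIC = "5: cause, fix, and doc link present; 3: cause only; 1: leak or meaningless"


def grade_with_rules(rubric: str, sample: str) -> int:
    """Stand-in grader (assumption): keyword checks instead of an LLM call."""
    text = sample.lower()
    if "traceback" in text or not text.strip():
        return 1  # leaked stack trace or empty message
    has_cause = "because" in text
    has_fix = "try " in text or "please " in text
    has_link = "https://" in text
    if has_cause and has_fix and has_link:
        return 5
    return 3 if has_cause else 1


def run_eval(samples: list[str], grader: Grader, threshold: float = 4.0) -> dict:
    """Score every sample, then gate on the mean so the eval can fail a build."""
    if not samples:
        raise ValueError("eval needs at least one sample")
    scores = [grader(RUBRIC, s) for s in samples]
    avg = mean(scores)
    return {"scores": scores, "mean": avg, "passed": avg >= threshold}
```

Because the grader is passed in as a parameter, swapping the keyword stand-in for a real model call changes one argument and leaves the rubric and the gating logic untouched.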

Advice for Junior Developers — Balancing Fundamentals and Tools

Juniors occupy the most awkward position in this discourse. Taste comes from experience, and AI is taking away exactly the simple tasks that used to provide that experience.

My view of the balance point:

Stage	Delegate to AI	Do yourself
Years 1-2	Environment setup, repetitive boilerplate	Full debugging process, close code reading, small features end to end
Years 3-5	First drafts, test scaffolding	Design decisions, reviews, leading incident response
Year 6 plus	Most implementation	Specs, architecture, evals, mentoring

Two principles I especially recommend during the junior years.

Always debug yourself. Debugging is nearly the only forcing function for learning how systems actually behave. Before asking AI to fix it, at minimum form a hypothesis and verify it yourself.
Never merge AI output you have not read. Merge only when you can explain every line. The habit of merging unexplainable code destroys the very soil in which taste grows.

A junior who skips fundamentals and learns only tool usage cannot even tell when the tool is wrong. A junior who refuses the tools becomes isolated from a team workflow rebuilt around them. Avoid both.

Anti-patterns — What Not to Become

Anti-pattern 1: The Tool Collector

Switches to every new AI tool, encyclopedic about tool comparisons, but has no deep output to show. Switching costs are larger than they look, and taste is built by pushing one tool to its limits. Evaluating a tool change deliberately once a quarter is plenty.

Anti-pattern 2: The Uncritical Accepter

Ships the first answer AI produces. Fast in the short term, but two compounding costs accrue. First, average code accumulates and the system loses coherence. Second, your own evaluation muscles atrophy. The defense is a habit: for every AI output, ask at least once whether there was another way to solve this.

Anti-pattern 3: The Taste Performer

Floods every PR with subjective style nits and wields taste as power. Real taste shows in distinguishing what matters from what does not. A review that spends thirty minutes on a naming debate while missing a transaction boundary problem is not taste — it is noise.

A Critical View — The Limits of the Taste Discourse

Even if you agree with the argument, fairness requires naming its limits.

First, unmeasurability. Taste is by definition hard to quantify. A claim that an unmeasurable competency is the core competency is itself hard to falsify, and in hiring and performance reviews it can become a channel through which merely plausible people pass as people with taste.

Second, survivorship bias. Narratives of succeeding through taste are mostly post-hoc interpretation. The people who failed with the same taste do not write essays.

Third, the marketing quality of the number 30x. As with 10x before it, 30x is not a measurement but attention-grabbing rhetoric. On Hacker News, the most upvoted comments were precisely the cynical ones about this number. The healthy reading is to discard the number and keep the direction.

Fourth, taste has a shelf life too. If evaluative ability is also a learnable pattern, we cannot rule out AI eventually becoming good at evaluation itself. Research on automatic eval generation is advancing quickly. Nothing guarantees taste is a permanent moat.

Nevertheless, the observation that evaluation has become scarcer than production right now matches what practitioners see daily. Take the direction while staying aware of the limits.

Case Study — Same Tool, Different Outcomes

To make this concrete, compare two hypothetical engineers given the same task: add CSV export to an internal admin tool. Both use the same AI coding agent.

Engineer A (speed-first)               Engineer B (taste-first)

09:00 Tells the agent "build CSV      09:00 Clarifies requirements: who uses it?
      export"                               -> finance team, monthly, up to 500k rows
09:20 Checks generated code, works    09:20 Judgment: 500k rows cannot be a
09:30 Merges, reports done                  synchronous response; needs streaming
                                            or an async job
  -- two weeks later --               09:40 Writes a one-page spec
                                            (encoding, delimiter, PII column
Finance exports 450k rows                    masking, timeout policy included)
-> request timeout, memory spike      10:00 Hands spec to agent, delegates build
-> incident ticket, hotfix,           10:40 Reviews output: streaming confirmed,
   lost trust                                masking missing -> orders fix
                                      11:30 Merges, done
Total: 30 minutes plus two days
of incident response                  Total: 2.5 hours, zero incidents

Engineer A was not lazy. A used the tool quickly, and the output worked perfectly in the demo environment. The entire difference came from one question asked up front: who uses this, with what data. That question is product sense; concluding that 500k rows rules out a synchronous response is quality intuition; and drawing the line at streaming instead of a full async pipeline is tradeoff judgment.

Note that B's advantage did not come from coding skill. The agent did the coding for both. The entire gap was created outside the code.
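For readers who want to see what Engineer B's streaming judgment looks like in practice, here is a minimal sketch using only the standard library. The `email` column, the masking policy, and the function names are assumptions for illustration; the case study itself does not specify them. The point is that rows are written one at a time, so memory use stays flat whether the export has 500 rows or 500k.

```python
import csv
import io
from typing import Iterable, Iterator

PII_COLUMNS = {"email"}  # assumption: which columns count as PII


def mask(value: str) -> str:
    """Keep the first character, hide the rest (illustrative policy only)."""
    return value[:1] + "***" if value else value


def stream_csv(rows: Iterable[dict], columns: list[str]) -> Iterator[str]:
    """Yield the export one CSV line at a time instead of building it in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def flush(values: list) -> str:
        # Write one row into the reusable buffer, return it, and reset the buffer.
        writer.writerow(values)
        line = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return line

    yield flush(columns)  # header row
    for row in rows:
        yield flush(
            [mask(str(row[c])) if c in PII_COLUMNS else row[c] for c in columns]
        )
```

Most web frameworks accept an iterator like this as a streaming response body, so the same generator can back a download endpoint or be written to a file by a background job.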

Transplanting Taste into the Team — From Personal Skill to System

If taste lives only in one person's head, team quality drops the moment that person goes on vacation. The senior's job is to move personal taste into team systems. Three methods that work well in practice as of 2026:

First, encode taste into agent context files. CLAUDE.md or AGENTS.md is not mere configuration — it is the channel for injecting team taste into the model.

# Example CLAUDE.md (team taste, codified)

## Our code rules
- Introduce a new abstraction only after the same pattern repeats 3 times
- Every external API call declares a timeout and retry policy
- Error messages must include the cause and the next action
- Destructive migration changes go in a separate PR

## Do not
- Create utility functions with a single call site
- Write tests for the sake of tests (coverage-number theater)
- Explain code with comments (rewrite the code to be readable)

Second, accumulate review comments in reusable form. If you have made the same comment three times, it should become a lint rule, an eval, or a line in the context file. The automation rate of your feedback is your leverage as a senior.

Third, run lightweight decision records (ADRs). Even five lines on why something was not built, or which alternative was rejected, is enough. Half of taste is the record of rejections.
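As an illustration of turning a repeated review comment into a lint rule, the sketch below checks the "every external API call declares a timeout" rule from the example context file. It is a deliberately narrow assumption-laden version: it only recognizes direct `requests.<method>(...)` calls by name, using Python's built-in `ast` module, and the function name `find_calls_without_timeout` is invented here.

```python
import ast

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "request"}


def find_calls_without_timeout(source: str) -> list[int]:
    """Return line numbers of `requests.<method>(...)` calls lacking `timeout=`."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_requests_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "requests"
            and func.attr in HTTP_METHODS
        )
        if not is_requests_call:
            continue
        # A timeout hidden inside **kwargs cannot be verified statically,
        # so it is reported as missing as well.
        if not any(kw.arg == "timeout" for kw in node.keywords):
            offenders.append(node.lineno)
    return sorted(offenders)
```

Run over changed files in CI, a check like this delivers the review comment before a human has to, which is the kind of feedback automation the second method describes.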

A 12-Week Taste Training Curriculum

Bundling the routines into a schedule, designed to stay under three hours per week:

Weeks	Focus	Concrete activity
1-2	Quality intuition	Close-read a core open source module, 10 notes on design decisions
3-4	Evaluation drill	4 checklist-based reviews, notes separating plausible from correct
5-6	Failure learning	Analyze 4 public postmortems, trace decision branch points
7-8	Product sense	Watch 12 demos, note what you would cut
9-10	Spec writing	Write specs first for 2 real tasks, then delegate to the agent
11-12	Evals	Write eval scripts for 2 recurring tasks, share with the team

After week 12, do not loop back to week 1. Keep only the highest-impact activity from each area and compress it into a personal routine. The key property: every activity produces an artifact (notes, scripts, specs). Training without artifacts can be neither measured nor reviewed.

Practical Application — What You Can Start This Week

Instead of a grand plan, small routines you can start immediately.

Run one review this week using the checklist above, writing one line per item.
Pick one task you delegated to AI and, before merging, force yourself to imagine one alternative approach.
Read one public postmortem and note the decision branch point.
Write a five-line eval script for one task you delegate frequently.
For your next feature request, write a one-page spec in the format above before any code is generated.

All five together take under three hours a week. But compounded over a year, the gap between you and a colleague using the same tools is a gap no tool will close for them.

Frequently Asked Questions

Q. How do I tell taste apart from plain stubbornness?

By whether the basis can be updated. Taste is judgment that updates in the face of new evidence (benchmarks, incidents, user reactions); stubbornness is preference that persists regardless of evidence. Try recalling the last time one of your judgments changed. If you cannot remember, that is a warning sign.

Q. How do I demonstrate taste in an interview?

Show artifacts instead of asserting it. A collection of review comments, specs you wrote, eval scripts, and ADRs recording rejected designs with reasons are the strongest evidence. This is exactly why the 12-week curriculum is designed so its artifacts double as a portfolio.

Q. What if the whole team's taste has hardened in a bad direction?

Context files and evals are double-edged. Bad standards get automated and amplified just the same. That is why standards documents must ship with an update procedure — quarterly review, a channel for objections. Codifying taste is not a one-time task but an operation.

Closing

In the age of AI coding tools, both the anxiety that developer value is disappearing and the optimism that tool mastery alone makes you a super-developer are half right. The tools democratized production — and in doing so, dramatically raised the value of judgment.

Taste is not a mystical gift. It is the accumulation of hours spent reading good code, doing structured reviews, analyzing failures, and evaluating many products. And that taste becomes a team asset when codified as specs and evals.

Speed has now been handed to everyone. One question remains: where will you go with it? The person who has that answer is the one who will be called 30x.

References

How to be a 30x AI engineer with taste (original essay): https://pakodas.substack.com/p/how-to-be-a-30x-ai-engineer-with-a-taste
GeekNews discussion: https://news.hada.io/topic?id=30338
Hacker News: https://news.ycombinator.com/
danluu's public postmortem collection: https://github.com/danluu/post-mortems
Claude Code best practices by Anthropic: https://www.anthropic.com/engineering/claude-code-best-practices
OpenAI Evals framework: https://github.com/openai/evals
Building effective agents by Anthropic: https://www.anthropic.com/research/building-effective-agents
Stanford CS336 Language Modeling from Scratch: https://stanford-cs336.github.io/
Paul Graham, Taste for Makers: https://www.paulgraham.com/taste.html
Stack Overflow Developer Survey (AI tool usage): https://survey.stackoverflow.co/