Skip to content

필사 모드: Online Schema Changes on Large Tables — gh-ost, pt-online-schema-change, and Native Online DDL

English
0%
정확도 0%
💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.
원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Introduction — When a One-Line ALTER Becomes an Outage

On a small table, everybody runs ALTER TABLE with confidence. Adding a column takes less than a second, and creating an index is over in a blink. But the moment you point the same command at an orders table with hundreds of millions of rows, the story changes completely. The command does not finish for minutes or even hours, and during that time a write lock sits on the table while the application starts spitting out timeouts. The ops channel fills with 5xx alerts, someone frantically hits Ctrl-C, and it is already too late.

This post is about avoiding exactly that moment. It covers how to change the schema of a large table without taking the service down, which is to say online schema change. We look at how far MySQL native Online DDL takes you and where it stops, how the tools that grew up to fill that gap (pt-online-schema-change and gh-ost) work under the hood, and what PostgreSQL offers instead, all with ASCII diagrams and real command examples. Finally we cover the operational details that make or break a migration: chunked backfills, throttling, cut-over, rollback, and a practical checklist.

Let me state the conclusion up front. DDL on a large table should be treated not as a single command but as an operational procedure. Knowing the tools is not enough. You have to design load control, a cut-over strategy, and a plan to roll back along with it.

Why Large ALTERs Are Dangerous

Running an ALTER directly against a large table carries three broad risks.

First, locks. Many DDL operations take a metadata lock or a write lock while they rebuild the table. Write transactions that arrive while the lock is held pile up in a queue, and as the wait grows, the connection pool drains until even read queries stall. This is the textbook path by which a single table's DDL spreads into an outage across the whole database.

Second, replication lag. A DDL that ran long on the primary runs just as long on the replicas. Traditional MySQL replication in particular applies DDL serially, so an ALTER that took thirty minutes on the primary produces thirty minutes of lag on the replica. If your service spreads read traffic across replicas, that lag means stale reads or broken replica routing for the duration.

Third, disk and load. DDL that rewrites the table temporarily requires extra disk roughly equal to the original table size. For a 100 GB table you may need another 100 GB during the change. At the same time it burns through I/O and CPU, degrading the performance of every other query on the same server.

+-------------------------------------------------------------+

| How a direct ALTER TABLE turns into an outage |

| |

| ALTER starts |

| | |

| v |

| [acquire metadata / write lock] |

| | |

| +--> write transactions pile up in a queue |

| | | |

| | v |

| | connection pool drains --> reads time out too |

| | |

| +--> same DDL applied serially on replica --> lag |

| | |

| +--> heavy I/O and CPU --> other queries slow down |

| |

| => one table's change spreads into a full DB outage |

+-------------------------------------------------------------+

MySQL Native Online DDL — How Far Does It Go

Online DDL, introduced in MySQL 5.6, lets many changes proceed without a full table lock. The heart of it is two clauses: ALGORITHM and LOCK.

ALGORITHM takes three values. COPY is the old approach that rebuilds the whole table by copying it. INPLACE changes the table in place without copying it. INSTANT only edits metadata in the data dictionary and finishes immediately. With the INSTANT algorithm added in MySQL 8.0, some operations such as adding a column genuinely finish in an instant.

The LOCK clause specifies the level of concurrency allowed during the change. NONE permits both reads and writes, SHARED permits reads only, and EXCLUSIVE blocks both.

-- INSTANT: 8.0+, adding a column finishes immediately (metadata only)

ALTER TABLE orders

ADD COLUMN coupon_code VARCHAR(32) NULL,

ALGORITHM=INSTANT;

-- INPLACE + LOCK=NONE: build an index with no copy, reads/writes allowed

ALTER TABLE orders

ADD INDEX idx_customer_created (customer_id, created_at),

ALGORITHM=INPLACE, LOCK=NONE;

There is one operational habit worth forming here: spell out ALGORITHM and LOCK explicitly. If the change cannot be done with the algorithm you named, MySQL errors out and stops. If you leave them out, MySQL may silently fall back to the COPY approach and take a table lock. Writing ALGORITHM=INPLACE, LOCK=NONE blocks that dangerous fallback before it happens.

Limits of Native Online DDL

Native Online DDL is powerful but not omnipotent. The limits you hit in practice are these.

The set of operations INSTANT supports is narrow. It covers adding a column, changing a column default, renaming a column, and a few others. Heavy operations such as changing the PRIMARY KEY, changing a column type, or changing the character set still go through INPLACE or COPY.

Even INPLACE changes can take a long time. With LOCK=NONE, writes are allowed during the change, but a short metadata lock is needed the moment the change completes, and if a long-running transaction exists at that point, the lock wait can stretch out.

The biggest limit is load control and the ability to abort. Once native DDL starts, it is hard to stop it gracefully midway or to tune its pace. It will not slow itself down when replication lag gets bad. This is exactly where external tools become necessary.

+----------------------------------------------------------+

| Choosing a MySQL native Online DDL algorithm |

| |

| Change type Recommended algo Rebuild? |

| ---------------------- --------------- -------- |

| Add a column (at end) INSTANT No |

| Change a column default INSTANT No |

| Add a secondary index INPLACE No |

| Change a column type COPY Yes |

| Change the PRIMARY KEY COPY Yes |

| Change the character set COPY Yes |

+----------------------------------------------------------+

pt-online-schema-change — The Trigger-Based Approach

Percona Toolkit's pt-online-schema-change (pt-osc for short) implements online change using triggers. The core idea is: create an empty table with the new structure, copy data over from the original slowly, and let triggers keep up with changes that happen in the meantime.

The sequence goes like this. First, create a new table with the same structure as the original and apply the desired ALTER to it. Next, install INSERT, UPDATE, and DELETE triggers on the original so that every incoming change is also reflected in the new table. Then copy the original's existing rows into the new table in small chunks. When the copy is complete, atomically swap the names of the original and new tables (RENAME). Finally, clean up the triggers and the old table.

+--------------------------------------------------------------+

| pt-online-schema-change (trigger-based) |

| |

| 1) create _orders_new + apply the desired ALTER |

| |

| 2) install triggers on the original orders |

| INSERT/UPDATE/DELETE --> propagate to _orders_new |

| |

| application writes |

| | |

| v |

| [ orders ] --(trigger)--> [ _orders_new ] |

| ^ |

| | 3) copy existing rows in chunks |

| +--- backfill: INSERT ... SELECT, 1000 rows each |

| |

| 4) atomic RENAME: swap orders <-> _orders_new |

| |

| 5) drop triggers + drop old table _orders_old |

+--------------------------------------------------------------+

A real command looks like this. You put only the clause, such as ADD COLUMN, into the alter option. The tool prepends the ALTER TABLE keyword for you.

pt-online-schema-change \

--alter "ADD COLUMN coupon_code VARCHAR(32) NULL" \

--host=db-primary.internal \

--user=migrator \

--ask-pass \

--chunk-size=1000 \

--max-load "Threads_running=50" \

--critical-load "Threads_running=120" \

--max-lag=2 \

--check-slave-lag=db-replica.internal \

D=shop,t=orders \

--execute

A few options deserve attention. chunk-size sets how many rows to copy at a time. max-load pauses the copy when the server's count of concurrently running threads passes a level, controlling load. critical-load is a more serious threshold, and crossing it aborts the operation entirely. max-lag and check-slave-lag watch replication lag, slowing the backfill and waiting for lag to recover when the replica falls behind.

Pros and Cons of the Trigger-Based Approach

The advantages are that it works without any binary log access, and that it is a long-proven, stable tool. The drawback is that the triggers insert themselves directly into the original table's write path. Every INSERT, UPDATE, and DELETE now passes through a trigger once more, which adds write latency. It is also awkward on tables that already have triggers, and tables with foreign keys need extra handling. Above all, remember that the RENAME at cut-over time creates a short but distinct lock window.

gh-ost — The Binary Log Approach

gh-ost, built by GitHub, uses no triggers at all. Instead it reads MySQL's binary log (binlog) to keep up with changes. gh-ost attaches to the primary pretending to be a replica, receives binlog events as a stream, and applies those events to the new table (the ghost table).

It works like this. First, create a ghost table _orders_gho with the same structure as the original and apply the ALTER to it. At the same time, gh-ost subscribes to the binlog and applies every change event flowing into the original table onto the ghost table. Separately, it runs a backfill that copies the original's existing rows in chunks. Once binlog application and the backfill catch up, it performs the cut-over and swaps the table names.

+--------------------------------------------------------------+

| gh-ost (binlog-based, no triggers) |

| |

| application writes |

| | |

| v |

| [ orders ] ---> MySQL binlog |

| ^ | |

| | v |

| | gh-ost (subscribes to binlog as a replica) |

| | | |

| | +-------+-------+ |

| | | | |

| | apply binlog events backfill rows (chunks) |

| | | | |

| | v v |

| | [ _orders_gho ghost table ] |

| | |

| +--- 4) cut-over: swap orders <-> _orders_gho |

+--------------------------------------------------------------+

A real command example.

gh-ost \

--host=db-primary.internal \

--user=migrator \

--password=secret \

--database=shop \

--table=orders \

--alter="ADD COLUMN coupon_code VARCHAR(32) NULL" \

--chunk-size=1000 \

--max-load="Threads_running=50" \

--critical-load="Threads_running=120" \

--max-lag-millis=1500 \

--throttle-control-replicas="db-replica.internal" \

--allow-on-master \

--postpone-cut-over-flag-file=/tmp/ghost.postpone \

--execute

One of gh-ost's powerful operational features is interactivity during execution. You can send commands while it runs through a unix socket or panel. For example, you can slow the backfill, force a throttle, or push the cut-over earlier or later.

force a throttle during execution (pause the backfill)

echo "throttle" | nc -U /tmp/gh-ost.shop.orders.sock

release the throttle

echo "no-throttle" | nc -U /tmp/gh-ost.shop.orders.sock

change chunk size on the fly

echo "chunk-size=200" | nc -U /tmp/gh-ost.shop.orders.sock

The postpone-cut-over-flag-file is especially useful. While this flag file exists, gh-ost waits and does not cut over even after the backfill is fully complete. That lets you finish the heavy backfill work during the day in advance, and trigger the actual table swap during low traffic simply by deleting one file.

pt-osc versus gh-ost

The difference between the two tools, in a table.

| Aspect | pt-online-schema-change | gh-ost |

| --- | --- | --- |

| Change tracking | Triggers | Binary log |

| Effect on write path | Direct via triggers | None (asynchronous) |

| Where extra load lands | At write time on primary | On the gh-ost process |

| Control during execution | Limited | Dynamic via socket |

| Postpone cut-over | Hard | Supported via flag file |

| Foreign key handling | Partial via options | Limited direct support |

| Operational maturity | Very long-standing | Proven at GitHub scale |

In short, gh-ost has the edge for tables with very high write load or when you want precise control over cut-over timing. Conversely, pt-osc can be simpler when binlog access is hard or the change is very simple.

Online DDL in PostgreSQL

PostgreSQL follows a different model from MySQL. Much DDL already runs inside a transaction, and thanks to MVCC some changes are inherently less invasive. But PostgreSQL has its own pitfalls, and dedicated syntax to avoid them.

CREATE INDEX CONCURRENTLY

In PostgreSQL, a plain CREATE INDEX takes a write lock on the table and blocks all writes while the index is built. On a large table this is fatal. The remedy is the CONCURRENTLY option. With it, the index is built without blocking writes. The trade-off is that it scans the table twice, so it is slower, and it cannot run inside a transaction block.

-- build an index without blocking writes (run outside a transaction block)

CREATE INDEX CONCURRENTLY idx_orders_customer_created

ON orders (customer_id, created_at);

-- check whether any index was left INVALID after a failure

SELECT indexrelid::regclass AS index_name

FROM pg_index

WHERE indisvalid = false;

-- drop the INVALID index with CONCURRENTLY as well

DROP INDEX CONCURRENTLY idx_orders_customer_created;

When CONCURRENTLY fails, it leaves behind an index in the INVALID state. Because that index is never used by queries yet still occupies disk, you must find and clean it up.

Stopping Lock Pile-Ups with lock_timeout

In PostgreSQL, ALTER TABLE often requires an ACCESS EXCLUSIVE lock, even if only briefly. The problem is that while it waits to acquire that lock, every other query queued behind it is also blocked. The lock wait builds a lock queue, and the lock queue builds an outage.

The key to preventing this is setting a short lock_timeout. If the lock is not acquired within a set time, the DDL fails cleanly, preventing other queries from being blocked indefinitely. You simply retry afterward.

-- if the lock is not acquired within 3 seconds, fail this DDL

SET lock_timeout = '3s';

-- on 11+, adding a column without NOT NULL is instant (no table rewrite)

ALTER TABLE orders ADD COLUMN coupon_code VARCHAR(32);

SET lock_timeout = '3s';

-- adding a column with a DEFAULT is also a metadata-only change on 11+

ALTER TABLE orders ADD COLUMN status SMALLINT NOT NULL DEFAULT 0;

From PostgreSQL 11, adding a column with a default no longer rewrites the entire table; it records the default in the catalog and handles it fast. This was a big improvement. Note, however, that adding a NOT NULL constraint to an existing column still requires a full scan, so you should use the validation-splitting technique covered next.

Splitting Constraint Adds with NOT VALID

Adding a foreign key or CHECK constraint to a large table all at once is risky too. PostgreSQL lets you split this into two steps. First, adding the constraint with NOT VALID applies it only to incoming rows without validating existing ones. Then, during a quiet window, VALIDATE CONSTRAINT checks the existing rows, and that validation takes only a weak lock and does not block writes.

-- step 1: add the constraint skipping existing-row validation (short lock)

ALTER TABLE order_items

ADD CONSTRAINT fk_order

FOREIGN KEY (order_id) REFERENCES orders (id)

NOT VALID;

-- step 2: validate existing rows when quiet (weak lock, writes not blocked)

ALTER TABLE order_items

VALIDATE CONSTRAINT fk_order;

Chunked Backfill — Filling Data Slowly

After changing the schema you usually need to populate the new column with values. A common mistake here is trying to update hundreds of millions of rows in a single UPDATE. That giant transaction summons long locks, enormous undo and WAL, and replication lag all at once. The proper way is to split into small chunks and loop.

-- wrong: updating hundreds of millions of rows in one transaction (never)

-- UPDATE orders SET coupon_code = 'NONE' WHERE coupon_code IS NULL;

-- right: chunk by PK range and repeat

-- (an application/script loop increments batch_start by 1000 each time)

UPDATE orders

SET coupon_code = 'NONE'

WHERE id >= 1000000 AND id < 1001000

AND coupon_code IS NULL;

A backfill loop usually follows these principles. Pick a sensible chunk size (hundreds to a few thousand rows) and put a short rest between chunks. And check replication lag at every chunk, so that when lag passes a threshold the backfill pauses and resumes once it recovers. This keeps the backfill from overwhelming production traffic.

// pseudo-code level chunked backfill loop (with replication lag watch)

func backfill(db *sql.DB, maxID, batch int) error {

for start := 0; start < maxID; start += batch {

for replicationLagSeconds(db) > 2 {

time.Sleep(500 * time.Millisecond) // wait until lag recovers

}

_, err := db.Exec(

"UPDATE orders SET coupon_code = 'NONE' "+

"WHERE id >= ? AND id < ? AND coupon_code IS NULL",

start, start+batch,

)

if err != nil {

return err

}

time.Sleep(50 * time.Millisecond) // short rest between chunks

}

return nil

}

Throttling and Cut-Over

Throttling — Tuning Speed by Watching Load

The core safety mechanism of online schema change is throttling. Throttling means observing server load and replication lag in real time, and automatically slowing or stopping the backfill the moment a danger sign appears. pt-osc's max-load and max-lag, and gh-ost's max-load and max-lag-millis, all handle this.

A good throttle configuration sets two thresholds. One is a soft threshold that slows things down when crossed (pt-osc's max-load). The other is an emergency threshold that stops the work entirely when crossed (pt-osc's critical-load). Replication lag goes under the same watch, so that when a replica starts to fall behind, the backfill slows down to keep the lag from accumulating.

Cut-Over — The Most Dangerous Moment

Cut-over is the moment the new table and the original swap names. It happens after the backfill and change tracking are both complete, and it is usually very short, but this moment alone needs a lock. Cut-over is dangerous because if a long-running transaction is holding the table at that point, the name swap waits on that transaction and builds a lock queue.

So it is best to perform the cut-over during a window with light traffic and no long transactions. gh-ost's postpone-cut-over-flag-file is the very device for this. You finish the heavy backfill in advance and run only the short, sensitive cut-over moment at a time a human deliberately chooses.

+--------------------------------------------------------------+

| Cut-over timeline (the ideal case) |

| |

| backfill in progress =====================> (daytime, throttled) |

| | |

| [waiting to cut over, flag file] |

| | |

| low-traffic window ----------------------> [run cut-over] |

| | short lock |

| v |

| swap orders <-> ghost |

| | |

| clean up the old table |

+--------------------------------------------------------------+

Rollback — Plan to Reverse First

Rollback in online schema change means something different at each stage. Before cut-over, rollback is easy. You simply discard the new table (the ghost table or the _new table) and remove the triggers; the original was never changed, so it reverts cleanly. Stopping gh-ost or pt-osc midway is the rollback at this stage.

After cut-over the story changes. The names have already been swapped and the new structure is in production, so to revert you must apply the change again in the opposite direction. That is why a truly safe migration separates the schema change from the application deploy and splits it into steps that are compatible in reverse too.

A representative pattern is expand-contract. Take renaming a column as an example: first add the new column (expand), have the application write to both columns, backfill the data, move reads to the new column, and finally drop the old column (contract). Because each step deploys independently and can be reversed at each step, you can stop safely no matter where a problem appears.

+--------------------------------------------------------------+

| The expand-contract pattern |

| |

| 1) expand: add the new column (keep the old one as is) |

| 2) dual write: app writes both old and new columns |

| 3) backfill: fill the new column for existing rows, chunked|

| 4) switch reads: deploy app to read the new column |

| 5) contract: drop the old column after it stabilizes |

| |

| => each step deploys independently + per-step rollback |

+--------------------------------------------------------------+

Integration with Migration Tools

Online schema change is usually operated alongside a migration management tool. Tools like Flyway, Liquibase, and golang-migrate manage the version and apply order of migrations, but by default they just run SQL as written. So for large-table changes, instead of issuing the ALTER directly, these tools often wrap a call to gh-ost or pt-osc.

The key is to think of two things as separate. One is version control: which change was applied in which order. The other is the execution strategy: how that change is run safely. The migration tool handles the former, and the online DDL tool handles the latter. It is clean to divide them so the migration tool runs light changes like a safe column add directly, while delegating heavy table rewrites to the online DDL tool.

A Collection of Real-World Pitfalls

Pitfalls you run into again and again in the field.

First, forgetting disk capacity. Every approach that rewrites the table temporarily requires extra disk roughly equal to the original size. Always confirm free space before the change. If the disk fills up mid-change, the work dies at the worst possible time.

Second, ignoring foreign keys. Both trigger-based and binlog-based tools need extra handling on tables with foreign keys. If a child table's foreign key points at the original table, you must review the tool's foreign key options to keep the foreign key from breaking during the name swap.

Third, leaving long transactions in place. If a long-running transaction exists just before cut-over, the cut-over waits on it and builds a lock queue. Check that there are no long-running queries or uncommitted transactions before the change.

Fourth, setting throttle thresholds too loose. You want it to stop when load appears, but if the threshold is too high, the throttle only kicks in after the service has already slowed. It is safer to set thresholds conservatively low at first and raise them while you observe.

Fifth, running straight into production without a rehearsal. Running it once against staging data of the same size lets you gauge duration, disk usage, and load impact in advance. Running it for the first time in production is reckless.

Sixth, in PostgreSQL, not cleaning up the INVALID index left by CONCURRENTLY. A failed CONCURRENTLY quietly leaves a useless index, so always inspect for and clean up INVALID indexes after the work.

Operational Checklist

Check the following before running an online schema change on a large table.

[ ] Classified the change as light (INSTANT/instant) or heavy (rewrite)?

[ ] Can the light change be handled directly by native DDL / migration tool?

[ ] Decided whether to use gh-ost or pt-osc for the heavy change?

[ ] Secured extra disk roughly equal to the original table size?

[ ] Checked for foreign keys / triggers and prepared the options?

[ ] Set throttle thresholds (load / replication lag) conservatively?

[ ] Specified the replica lag watch target in the tool?

[ ] Planned the cut-over for a low-traffic window (using the postpone flag)?

[ ] Confirmed no long / uncommitted transactions just before cut-over?

[ ] Documented the rollback procedure (before / after cut-over)?

[ ] Completed a rehearsal on same-scale staging?

[ ] For PostgreSQL, planned lock_timeout and INVALID index inspection?

[ ] Separated app deploy from schema change with expand-contract?

Closing Thoughts

Changing the schema of a large table is one of the most nerve-racking jobs in database operations. But the source of that tension is clear: locks, replication lag, disk, and irreversibility. Design in advance how you will handle each of these four, and the job that scared you turns into a procedure.

To restate the core: finish light changes fast with native Online DDL and the INSTANT algorithm, and delegate heavy changes to tools like gh-ost or pt-osc, controlling load with chunked backfills and throttling. In PostgreSQL, lean on the native techniques of CONCURRENTLY, lock_timeout, and NOT VALID splitting. And above all, make cut-over a short moment that a human controls, and design migrations that can be stopped at any time using expand-contract.

The tools only assist. What makes a migration safe is, in the end, procedure, rehearsal, and a plan to roll back.

References

- [MySQL 8.0 Reference Manual — Online DDL Operations](https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html)

- [MySQL 8.0 Reference Manual — Online DDL Overview](https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl.html)

- [Percona Toolkit — pt-online-schema-change](https://docs.percona.com/percona-toolkit/pt-online-schema-change.html)

- [gh-ost — GitHub Online Schema Migrations for MySQL](https://github.com/github/gh-ost)

- [PostgreSQL Documentation — CREATE INDEX (CONCURRENTLY)](https://www.postgresql.org/docs/current/sql-createindex.html)

- [PostgreSQL Documentation — ALTER TABLE](https://www.postgresql.org/docs/current/sql-altertable.html)

- [PostgreSQL Documentation — Client Connection Defaults (lock_timeout)](https://www.postgresql.org/docs/current/runtime-config-client.html)

- [Amazon RDS — Working with online DDL operations](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_MySQL.Managing.html)

- [Flyway Documentation](https://documentation.red-gate.com/flyway)

- [Liquibase Documentation](https://docs.liquibase.com/)

- [golang-migrate](https://github.com/golang-migrate/migrate)

현재 단락 (1/273)

On a small table, everybody runs ALTER TABLE with confidence. Adding a column takes less than a seco...

작성 글자: 0원문 글자: 21,914작성 단락: 0/273