Skip to content

필사 모드: PostgreSQL Version Upgrade Strategies — pg_upgrade and Logical Replication

English
0%
정확도 0%
💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.
원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Introduction

PostgreSQL ships a new major version every year. And each major version has a clearly defined lifetime of five years. Five years after release, all support, including security patches, comes to an end. In other words, as long as you operate a database, major version upgrades are an unavoidable, recurring operational task.

Yet in practice major version upgrades are often pushed back again and again, until they are finally rushed through right before end of life (EOL). The reason is simple: upgrades are scary. Touching a multi-terabyte production database, and the downtime that might occur in the process, is frightening.

This article compares the three representative strategies for a PostgreSQL major version upgrade: the simplest dump/restore, the file-level and fast pg_upgrade, and the near-zero-downtime logical replication approach. We will cover how each works, the downtime trade-offs, the actual command-line procedures, and the pitfalls people commonly fall into. We will also look at considerations for managed environments such as RDS and Aurora.

What this article covers is "major" version upgrades. Going from 15.3 to 15.8 is a minor upgrade, a simple task that is done by swapping the binaries and restarting. That is because the data directory format does not change. But a major upgrade from 15 to 16 changes the internal catalog structure and the data directory layout, so it requires a dedicated strategy.

Why major upgrades are hard

First, you need to understand why a major upgrade is not just a restart. With every major version, PostgreSQL may change the internal format of data files, the system catalog structure, the WAL format, and more.

The key is the `PG_VERSION` file and the catalog version. The data directory contains information about which major version created it, and a server of a newer version cannot directly open a data directory created by a lower version. Minor versions are guaranteed to preserve this format, but major versions are not.

A major upgrade is therefore essentially the task of "moving old-version data into the new-version format." Depending on how that move is done, you end up with dump/restore, pg_upgrade, or logical replication.

[Minor upgrade] 15.3 -> 15.8

Swap binaries + restart -> done (same data format)

[Major upgrade] 15 -> 16

Data must be converted/migrated into the new format

- dump/restore : logically export everything and reload

- pg_upgrade : convert only the catalog, reuse data files

- logical replication: stand up the new version and sync changes live

Overview of the three strategies

1. dump/restore

This is the most classic and simplest approach. You logically extract all data from the old version with `pg_dump` or `pg_dumpall`, and load it into the new version with `pg_restore` or `psql`.

The advantages are clear. It is the safest and most portable, and because the data is completely rewritten, table bloat is naturally cleaned up. It works even when the architectures differ between old and new (for example, 32-bit to 64-bit).

The downsides are just as clear. It is slow. The entire data set has to be inserted row by row and indexes rebuilt, so the time grows explosively in proportion to data size. Even at a few hundred GB, downtime becomes a matter of hours.

2. pg_upgrade

This is the in-place upgrade tool PostgreSQL officially provides. The core idea is to "leave the data files mostly as they are and convert only the system catalog."

In particular, when you use `--link` mode, the new version reuses the old version's data files by hard-linking to them. That means it does not copy multiple terabytes of data. As a result, almost regardless of data size, the upgrade typically finishes within a few minutes.

The downsides are that it must run on the same server (the same file system), and that with `--link`, an immediate rollback to the old version becomes difficult.

3. Logical replication

This leverages the publication/subscription-based logical replication introduced in PostgreSQL 10. You stand up the new-version instance in advance, set the old version as publisher and the new version as subscriber, and sync the data in real time. Once all the initial data is replicated and the changes have caught up, you switch only the application connection over to the new version in a brief moment.

The advantage is that downtime is reduced to seconds or tens of seconds. Because the old version keeps serving until cutover, you also gain ample time to verify. The downside is that the procedure is complex, and items that logical replication does not support (such as automatic sequence syncing) must be handled manually.

Downtime trade-off comparison

The biggest difference between the three approaches ultimately comes down to the trade-off between downtime and operational complexity.

| Item | dump/restore | pg_upgrade (--link) | Logical replication |

| --- | --- | --- | --- |

| Downtime | Very long (scales with data) | Short (usually minutes) | Almost none (seconds to tens of seconds) |

| Data size impact | Very large | Almost none | Only on initial sync |

| Extra disk usage | As much as the new version | Almost none (hard links) | As much as the new version |

| Rollback difficulty | Easy (old version intact) | Hard (with link) | Easy (until cutover) |

| Procedure complexity | Low | Medium | High |

| Version jumps | Free | Adjacent/multi-step possible | Publisher must be 10 or higher |

| Bloat cleanup effect | Yes | No | Yes |

Simplified, the selection criteria are as follows.

Small data (a few GB) with room for downtime

-> dump/restore (simplest, safest)

Large data, but you can take a short maintenance window (minutes)

-> pg_upgrade --link

Almost no downtime allowed / 24x7 service

-> logical replication

pg_upgrade in detail

Now let us look at the most widely used pg_upgrade at the actual command level. The premise is that the binaries of both the old version (e.g., 15) and the new version (e.g., 16) are installed on the same server.

Preparation

First, install the new-version binaries and initialize a new data directory. It must be initialized with the same locale/encoding as the old version.

Old and new data directory paths

OLD_BIN=/usr/lib/postgresql/15/bin

NEW_BIN=/usr/lib/postgresql/16/bin

OLD_DATA=/var/lib/postgresql/15/main

NEW_DATA=/var/lib/postgresql/16/main

Initialize the new data directory (same locale/encoding as old)

"$NEW_BIN/initdb" -D "$NEW_DATA" \

--encoding=UTF8 --locale=en_US.UTF-8

Pre-flight compatibility check (--check)

Before the actual upgrade, you must always verify compatibility first in `--check` mode. This step touches no data at all and reports problems that could block the upgrade.

Run while both servers are stopped

"$NEW_BIN/pg_upgrade" \

--old-bindir="$OLD_BIN" \

--new-bindir="$NEW_BIN" \

--old-datadir="$OLD_DATA" \

--new-datadir="$NEW_DATA" \

--check

Typical problems `--check` reports include removed data types (e.g., the old `abstime`), incompatible extension modules, and extensions not installed on the new version. Do not proceed with the actual upgrade until you have resolved every warning raised here.

The actual upgrade (--link)

Once the check passes, proceed with the actual upgrade. Adding `--link` greatly shortens the time by creating hard links instead of copying data files.

Make sure both the old and new servers are stopped!

"$NEW_BIN/pg_upgrade" \

--old-bindir="$OLD_BIN" \

--new-bindir="$NEW_BIN" \

--old-datadir="$OLD_DATA" \

--new-datadir="$NEW_DATA" \

--link

When the upgrade finishes, two scripts are generated. One re-collects statistics (the new version does not carry over planner statistics, so ANALYZE is required), and the other deletes the old-version data directory.

After starting the new-version server, always re-collect statistics

"$NEW_BIN/vacuumdb" --all --analyze-in-stages

Delete old-version data only after verification is complete

./delete_old_cluster.sh

Key caveats of --link mode

`--link` is powerful, but it is just as dangerous. The most important rule is that **the moment you start the new-version cluster, the old version is no longer safe**.

Because they share the same data files through hard links, when the new version starts writing, the old version's data changes along with it. That is, once you start the new version normally even once and a write occurs, rolling back to the old version becomes practically impossible.

[The danger zone of --link mode]

pg_upgrade --link completes

|

v

New version starts (write occurs) --- no old-version rollback past this point

|

v

Verification OK -> production

Verification NG -> must restore from backup (cannot just start the old version)

For that reason, before a `--link` upgrade you must secure a trustworthy backup (or snapshot). If you have disk headroom, using the default copy mode instead of `--link` is also a safe choice. Copy mode is slower, but the old version stays intact, so rollback is free.

One more thing: with `--link`, the old and new data directories must be on the **same file system**. Hard links cannot cross file system boundaries.

Zero-downtime upgrade with logical replication

If your environment can barely tolerate any downtime, logical replication is the answer. The big picture is as follows.

[Logical replication upgrade flow]

Old (15) Publisher New (16) Subscriber

| |

| 1. create publication |

| ---------------------------> | 2. create schema in advance

| | 3. create subscription

| 4. initial data copy -----> |

| 5. stream changes --------> | (catch up in real time)

| |

| 6. confirm lag ~= 0 |

| 7. block writes + sync seqs |

| 8. switch application -----> | (new version serves)

Step 1: Prepare the old version

You need to enable logical replication on the publisher side. Raise `wal_level` to `logical`, which requires a restart.

postgresql.conf (old version)

wal_level = logical

max_replication_slots = 10

max_wal_senders = 10

Step 2: Migrate the schema in advance

Logical replication replicates only data (rows). It does not automatically create the schema — table definitions, indexes, constraints, functions. So you must create the schema on the new version beforehand.

Extract schema only (no data) from the old version

pg_dump -h OLD_HOST -U postgres -d appdb \

--schema-only --no-owner -f schema.sql

Load the schema into the new version

psql -h NEW_HOST -U postgres -d appdb -f schema.sql

Step 3: Publication and subscription

Now create a publication on the old version and a subscription on the new version.

-- Run on the old version (Publisher)

CREATE PUBLICATION pub_all FOR ALL TABLES;

-- Run on the new version (Subscriber)

CREATE SUBSCRIPTION sub_all

CONNECTION 'host=OLD_HOST dbname=appdb user=repl password=secret'

PUBLICATION pub_all

WITH (copy_data = true);

When you create it with `copy_data = true`, the initial data copy starts as soon as the subscription is created, and once the copy finishes it automatically switches to streaming changes.

Step 4: Monitor sync progress

Monitor whether replication is catching up. You can check the replication lag on the publisher side.

-- On the old version (Publisher): check replication lag

SELECT

slot_name,

pg_size_pretty(

pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)

) AS lag

FROM pg_replication_slots;

-- On the new version (Subscriber): per-table sync state

SELECT srrelid::regclass AS table_name, srsubstate

FROM pg_subscription_rel;

When `srsubstate` is `r` (ready), that table has finished its initial copy and entered the streaming phase.

Step 5: Synchronize sequences

This is the biggest pitfall of logical replication. **Logical replication does not replicate sequence values.** The table data has caught up, but the current value (`last_value`) of each sequence remains in its initial state on the new version. If you cut over in this state, you get PK collisions.

Just before cutover, with writes blocked, you must manually align every sequence.

-- Generate SQL on the old version that captures each sequence's current value

SELECT

'SELECT setval(' || quote_literal(schemaname || '.' || sequencename)

|| ', ' || COALESCE(last_value, 1) || ');'

FROM pg_sequences;

Run the resulting setval statements on the new version to align the sequences. For safety, it is also a good idea to leave a little headroom (for example, by adding 1000 to the current value).

Step 6: Cutover

Now for the switch. Keep the procedure short and decisive.

[Cutover order]

1. Block application writes (switch to read-only or stop traffic)

2. Final confirm replication lag = 0 (wait for all remaining changes)

3. Synchronize sequence values (Step 5)

4. Disable/drop the subscription on the new version

5. Change the application connection string to the new version

6. Resume traffic, monitor the new version's healthy operation

Because writes only stop during this cutover window, the actual downtime is limited to seconds or tens of seconds.

Handling extensions, sequences, and large volumes

Regardless of the upgrade method, there are common items you must attend to.

Extensions

`pg_upgrade` carries over the SQL objects of an extension, but the extension's shared library (.so) must already be installed on the new version. If you use PostGIS, for example, the matching PostGIS package must be installed on the new version first.

After the upgrade, you may additionally need to bring the extension version up to match the new version.

-- Update the extension version after the upgrade

ALTER EXTENSION postgis UPDATE;

-- Compare installed extensions with the latest available versions

SELECT name, default_version, installed_version

FROM pg_available_extensions

WHERE installed_version IS NOT NULL;

Large volumes and indexes

dump/restore and logical replication rebuild indexes on the target side. Regenerating indexes for tables with hundreds of millions of rows consumes a lot of time and resources. During the initial load, it is better to create indexes later and raise `maintenance_work_mem` to speed up the build.

Temporarily raise during initial load / index build (new version)

maintenance_work_mem = 2GB

max_wal_size = 8GB

If the initial copy in logical replication is too slow, you can also consider a split strategy: exclude large tables from the publication, move them separately, and add them later.

Verification and rollback

An upgrade is not truly complete when it "finishes" but when "verification finishes."

Verification checklist

[Post-upgrade verification items]

- All database/schema/table counts match

- Row counts of key tables match between old and new

- Statistics re-collection (ANALYZE) complete

- Extensions working and versions confirmed

- Sequence values correct (mandatory for logical replication)

- Regression check on execution plans of major queries

- Application smoke test passes

Execution plan regression in particular is often overlooked. When the major version changes, the planner's behavior can shift subtly, so a query that ran well may suddenly become slow. Right after the upgrade, re-collect statistics and compare the execution plans of your key queries.

Rollback strategy

Rollback difficulty differs greatly by method.

| Method | Rollback approach |

| --- | --- |

| dump/restore | The old version is still alive, so just point the connection back |

| pg_upgrade (copy) | Old data directory preserved -> restart the old version |

| pg_upgrade (--link) | Once the new version has started, practically only backup restore works |

| Logical replication | Free before cutover; after cutover, requires a reverse-replication design |

To prepare for rollback even after cutover with logical replication, you need an advanced strategy that configures reverse replication from new version to old version ahead of time. This lets you direct traffic back to the old version if something goes wrong, but it greatly increases configuration complexity.

Managed environments: RDS and Aurora

In cloud managed PostgreSQL, the upgrade mechanism is abstracted away.

RDS for PostgreSQL

RDS lets you trigger a major version upgrade via the console or API, and internally it uses `pg_upgrade`. It automatically takes a snapshot before the upgrade, so some rollback safety is secured. However, the instance becomes unavailable during that time, so downtime occurs.

Using Blue/Green Deployments can greatly reduce downtime. You build the new version in the green environment in advance, sync it with logical replication, and promote it with only a brief switch.

Aurora PostgreSQL

Aurora also supports in-place major upgrades and likewise offers Blue/Green Deployments. Because Aurora separates storage from compute, its upgrade model behaves somewhat differently from ordinary RDS. In a managed environment, rather than running pg_upgrade yourself, it is less operational burden to make use of the provided upgrade paths and the Blue/Green feature first.

In common, whether managed or self-hosted, **securing a backup/snapshot before the upgrade** and **a pre-flight compatibility check** are equally important.

Common pitfalls

Finally, let us summarize the pitfalls you frequently meet in practice.

First, **locale and encoding mismatch**. If you initialize the new-version data directory with a locale different from the old version, pg_upgrade either refuses, or even if it passes, the sort order may subtly differ and break indexes. A change in glibc collation version can also cause index corruption, so it is safer to review rebuilding text indexes after the upgrade.

Second, **missing extensions**. If you run pg_upgrade without installing the extension libraries on the new version, it fails at the `--check` stage. You must install all extensions for the new version in advance.

Third, **missing sequences in logical replication**. As emphasized earlier, sequences are not synced automatically. If you skip setval just before cutover, the new version hits a PK collision from the very first INSERT.

Fourth, **the logical replication constraint on tables without a PK**. To replicate UPDATE/DELETE via logical replication, the target table needs a primary key or a REPLICA IDENTITY. Without it, only INSERTs replicate or you get errors.

Fifth, **skipping statistics re-collection**. pg_upgrade does not carry over planner statistics. If you do not run ANALYZE right after the upgrade, every query runs with a wrong execution plan and the whole database slows down.

Sixth, **starting the old version after --link**. The moment you start both the new and old versions for the sake of verification, data can be corrupted. In --link mode, only one of the two may be alive.

Conclusion

A PostgreSQL major version upgrade is ultimately a choice between "how much can you reduce downtime" and "how much operational complexity can you bear." If your data is small, the simplicity of dump/restore is best; if the data is large but you can take a short maintenance window, pg_upgrade --link is powerful. If you can barely tolerate any downtime, logical replication is the answer, but it takes that much more handwork.

Whatever method you choose, there are principles that do not change. Verify compatibility with `--check` in advance, secure a trustworthy backup, re-collect statistics after the upgrade, and verify thoroughly before going to production. An upgrade is something whose risk grows the longer you delay it. Rather than being chased by EOL, the best strategy is to run a rehearsal at least once during a calm period.

References

- [pg_upgrade official documentation](https://www.postgresql.org/docs/current/pgupgrade.html)

- [Logical Replication official documentation](https://www.postgresql.org/docs/current/logical-replication.html)

- [pg_dump official documentation](https://www.postgresql.org/docs/current/app-pgdump.html)

- [CREATE PUBLICATION reference](https://www.postgresql.org/docs/current/sql-createpublication.html)

- [CREATE SUBSCRIPTION reference](https://www.postgresql.org/docs/current/sql-createsubscription.html)

- [PostgreSQL Versioning Policy](https://www.postgresql.org/support/versioning/)

- [Upgrading the PostgreSQL DB engine for Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_UpgradeDBInstance.PostgreSQL.html)

- [Amazon RDS Blue/Green Deployments](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/blue-green-deployments.html)

- [Upgrading Amazon Aurora PostgreSQL DB clusters](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/USER_UpgradeDBInstance.PostgreSQL.html)

- [PostgreSQL Wiki: Upgrading](https://wiki.postgresql.org/wiki/Upgrading)

현재 단락 (1/192)

PostgreSQL ships a new major version every year. And each major version has a clearly defined lifeti...

작성 글자: 0원문 글자: 17,635작성 단락: 0/192