Skip to content

필사 모드: Cross-Engine Database Migration — From Oracle to PostgreSQL (and Beyond)

English
0%
정확도 0%
💡 왼쪽 원문을 읽으면서 오른쪽에 따라 써보세요. Tab 키로 힌트를 받을 수 있습니다.
원문 렌더가 준비되기 전까지 텍스트 가이드로 표시합니다.

Opening — Why Is Everyone Suddenly Ripping Out Their Engine

Changing a database engine used to be rare. Once a system was built on Oracle, it usually lived there for a decade or more, and nobody went out of their way to tear out an engine that was running fine. But the mood shifted in the 2020s. Banks, telcos, and even public institutions lined up projects to strip out Oracle and move to PostgreSQL, and by 2026 cross-engine migration is no longer an unusual decision but a standard kind of undertaking.

There are three main reasons. The first is licensing cost. Core-based licenses and the maintenance fees renewed every year grow exponentially as a system grows. The second is the cloud. As organizations move to the cloud and reach for managed databases, they often find that running a commercial engine on the cloud as-is is even more expensive. The third is openness. There is a steady push to escape lock-in to a single vendor and move toward open source with its rich ecosystem and extensions.

In this post we lay out the path from Oracle to PostgreSQL from a practical angle — from motivation through compatibility gaps, tools, data transfer, application rewrites, validation, phased migration, a real case study, and the pitfalls. The subtitle says "and beyond" because the same methodology carries over to every other cross-engine migration as well.

The Motivation — Licensing and Cloud

You have to be precise about why you are moving. When the motivation is fuzzy, a wave of doubt — "why are we even doing this?" — inevitably arrives in the middle of the project.

- Licensing cost: a commercial engine charges core-based licenses plus annual maintenance. Every node you add and every core you add raises the bill alongside it.

- Cloud fit: managed PostgreSQL services are well polished across every cloud provider. Automatic backups, read replicas, and automatic failover come with a few clicks.

- Ecosystem and extensions: PostgreSQL is rich in extensions. It absorbs spatial data, time series, full-text search, and even vector search through extensions.

- Talent and hiring: the pool of developers with open-source experience is wide, which makes hiring and retention relatively easy.

There are also signals that you should not move. If the core business logic is buried in hundreds of thousands of lines of PL/SQL, if that logic is effectively the company's competitive edge, and if the savings are small relative to the risk of switching, then migration is a decision that only inflates cost. You should convert the motivation into numbers and decide accordingly.

The Big Picture — Migration Is Five Separate Streams of Work

Viewing a cross-engine migration as one block is daunting. In reality it splits into five distinct streams of work.

+---------------------------------------------------------------+

| The five streams of cross-engine database migration |

| |

| 1. Schema conversion |

| - tables, indexes, constraints, views, sequences |

| - data type mapping (NUMBER -> numeric, etc.) |

| |

| 2. Object (code) conversion |

| - procedures, functions, triggers, packages |

| - PL/SQL -> PL/pgSQL |

| |

| 3. Data transfer |

| - initial load + change synchronization (CDC) |

| |

| 4. Application changes |

| - SQL dialect differences, drivers, transaction handling |

| |

| 5. Validation and cutover |

| - data consistency, performance regression, cutover |

+---------------------------------------------------------------+

These five streams differ wildly in difficulty and in how much can be automated. Schema and data transfer are well supported by tooling, so their automation rate is high. PL/SQL conversion and application SQL rewriting, by contrast, demand a lot of human effort. Most of a project schedule is consumed by the latter.

Compatibility Gap 1 — Data Types

The first gap you hit is data types. Oracle and PostgreSQL look similar but carry subtly different type systems.

| Oracle | PostgreSQL | Caution |

| --- | --- | --- |

| NUMBER | numeric or integer types | unconstrained NUMBER becomes numeric wholesale and can be slow |

| VARCHAR2 | varchar | length semantics may differ between bytes and characters |

| CHAR | char | space-padding is similar but check comparison rules |

| DATE | timestamp | Oracle DATE includes time; PG date is date only |

| CLOB | text | text has effectively no length limit |

| BLOB | bytea | consider a separate strategy for large objects |

| RAW | bytea | check length-limit differences |

| NUMBER(1) as boolean | boolean | a good chance to normalize to boolean |

The most common pitfall is NUMBER. Oracle developers tend to reach for NUMBER whether the value is an integer or a real, and mapping all of it straight to numeric makes PostgreSQL slower than integer arithmetic. Columns that are clearly integers are better narrowed to integer or bigint.

-- Oracle original

CREATE TABLE orders (

order_id NUMBER(18) NOT NULL,

amount NUMBER(12,2) NOT NULL,

is_paid NUMBER(1) DEFAULT 0 NOT NULL,

created_at DATE DEFAULT SYSDATE

);

-- Form refined for PostgreSQL

CREATE TABLE orders (

order_id bigint NOT NULL,

amount numeric(12,2) NOT NULL,

is_paid boolean DEFAULT false NOT NULL,

created_at timestamptz DEFAULT now()

);

DATE also needs care. Oracle's DATE carries both date and time, whereas PostgreSQL's date carries only the date. A column that needs time must be mapped to timestamp or timestamptz to avoid data loss. If your system deals with time zones, standardizing on timestamptz from the start is the safer choice.

Compatibility Gap 2 — Sequences and Auto-Increment

Oracle's sequences exist in PostgreSQL almost unchanged. What differs is the syntax and the idiom.

-- Oracle style

CREATE SEQUENCE seq_order_id START WITH 1 INCREMENT BY 1;

INSERT INTO orders (order_id) VALUES (seq_order_id.NEXTVAL);

-- PostgreSQL style 1: explicit sequence

CREATE SEQUENCE seq_order_id START WITH 1 INCREMENT BY 1;

INSERT INTO orders (order_id) VALUES (nextval('seq_order_id'));

-- PostgreSQL style 2: standard IDENTITY column (recommended)

CREATE TABLE orders (

order_id bigint GENERATED ALWAYS AS IDENTITY,

amount numeric(12,2)

);

The pattern common before Oracle 11g — filling NEXTVAL via a trigger — is cleanly replaced by an IDENTITY column in PostgreSQL. Just remember, right after transferring data, to bump the sequence's current value up to the maximum value in the data. Skip this and the very first INSERT after cutover blows up with a primary-key collision.

-- After transfer, always: sync the sequence to the data maximum

SELECT setval('seq_order_id', (SELECT max(order_id) FROM orders));

Compatibility Gap 3 — SQL Dialect and Hints

Even the same SQL carries no small dialect differences. You have to comb through the application code and swap out the following patterns.

| Oracle idiom | PostgreSQL equivalent | Note |

| --- | --- | --- |

| dummy table DUAL | a bare SELECT | the FROM clause itself is unnecessary |

| NVL(a, b) | COALESCE(a, b) | COALESCE wins for multi-argument support |

| DECODE(...) | CASE WHEN ... | also improves readability |

| outer join (+) notation | standard LEFT JOIN | the (+) syntax is unsupported |

| ROWNUM-based paging | LIMIT and OFFSET | or keyset paging |

| CONNECT BY hierarchy query | recursive CTE | use WITH RECURSIVE |

| optimizer hint comments | mostly unnecessary | controlled via planner parameters |

| SYSDATE | now() or current_timestamp | mind the return-type difference |

The part you most need to let go of is optimizer hints. In Oracle there is a strong culture of forcing the execution plan through comment-style hints. PostgreSQL by default does not support hints. Instead you gather good statistics and, when needed, steer behavior with session-level planner parameters.

-- Oracle: force an index with a hint

SELECT /*+ INDEX(o idx_orders_user) */ *

FROM orders o WHERE user_id = 42;

-- PostgreSQL: statistics and planner settings instead of hints

ANALYZE orders; -- refreshing stats comes first

SET enable_seqscan = off; -- last resort, session-scoped

EXPLAIN ANALYZE

SELECT * FROM orders WHERE user_id = 42; -- inspect the plan directly

For the rare situations where you truly need hints, you can adopt the pg_hint_plan extension, but most cases are solved by statistics and index design. The anxiety that "I cannot live without hints" is a common misconception in the early phase of a migration.

Compatibility Gap 4 — From PL/SQL to PL/pgSQL

This is the most labor-intensive area in a migration project. PL/SQL and PL/pgSQL both share Ada-family syntax, so they look alike on the surface, but their core structures differ.

The biggest difference is packages. Oracle's PACKAGE is a module unit that bundles related procedures and variables, and PostgreSQL has no packages. The usual approach is to treat a schema as a module and unwind package variables into a separate settings table or into session variables.

-- Oracle PL/SQL procedure

CREATE OR REPLACE PROCEDURE add_order(

p_user_id IN NUMBER,

p_amount IN NUMBER

) AS

BEGIN

INSERT INTO orders (user_id, amount, created_at)

VALUES (p_user_id, p_amount, SYSDATE);

COMMIT;

EXCEPTION

WHEN OTHERS THEN

ROLLBACK;

RAISE;

END;

Translated to PL/pgSQL it looks like the following. The key differences are wrapping the function body in dollar-quoting and the fact that transaction control inside a function is hard to handle.

-- PostgreSQL PL/pgSQL procedure

CREATE OR REPLACE PROCEDURE add_order(

p_user_id bigint,

p_amount numeric

)

LANGUAGE plpgsql

AS $$

BEGIN

INSERT INTO orders (user_id, amount, created_at)

VALUES (p_user_id, p_amount, now());

-- a procedure can COMMIT internally; a function cannot

COMMIT;

EXCEPTION

WHEN others THEN

RAISE;

END;

$$;

Several differences are worth flagging here. First, from PostgreSQL 11 a PROCEDURE exists and can COMMIT and ROLLBACK internally, but a FUNCTION still cannot. Second, unlike Oracle's culture of implicit commits, PostgreSQL more naturally draws transaction boundaries on the calling side. Third, an exception-handling block is costly; overusing it degrades performance.

PL/SQL that leans heavily on cursors needs special care. PL/pgSQL also supports explicit cursors, but most row-by-row loops are overwhelmingly faster when rewritten as set-based SQL. A migration is also a fine chance to rewrite code into something better.

-- Row-by-row loop in PL/SQL (the slow pattern)

-- FOR rec IN (SELECT id FROM staging) LOOP

-- UPDATE target SET flag = 1 WHERE id = rec.id;

-- END LOOP;

-- Rewritten as set-based (the fast pattern)

UPDATE target t

SET flag = 1

FROM staging s

WHERE t.id = s.id;

Schema Conversion Tools — ora2pg and AWS SCT

Fortunately you do not convert schema and code purely by hand. There are two leading tools.

ora2pg is an open-source tool written in Perl that extracts and converts Oracle schema and data into PostgreSQL form. Its biggest strength is the migration assessment report. It auto-scores conversion difficulty, telling you in advance where the human effort will go.

Generate the assessment report from the config file

ora2pg -t SHOW_REPORT --estimate_cost -c ora2pg.conf

Extract schema only (tables, constraints, indexes)

ora2pg -t TABLE -o schema.sql -c ora2pg.conf

Extract procedures and functions (manual review essential)

ora2pg -t PROCEDURE -o procedures.sql -c ora2pg.conf

ora2pg -t FUNCTION -o functions.sql -c ora2pg.conf

AWS SCT is AWS's schema conversion tool, which shows convertible items and items needing manual conversion color-coded in a graphical interface. Its strength is that you can chain the conversion result with AWS DMS to carry data transfer through in a single flow. For a migration headed to the cloud, this combination is smooth.

Both tools share the same principle: do not blindly trust what was auto-converted. The tool handles ninety percent, and the remaining ten percent of thorny logic eats up most of the project's time. The tool's assessment report serves as a map to where that ten percent lives.

Data Migration — Initial Load and Change Synchronization

Once the schema is ready you move the data. Data transfer splits into two phases: the initial full load, and synchronizing the changes that occurred in the meantime.

+------------------------------------------------------------+

| The two phases of data transfer |

| |

| [1] Initial full load |

| Oracle ----(extract/load)----> PostgreSQL |

| large volume, takes a long time |

| |

| [2] Change synchronization (CDC) |

| Oracle redo log -> capture changes -> PostgreSQL |

| catches up changes made during initial load |

| |

| [3] Cutover |

| when lag converges to zero, switch after brief pause |

+------------------------------------------------------------+

For a small system, taking a maintenance window and moving everything at once — the big-bang approach — is simple. But for systems that demand zero downtime, change data capture is essential. AWS DMS supports both initial load and CDC, letting you run the two databases side by side for a while and then cut over briefly once the lag approaches zero.

Simple approach: extract data with ora2pg too (small scale)

ora2pg -t COPY -o data.sql -c ora2pg.conf

Or load directly into PostgreSQL (parallel processing)

ora2pg -t COPY -J 4 -c ora2pg.conf

At large volumes it is faster to drop all indexes, load, and rebuild them at the end. While loading, temporarily loosening automatic VACUUM and synchronous commit also raises throughput considerably. But these settings must absolutely be returned to normal once the load finishes.

Application SQL Rewrites

Changing the database alone is not the end. The SQL the application throws at the database also needs work. The scope of this work depends heavily on how the codebase handles SQL.

For an application cleanly abstracted on top of an ORM, just changing the dialect setting resolves a substantial share automatically. For code that assembles SQL directly from strings, especially code with Oracle-only functions scattered throughout, you must hunt them down and fix them one by one.

The spectrum of rewrite effort

low [ORM well abstracted] ---- [partial native SQL] ---- [all dynamic SQL] high

change dialect setting mixed handling full audit needed

An approach that pays off in practice is to build a graph first. Sweep the entire codebase with regular expressions for Oracle-specific functions and idioms, and build a list of where and how many times each appears. Take patterns like NVL, DECODE, SYSDATE, ROWNUM, CONNECT BY, and the dummy table as priorities and clean them up one at a time. The most dangerous kind is SQL you thought you had found but is generated dynamically and escapes a static search, so it is safest to cover the real execution paths with integration tests.

Validation — Data Consistency and Performance Regression

If you cannot prove that the transferred data is correct and that the transferred system is fast enough, the migration is not done. Validation runs on two axes.

The first is data consistency. Compare row counts, aggregate values, and sample rows between the two databases. The simplest yet most powerful method is to pull per-table row counts and checksums of key columns from both sides and match them up.

-- On the PostgreSQL side: per-table row count and amount total

SELECT 'orders' AS tbl, count(*) AS rows, sum(amount) AS sum_amount

FROM orders;

-- Pull the same values on the Oracle side and compare

-- SELECT 'orders', COUNT(*), SUM(amount) FROM orders;

The second is performance regression. You have to find the queries that got slower after migration. PostgreSQL's pg_stat_statements extension aggregates which queries consume the most total time. Comparing these statistics in an environment that reproduces production load lets you pinpoint exactly which queries regressed.

-- Top slow queries (by total execution time)

SELECT

substr(query, 1, 60) AS q,

calls,

round(total_exec_time::numeric, 1) AS total_ms,

round(mean_exec_time::numeric, 2) AS mean_ms

FROM pg_stat_statements

ORDER BY total_exec_time DESC

LIMIT 20;

Common causes of performance regression are missing statistics, missing indexes, slow arithmetic from moving NUMBER to numeric, and PL/pgSQL where a row-by-row loop was carried over as-is. The habit of looking at the plan directly with EXPLAIN ANALYZE turns out to be the fastest diagnosis in the end.

Phased Migration — You Do Not Move Everything at Once

Moving a large system in a single big bang is risky. Phased migration is the standard. Three strategies are common.

The first is read-first migration. You route only read traffic to a PostgreSQL replica first to validate load and results, and once it is stable enough you move writes too.

The second is splitting by domain. You break the system into business domains and move the loosely coupled ones first. You move and operate one domain, refine the methodology, and apply that experience to the next domain.

The third is parallel operation with shadow comparison. For a while you send the same writes to both databases and receive read results from both sides to compare. When they differ you raise an alert, but you return the existing system's result to the user. Once enough trust accumulates, you make PostgreSQL the primary.

Example timeline of a phased migration

Phase 1 schema/code conversion + build test environment

Phase 2 initial data load + start parallel operation with CDC

Phase 3 route some read traffic to PostgreSQL (shadow comparison)

Phase 4 gradually shift write traffic domain by domain

Phase 5 demote Oracle to a read-only backup, retire it later

Whatever the strategy, the key is that you must be able to roll back. For each phase, you have to build in advance a way to return immediately to the previous state if something goes wrong. A migration without a rollback path is a gamble.

Case Study — Migrating a Payment Settlement System

Let me tie the flow together with a hypothetical case. A payment settlement system had been running on Oracle for more than a decade. It had about 400 tables, roughly 60,000 lines of PL/SQL packages, and nightly batches at the heart of settlement. The motivation was license cost reduction and a move to the cloud.

The team first pulled an assessment report with ora2pg. The schema had a high auto-conversion rate, but of the 60,000 lines of packages, roughly 15,000 lines of core settlement logic were flagged for manual conversion. Most of the schedule went here.

Next they cleaned up data types. They split the indiscriminately used NUMBER into integers and numeric, and standardized DATE to timestamptz. The settlement amount columns kept numeric, while identifiers and counts were narrowed to bigint. This cleanup alone made the nightly batch's arithmetic noticeably faster.

In the PL/SQL conversion they focused on rewriting row-by-row cursor loops into set-based SQL. A settlement batch that originally took two hours looping one record at a time dropped to twenty minutes when rewritten as set operations. The migration actually improved performance.

For data, they did an initial load with AWS DMS and ran in parallel via CDC for two months. Every night they ran a shadow comparison of the settlement results from both sides, and only after confirming that the difference held at zero did they cut over. The cutover itself finished within an hour in the early hours of a weekend. The scariest phase passed by the shortest and quietest because the two months before it were done thoroughly.

Common Pitfalls

Here are the pitfalls that repeatedly trip people up in migration projects.

First, the difference between empty string and NULL. Oracle treats an empty string as NULL, whereas PostgreSQL strictly distinguishes an empty string from NULL. This difference creates subtle bugs in conditions and unique constraints.

Second, identifier case. Oracle folds unquoted identifiers to uppercase, and PostgreSQL folds them to lowercase. When quoted identifiers are mixed in, the two sides interpret them differently.

Third, implicit type conversion. Oracle converts characters and numbers fairly generously, but PostgreSQL is stricter. Code that compared a string to a numeric column throws an error after migration.

Fourth, the difference in transaction isolation and locking behavior. When behavior differs, concurrency bugs surface rarely and only in production. You have to cover the concurrency paths with load tests.

Fifth, sort order. String sort results differ by locale. Paging or comparison that depends on sorting may produce a different order after migration.

Sixth, blindly trusting the tool's auto-conversion. The tool is a starting line, not a finish line. Auto-converted code must be read by a human and validated by tests.

Migration Checklist

Finally, here is a checklist to keep at your side and consult throughout the project.

- Did you convert the motivation into numbers and leave a basis for the decision?

- Did you estimate difficulty and schedule from an ora2pg or AWS SCT assessment report?

- Did you tidy data types like NUMBER and DATE to match their meaning?

- Did you sync sequences to the data maximum after transfer?

- Did you check empty string versus NULL, identifier case, and implicit type conversion?

- Did you replace reliance on optimizer hints with statistics and index design?

- Did you rewrite row-by-row cursor loops as set-based SQL?

- Did you cover even the application's dynamic SQL with integration tests?

- Did you compare data consistency on both sides with row counts and checksums?

- Did you track performance regression with pg_stat_statements?

- Did you prepare a rollback path for each phase?

- Did you run enough parallel operation and shadow comparison before cutover?

Closing

Cross-engine database migration is not the adoption of some dazzling new technology; it is closer to dull, meticulous civil engineering. But at the end of that civil engineering wait freedom from license cost, cloud fit, and liberation from vendor lock-in. The key is to split the migration into five separate streams of work, to distinguish the parts that tooling can automate from the parts a human must rewrite, and to advance in reversible phases.

And there is one consolation. The methodology you learn on the road from Oracle to PostgreSQL carries over almost unchanged whether you go from MySQL to PostgreSQL or from SQL Server to somewhere else. The skeleton — data type mapping, dialect differences, code conversion, data synchronization, and phased cutover — is identical in every cross-engine migration. Do it properly once, and the migration beyond is far less frightening.

References

- PostgreSQL official documentation: https://www.postgresql.org/docs/current/

- PL/pgSQL documentation: https://www.postgresql.org/docs/current/plpgsql.html

- ora2pg official site: https://ora2pg.darold.net/

- ora2pg GitHub repository: https://github.com/darold/ora2pg

- AWS Schema Conversion Tool user guide: https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_Welcome.html

- AWS Database Migration Service user guide: https://docs.aws.amazon.com/dms/latest/userguide/Welcome.html

- PostgreSQL CREATE PROCEDURE documentation: https://www.postgresql.org/docs/current/sql-createprocedure.html

- PostgreSQL pg_stat_statements documentation: https://www.postgresql.org/docs/current/pgstatstatements.html

- Oracle Database official documentation: https://docs.oracle.com/en/database/

- Flyway official documentation: https://flywaydb.org/documentation/

- Liquibase official documentation: https://docs.liquibase.com/

현재 단락 (1/239)

Changing a database engine used to be rare. Once a system was built on Oracle, it usually lived ther...

작성 글자: 0원문 글자: 19,890작성 단락: 0/239