Live Migration 2: The Real Meaning of Pre-copy, Post-copy, Dirty Pages, and Auto-converge

Introduction
What Actually Gets Moved in the Migration Data Plane
What Is Pre-copy?
Why Dirty Page Rate Matters
What Do the libvirt Migration Flags Tell Us?
What the Migration Monitor Does
When Does Post-copy Transition Happen?
- Meaning of post-copy
Why Auto-converge and Pause Are Needed
- Auto-converge
- Pause
Why Are VFIO Workloads More Challenging?
Criteria for Judging Migration as Stuck
When Disk Migration and Memory Migration Are Mixed
Common Misconceptions
Key Metrics Operators Must Watch
Conclusion

Introduction

Now let us enter the truly difficult phase of live migration. Creating the target Pod was a Kubernetes control plane problem. From this point on, it becomes a data transfer problem: how libvirt and QEMU move the guest's memory, CPU state, and, in some cases, disk state.

The core code for this phase is pkg/virt-launcher/virtwrap/live-migration-source.go. Here, KubeVirt builds libvirt migration flags, monitors progress, and triggers auxiliary strategies like post-copy or pause when necessary.

What Actually Gets Moved in the Migration Data Plane

Live migration does not mean "copying the entire VM at once." What actually gets transferred is:

Guest memory pages
CPU execution state
Some device state
Local volume data, if needed

If shared storage is available, disk contents do not need to be copied, but memory and execution state still have to be transferred while the guest keeps running. The hardest problem is that the guest keeps dirtying memory throughout the copy.

What Is Pre-copy?

Pre-copy is the most basic migration strategy.

The source VM continues running.
Simultaneously, memory pages are copied to the target.
Pages modified during copying are re-copied in the next round.
Once the remaining set of dirty pages is small enough, the guest is paused briefly for a final stop-and-copy.

The advantage is that the guest continues running for most of the time. The disadvantage is that if the guest modifies memory too quickly, the copy may never catch up.
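The following is a toy model of the rounds described above, not QEMU's algorithm. It assumes a constant bandwidth, a constant dirty rate, and that every dirtied byte is distinct. Each round resends whatever was dirtied during the previous round. Switchover happens once the remaining dirty set could be sent within the allowed downtime. The function name and parameters are hypothetical.

```go
package main

import "math"

// SimulatePreCopy is a toy model of iterative pre-copy (not QEMU's code).
// Sizes and rates use consistent units, e.g. GiB and GiB/s; maxDowntime is in
// seconds. It returns how many full copy rounds ran before switchover and
// whether the final stop-and-copy fits within maxDowntime.
func SimulatePreCopy(memBytes, bandwidth, dirtyRate, maxDowntime float64, maxRounds int) (int, bool) {
	toSend := memBytes // the first round must send all of guest memory
	for rounds := 0; rounds < maxRounds; rounds++ {
		// Switch over once the remaining dirty set can be sent while paused.
		if toSend/bandwidth <= maxDowntime {
			return rounds, true
		}
		elapsed := toSend / bandwidth // time spent on this round
		// Pages dirtied during this round must be sent again next round;
		// the dirty set can never be larger than guest memory.
		toSend = math.Min(memBytes, dirtyRate*elapsed)
	}
	return maxRounds, toSend/bandwidth <= maxDowntime
}
```

Take 8 GiB of memory, 1 GiB/s of bandwidth, a 0.1 GiB/s dirty rate, and a 0.3 s downtime budget. The model switches over after two rounds (8 GiB, then 0.8 GiB, then 0.08 GiB). If the dirty rate rises above the bandwidth, the dirty set never shrinks and the loop gives up.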

This is where the concept of dirty page rate comes in.

Why Dirty Page Rate Matters

In logMigrationInfo in live-migration-source.go, KubeVirt reads the following fields from the libvirt job stats:

DataRemaining
MemProcessed
MemRemaining
MemDirtyRate
Downtime

KubeVirt does not view migration as just "success or failure." It continuously observes how much remains and how fast the guest is re-dirtying memory.

If a VM is very write-heavy, copied pages may be dirtied again almost immediately, which prevents pre-copy from converging. High-performance databases and other memory-intensive workloads are the typical trouble cases.

What Do the libvirt Migration Flags Tell Us?

generateMigrationFlags shows which modes KubeVirt requests from libvirt for a migration:

MIGRATE_LIVE
MIGRATE_PEER2PEER
MIGRATE_PERSIST_DEST
MIGRATE_NON_SHARED_INC for block migration
MIGRATE_AUTO_CONVERGE when auto-converge is allowed
MIGRATE_POSTCOPY when post-copy is allowed
MIGRATE_PARALLEL for parallel migration

In other words, the migration strategy is not an abstract idea. It is expressed concretely as a combination of libvirt flags.
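The composition described above can be sketched as a bitmask. The constants are local stand-ins named after libvirt's VIR_MIGRATE_* flags and deliberately do not reproduce libvirt's real bit values. MigrationOptions, BuildFlags, and the "more than one connection means parallel" rule are hypothetical and for illustration only.

```go
package main

// MigrationFlag values are local stand-ins named after libvirt's
// VIR_MIGRATE_* flags; they are NOT libvirt's actual bit values.
type MigrationFlag uint32

const (
	FlagLive MigrationFlag = 1 << iota
	FlagPeer2Peer
	FlagPersistDest
	FlagNonSharedInc
	FlagAutoConverge
	FlagPostcopy
	FlagParallel
)

// MigrationOptions is a hypothetical summary of the migration policy.
type MigrationOptions struct {
	BlockMigration      bool
	AllowAutoConverge   bool
	AllowPostCopy       bool
	ParallelConnections int // assumption: >1 means a parallel migration
}

// BuildFlags mirrors the list above: a fixed base plus optional features.
func BuildFlags(o MigrationOptions) MigrationFlag {
	flags := FlagLive | FlagPeer2Peer | FlagPersistDest
	if o.BlockMigration {
		flags |= FlagNonSharedInc
	}
	if o.AllowAutoConverge {
		flags |= FlagAutoConverge
	}
	if o.AllowPostCopy {
		flags |= FlagPostcopy // only permits a later switch to post-copy
	}
	if o.ParallelConnections > 1 {
		flags |= FlagParallel
	}
	return flags
}
```

In libvirt, setting the post-copy flag does not start post-copy. The migration begins as normal pre-copy, and a separate call is needed to switch modes. That is why the flag is set up front, while the actual switch happens later in the monitor.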

What the Migration Monitor Does

migrationMonitor is the core component that watches the data plane. In it, KubeVirt:

Tracks remaining data
Determines if progress has stalled
Checks if acceptable completion time has been exceeded
Triggers post-copy or pause strategy if needed
Aborts if completely stuck

This design is pragmatic. A migration does not simply run to completion once started; it needs adaptive control driven by the progress actually observed.

When Does Post-copy Transition Happen?

In the code, dom.MigrateStartPostCopy is called when the migration has been stalled for a while, AllowPostCopy is enabled, and the VMI does not use VFIO. So KubeVirt does not begin in post-copy mode. It starts with pre-copy and switches to post-copy only when pre-copy fails to converge.

Meaning of post-copy

The source no longer tries to pre-copy all pages
The target receives the active workload earlier
Required pages can be fetched later

The advantage is that even busy VMs are more likely to finish the move eventually. The downside is significant, however. After the switch, the guest runs on the target while some of its memory still lives on the source. A network failure or a failure on either host can then leave the guest unrecoverable.

That is why KubeVirt documentation and settings treat post-copy as a powerful but risky option.

Why Auto-converge and Pause Are Needed

Not all environments allow post-copy, and it is not suitable for all workloads. So KubeVirt uses other auxiliary strategies.

Auto-converge

With auto-converge, libvirt has QEMU deliberately throttle the guest's vCPUs so that the guest dirties memory more slowly. Put simply, the approach is "run a bit slower, but finish the migration."
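QEMU's auto-converge raises the vCPU throttle in steps. The documented defaults are cpu-throttle-initial 20%, cpu-throttle-increment 10%, and max-cpu-throttle 99%. The sketch below walks that schedule under a strong simplifying assumption: the guest's dirty rate scales linearly with the CPU time it is allowed to keep. ThrottleNeeded is a hypothetical helper.

```go
package main

// ThrottleNeeded models auto-converge, assuming the dirty rate scales linearly
// with the guest's remaining CPU share. It walks the throttle schedule
// (initial, initial+increment, ..., capped at maxPct) and returns the first
// percentage at which the effective dirty rate drops below bandwidth.
// ok is false if even maxPct is not enough.
func ThrottleNeeded(dirtyRate, bandwidth float64, initial, increment, maxPct int) (pct int, ok bool) {
	if increment <= 0 {
		increment = 1 // guarantee the loop reaches maxPct
	}
	for pct = initial; ; pct += increment {
		if pct > maxPct {
			pct = maxPct
		}
		effective := dirtyRate * float64(100-pct) / 100
		if effective < bandwidth {
			return pct, true
		}
		if pct == maxPct {
			return pct, false
		}
	}
}
```

Under this model, a guest dirtying 3 GiB/s over a 1 GiB/s link needs a 70% throttle. A guest dirtying 200 times faster than the link cannot be rescued even at 99%.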

Pause

In the code, if post-copy is not allowed or not appropriate, the guest is temporarily suspended so that the migration can complete. This strategy trades longer downtime for completion.

In other words, migration is a three-way trade-off among guest performance, availability (downtime), and the probability of success.

Why Are VFIO Workloads More Challenging?

live-migration-source.go handles VFIO-based VMIs separately. Post-copy is generally not an option for them, so KubeVirt instead sets a very large max downtime, which makes QEMU perform its internal switchover.

This shows that device passthrough workloads are much harder to migrate than regular VMs, whose state is mostly memory. It is also why SR-IOV networking and GPU passthrough workloads face significant migration constraints.

Criteria for Judging Migration as Stuck

The code makes a very practical judgment: "if there is no progress and network or QEMU connection issues are suspected, abort." It distinguishes between simply being slow and being completely stalled.

There are two representative cases:

DataRemaining has not decreased during the progress timeout
Overall acceptable completion time is exceeded

The first is usually stuck, the second is closer to convergence failure.

When Disk Migration and Memory Migration Are Mixed

When block migration or volume migration is involved, the difficulty increases further, which is why logic such as classifyVolumesForMigration and configureLocalDiskToMigrate exists. Moving memory and disks at the same time increases bandwidth demand, completion time, and the failure surface.

The code even includes logic that produces a clearer error message when the destination volume is smaller than the source. This alone shows how often storage causes migration failures.
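The same concern about undersized destination volumes can be expressed as a pre-flight check that fails early with a readable message. This is an illustrative sketch, not KubeVirt's code. VolumeSize and CheckTargetVolumes are hypothetical.

```go
package main

import "fmt"

// VolumeSize is a hypothetical record of one volume being migrated.
type VolumeSize struct {
	Name        string
	SourceBytes int64
	TargetBytes int64
}

// CheckTargetVolumes returns a descriptive error if any destination volume
// is smaller than its source volume.
func CheckTargetVolumes(vols []VolumeSize) error {
	for _, v := range vols {
		if v.TargetBytes < v.SourceBytes {
			return fmt.Errorf("volume %q: destination (%d bytes) is smaller than source (%d bytes)",
				v.Name, v.TargetBytes, v.SourceBytes)
		}
	}
	return nil
}
```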

Common Misconceptions

Misconception 1: Live migration is zero-downtime

It is not. Live migration minimizes downtime rather than eliminating it: the final switchover always pauses the guest briefly. If convergence fails, falling back to pause or post-copy brings its own trade-offs.

Misconception 2: Post-copy is always better

No. It can increase the probability of success, but a failure after the switch is far more damaging.

Misconception 3: Migration failures are mostly due to network

The network is one cause, but an excessive dirty rate, unmet local volume requirements, or device passthrough constraints can also be to blame.

Key Metrics Operators Must Watch

Is remaining data decreasing?
Is the dirty rate high?
Has post-copy been triggered?
Has migration mode changed to paused or post-copy?
Is volume migration involved?

Watching these metrics lets you separate "why is migration slow" from "why did migration fail."

Conclusion

The data plane of KubeVirt live migration is an adaptive system. It starts with pre-copy and, when convergence fails, falls back on auxiliary strategies such as auto-converge, pause, and post-copy. This design is a realistic compromise that keeps busy VMs running as long as possible while still completing the move in the end. To understand migration properly, you therefore need to look at libvirt flags, dirty page rate, progress timeout, and post-copy transition conditions together.

In the next post, we will look at the network paths, sockets, and ports that carry this data, from the perspective of the migration proxy.