Live Migration 1: From Migration CRD to Target Pod Creation

Introduction

When studying live migration, many people jump straight to pre-copy or dirty pages. But a more important set of questions comes first: who creates the target Pod, when is the handoff triggered on the source VM, and what policies govern whether migration can begin at all? This is a control plane problem, not a data plane one.

This post focuses on pkg/virt-controller/watch/migration/migration.go to examine the orchestration from when a migration CR is created until the target Pod is ready.

Migration Is a Work Object, Not an API Action

As seen in the previous post, when a user requests migration, virt-api does not perform the move directly. Instead, it creates a VirtualMachineInstanceMigration object.

From that point on, the migration controller in virt-controller watches this object. The advantages of this design are clear.

  • The work is tracked independently
  • The controller processes it asynchronously
  • Failure, pending, and abort can be handled as separate lifecycles

In other words, migration is not a "function call" but Kubernetes object-based orchestration.
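Concretely, the work object is a small CR. A minimal sketch (the names here are placeholders):

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: migration-job      # placeholder name
  namespace: default
spec:
  vmiName: testvm          # the VMI to move; placeholder name
```

virt-api only persists this object; everything after that is the migration controller reconciling it toward completion.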

What the Migration Controller Watches

Looking at the NewController signature, the migration controller has quite a few informers and stores.

  • VMI informer
  • Pod informer
  • Migration informer
  • Node store
  • PVC store
  • Storage class store
  • Storage profile store
  • Migration policy store
  • Resource quota informer
  • KubeVirt CR store

Why so many? Because live migration is not a task that works just by matching source and target.

  • Can the target Pod be scheduled?
  • Is storage visible from the target?
  • What is the cluster-wide concurrent migration limit?
  • What timeout and bandwidth does the migration policy require?
  • Will resource quota block target Pod creation?

In other words, the migration controller is the central arbiter of policy and capacity.
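The questions above amount to an admission gate the controller must evaluate from its informers before any target Pod is rendered. A minimal sketch of that gate, with illustrative field and function names (none of these are KubeVirt's actual types):

```go
package main

import "fmt"

// clusterState bundles the answers the informers above provide.
// All names here are illustrative, not KubeVirt's actual types.
type clusterState struct {
	targetSchedulable    bool // node store: can any node host the target Pod?
	storageVisible       bool // PVC/storage class stores: volumes reachable from target?
	runningMigrations    int  // migration informer: cluster-wide active count
	parallelLimit        int  // cluster config: parallel-migrations-per-cluster style limit
	quotaAllowsTargetPod bool // resource quota informer
}

// canStartMigration is the kind of aggregate decision the controller
// must make before a target Pod is ever rendered.
func canStartMigration(s clusterState) (bool, string) {
	switch {
	case !s.targetSchedulable:
		return false, "no schedulable target node"
	case !s.storageVisible:
		return false, "storage not visible from target"
	case s.runningMigrations >= s.parallelLimit:
		return false, "cluster-wide parallel migration limit reached"
	case !s.quotaAllowsTargetPod:
		return false, "namespace quota would block target Pod"
	}
	return true, ""
}

func main() {
	s := clusterState{targetSchedulable: true, storageVisible: true,
		runningMigrations: 5, parallelLimit: 5, quotaAllowsTargetPod: true}
	ok, reason := canStartMigration(s)
	fmt.Println(ok, reason)
}
```

The point is not the exact checks but their ordering: every one of them is answerable from cached informer state, without touching the guest.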

Why a Priority Queue Is Used

An interesting point is that the migration controller uses a priority queue rather than a regular workqueue. Comments indicate the intent is to give higher priority to active migrations so that capacity-waiting migrations do not delay active migration processing.

This is an important design point. Once a migration starts, it must continuously track state, so queuing pending and active migrations equally is not ideal.

How Is the Target Pod Created?

The migration controller also does not assemble raw Pod specs by hand; it goes through a template service. The existence of RenderMigrationManifest in the interface shows that the migration target Pod resembles a regular launcher Pod but has a separate rendering path.

So the controller flow is as follows:

  1. Detect migration CR
  2. Check the VMI's migration feasibility
  3. Render the migration launcher manifest if a target Pod is needed
  4. Create the target Pod
  5. Track source and target state for handoff

In other words, the first step of migration is not "copy memory" but whether a new launcher Pod can be created.
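The five steps above can be sketched as a phase dispatcher. The phase names below follow the VMIM status phases; the dispatcher itself is a toy, not the real sync loop:

```go
package main

import "fmt"

// phase mirrors the coarse lifecycle the controller walks a
// migration through; names follow the VMIM status phases.
type phase string

const (
	phasePending    phase = "Pending"
	phaseScheduling phase = "Scheduling"
	phaseScheduled  phase = "Scheduled"
	phaseRunning    phase = "Running"
)

// nextStep names the controller's job at each phase; a toy
// dispatcher, not the real sync loop.
func nextStep(p phase, targetPodExists, targetPodReady bool) string {
	switch {
	case p == phasePending && !targetPodExists:
		return "render migration manifest and create target Pod"
	case p == phaseScheduling && !targetPodReady:
		return "wait for target Pod to be scheduled and ready"
	case p == phaseScheduled:
		return "hand off to virt-handler for execution"
	case p == phaseRunning:
		return "track progress and enforce timeouts"
	}
	return "no-op"
}

func main() {
	fmt.Println(nextStep(phasePending, false, false))
}
```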

Why Are There So Many Timeouts?

In migration.go, several timeout constants are visible.

  • Unschedulable pending timeout
  • Catch-all pending timeout

These exist to distinguish between a target Pod that is simply slow and one that is in a practically impossible state.

For example:

  • When nodes are temporarily insufficient
  • When a specific resource will never match
  • When PVC attach takes a long time

Treating all of these as the same failure makes it difficult for operators to identify the cause. So the controller subdivides the meaning of pending.

Where Are Migration Policy and Cluster Config Reflected?

The reason migrationPolicyStore and clusterConfig are in the controller is that migration does not run with the same policy for every VMI.

Notably, policies affect the following:

  • Number of concurrent migrations
  • Per-node outbound migration count
  • Bandwidth limits
  • Progress timeout
  • Completion timeout
  • Post-copy allowance
  • TLS disable option

In other words, behind the words "perform a migration," there is always a policy layer.
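The cluster-wide knobs (concurrent migration counts, per-node outbound limits) come from the KubeVirt CR's migration configuration, while per-workload tuning comes from MigrationPolicy CRs. A sketch of the latter; the field names follow the MigrationPolicy API, but treat the values and labels as placeholders:

```yaml
apiVersion: migrations.kubevirt.io/v1alpha1
kind: MigrationPolicy
metadata:
  name: slow-link-policy           # placeholder name
spec:
  allowAutoConverge: true
  allowPostCopy: false
  bandwidthPerMigration: 64Mi
  completionTimeoutPerGiB: 800
  selectors:
    namespaceSelector:
      team: database               # placeholder label
```

Because policies attach via label selectors, two VMIs in the same cluster can migrate under very different bandwidth and timeout regimes.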

Why Storage and Quota Enter the Migration Control Plane

From an operator's perspective, it is easy to think only about network and CPU, but storage and quota are also important for actually creating the target Pod.

Storage

If required volumes are not prepared on the target node, migration cannot even begin.

Quota

Since a new target launcher Pod must be created, it can be blocked by namespace resource quota.

In other words, the migration control plane must solve Kubernetes-centric problems far more than guest memory copying.

When Does Handoff Occur?

In migration.go, structures like handOffMap are visible. This is a mechanism for managing the moment of handing off to virt-handler on the source side. Simply put, the controller does not hold on to everything forever. After a certain stage, the node-local execution plane takes on a larger role.

The reason for this separation:

  • The controller is strong at cluster-wide state decisions
  • Actual migration execution is better handled by virt-handler and the launcher on source and target nodes

In other words, live migration is a structure where the control plane and execution plane progressively share responsibility.
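One concrete requirement of that shared responsibility is that the controller must hand each migration to the execution plane exactly once. A simplified analogue of the controller's handOffMap (the type and method names here are invented for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// handoff tracks which migrations the controller has already
// handed to virt-handler, so it never hands off the same
// migration twice. A simplified analogue of handOffMap.
type handoff struct {
	mu   sync.Mutex
	done map[string]struct{} // migration key -> handed off
}

func newHandoff() *handoff {
	return &handoff{done: map[string]struct{}{}}
}

// tryHandOff returns true only the first time a key is handed off.
func (h *handoff) tryHandOff(key string) bool {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, ok := h.done[key]; ok {
		return false
	}
	h.done[key] = struct{}{}
	return true
}

func main() {
	h := newHandoff()
	fmt.Println(h.tryHandOff("ns/mig-1")) // true
	fmt.Println(h.tryHandOff("ns/mig-1")) // false
}
```

The mutex matters because the sync loop can be re-entered for the same migration while the first handoff is still in flight; without the guard, a requeue could trigger a duplicate handoff.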

Automatic Migration Requests and Dynamic Network Changes

pkg/network/migration/evaluator.go reveals another interesting fact. Secondary network hotplug or unplug, and NetworkAttachmentDefinition (NAD) changes, can create a migration required condition.

This means migration does not only happen because a user says "move it now." It can also be used as a re-placement mechanism when network changes are difficult to safely apply on the current Pod.

In other words, the migration control plane is both a maintenance feature and a configuration convergence mechanism.

Common Misconceptions

Misconception 1: Migration ends when source QEMU talks directly to target QEMU

No. Before that, there is migration CR, target Pod creation, scheduling, policy, quota, and storage preparation.

Misconception 2: The migration target Pod just reuses the existing Pod

No. A separate target launcher Pod may be newly created.

Misconception 3: Migration failures are mostly data plane issues

No. In actual operations, many cases are blocked in the control plane by pending, unschedulable, quota, or policy mismatch.

What Operators Should Check First

  • Has the migration CR been created?
  • Has the target Pod been created?
  • Is the target Pod pending or running?
  • What are the migration policy and cluster-wide concurrency limits?
  • Are namespace quota and storage visibility acceptable?

These steps must pass before the pre-copy and post-copy in the next post become meaningful.

Conclusion

The control plane of KubeVirt live migration starts with migration CR creation and continues with virt-controller checking policy, quota, storage, and scheduling state to prepare the target Pod. In other words, migration is already a Kubernetes orchestration problem before the libvirt data transfer begins. Grasping this perspective is essential for understanding why migrations get stuck at pending in actual operations.

In the next post, we will explain the pre-copy, post-copy, and dirty page models that transfer the actual memory and disk state after this control plane preparation is complete.