Eviction, Drain, Migration Failure Modes: How KubeVirt Handles Failures

Introduction

From an operational perspective, the real difficulty of KubeVirt reveals itself not when launching a VM, but when handling failures. A failed Pod can often be resolved by simply rescheduling it, but a VM requires considering memory state, disks, network sessions, and guest execution context together. That is why drain, eviction, and migration failures are the scenarios that best expose KubeVirt's internal design.

In this post, we examine what KubeVirt considers a failure, and how it standardizes and exposes those failures.

1. KubeVirt Does Not Immediately Kill on Drain

Looking at the EvictionStrategy field in the API types, it defines which strategy to use during a node drain. KubeVirt does not treat drain as a simple Kubernetes eviction event but as an event requiring VM-specific policy decisions.

The reason is clear.

  • If migratable, it is better to move the VM first.
  • If non-migratable, suspension or deferral may be needed.
  • If the VM is owned by an external controller, a different handling model may be required.

In other words, drain in KubeVirt is not "emptying one Pod" but the question "how should this VM be evacuated?"
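The branching above can be sketched as a small decision function. The EvictionStrategy values below mirror the KubeVirt API (None, LiveMigrate, LiveMigrateIfPossible, External), but the decision logic itself is an illustrative simplification, not the actual virt-controller code.

```go
package main

import "fmt"

// EvictionStrategy values mirror the KubeVirt API; the rest is a sketch.
type EvictionStrategy string

const (
	EvictionStrategyNone                  EvictionStrategy = "None"
	EvictionStrategyLiveMigrate           EvictionStrategy = "LiveMigrate"
	EvictionStrategyLiveMigrateIfPossible EvictionStrategy = "LiveMigrateIfPossible"
	EvictionStrategyExternal              EvictionStrategy = "External"
)

// decideDrainAction maps (strategy, migratability) to a coarse drain action.
func decideDrainAction(strategy EvictionStrategy, migratable bool) string {
	switch strategy {
	case EvictionStrategyLiveMigrate:
		if migratable {
			return "migrate"
		}
		return "block-eviction" // drain stalls until the VM is migratable or shut down
	case EvictionStrategyLiveMigrateIfPossible:
		if migratable {
			return "migrate"
		}
		return "evict" // fall back to plain pod eviction
	case EvictionStrategyExternal:
		return "signal-external-controller" // mark EvictionRequested and wait
	default: // None or unset
		return "evict"
	}
}

func main() {
	fmt.Println(decideDrainAction(EvictionStrategyLiveMigrate, false))
}
```

Note how the same drain event fans out into four very different outcomes depending on strategy and migratability.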

2. Operators Must Check EvictionRequested and EvacuationNodeName

VirtualMachineInstanceStatus has EvacuationNodeName, and EvictionRequested is defined among the condition types. This means eviction is not merely left in event logs; it remains as a structured signal in the VMI status.

Operators need to check these values for the following reasons:

  • Whether drain has started
  • Which node is being evacuated from
  • Whether a migration should follow
  • Whether an external controller needs to take follow-up action

In other words, KubeVirt exposes drain in a status-first manner.
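Those status checks can be expressed as a short helper. The struct below is a simplified stand-in for the real types in kubevirt.io/api; the field and condition names follow the post, everything else is illustrative.

```go
package main

import "fmt"

// Simplified stand-in for the VMI status fields discussed above.
type Condition struct {
	Type   string
	Status string // "True" or "False"
}

type VMIStatus struct {
	EvacuationNodeName string
	Conditions         []Condition
}

// drainInProgress reports whether eviction has been requested and, if
// so, which node the VMI is being evacuated from.
func drainInProgress(s VMIStatus) (bool, string) {
	for _, c := range s.Conditions {
		if c.Type == "EvictionRequested" && c.Status == "True" {
			return true, s.EvacuationNodeName
		}
	}
	return false, ""
}

func main() {
	s := VMIStatus{
		EvacuationNodeName: "node-a",
		Conditions:         []Condition{{Type: "EvictionRequested", Status: "True"}},
	}
	ok, node := drainInProgress(s)
	fmt.Println(ok, node)
}
```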

3. Non-Migratable Reasons Are Recorded in Advance

The KubeVirt API types define a large number of non-migratable reasons.

  • DisksNotLiveMigratable
  • InterfaceNotLiveMigratable
  • HotplugNotLiveMigratable
  • VirtIOFSNotLiveMigratable
  • HostDeviceNotLiveMigratable
  • SEVNotLiveMigratable
  • SecureExecutionNotLiveMigratable
  • TDXNotLiveMigratable
  • HypervPassthroughNotLiveMigratable
  • PersistentReservationNotLiveMigratable

The strength of this design is that failures are not surfaced only as runtime errors after the fact. KubeVirt tells you in advance, via condition reasons, why this VM cannot be migrated.

Operators can read the approximate limitations from the API status alone, before triggering a migration and waiting for failure logs.
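Reading that pre-judgment signal amounts to scanning conditions for LiveMigratable=False. The reason strings below are the ones listed above; the Condition struct is a simplified stand-in for the real API type.

```go
package main

import "fmt"

// Simplified stand-in for a VMI condition.
type Condition struct {
	Type   string
	Status string
	Reason string
}

// nonMigratableReason returns the recorded reason when the LiveMigratable
// condition is False, before any migration is even attempted.
func nonMigratableReason(conds []Condition) (string, bool) {
	for _, c := range conds {
		if c.Type == "LiveMigratable" && c.Status == "False" {
			return c.Reason, true
		}
	}
	return "", false
}

func main() {
	conds := []Condition{{Type: "LiveMigratable", Status: "False", Reason: "HostDeviceNotLiveMigratable"}}
	if reason, blocked := nonMigratableReason(conds); blocked {
		fmt.Println("migration blocked:", reason)
	}
}
```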

4. MigrationState Is the Central Axis for Failure Analysis

VirtualMachineInstanceMigrationState contains a wealth of information needed for failure analysis.

  • Source node and source pod
  • Target node and target pod
  • Sync address
  • Direct migration ports
  • Completed status
  • Failed status
  • Abort requested status
  • Abort status
  • Failure reason
  • Current migration mode

This structure reveals that KubeVirt does not view migration as a simple boolean state. Migration is a distributed protocol where source and target change over time, and failures can occur at multiple stages.
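A simplified mirror of that structure shows why it supports triage better than a boolean would. The real VirtualMachineInstanceMigrationState in kubevirt.io/api carries these fields and more; summarize is an illustrative helper, not KubeVirt code.

```go
package main

import "fmt"

// Simplified mirror of VirtualMachineInstanceMigrationState.
type MigrationState struct {
	SourceNode, SourcePod string
	TargetNode, TargetPod string
	Mode                  string // "PreCopy" or "PostCopy"
	Completed             bool
	Failed                bool
	FailureReason         string
	AbortRequested        bool
	AbortStatus           string
}

// summarize collapses the state into a one-line triage verdict.
func summarize(m MigrationState) string {
	switch {
	case m.Failed:
		return fmt.Sprintf("failed (%s -> %s, mode=%s): %s",
			m.SourceNode, m.TargetNode, m.Mode, m.FailureReason)
	case m.AbortRequested && m.AbortStatus != "Succeeded":
		return "abort in progress"
	case m.Completed:
		return "completed"
	default:
		return "running"
	}
}

func main() {
	fmt.Println(summarize(MigrationState{
		SourceNode: "node-a", TargetNode: "node-b",
		Mode: "PreCopy", Failed: true, FailureReason: "target pod terminated",
	}))
}
```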

5. Pre-copy Failures Have Recovery Potential, but Post-copy Is More Dangerous

As seen in the previous post, pre-copy progressively transfers data while the source retains the original memory. In contrast, post-copy starts execution on the target first and retrieves needed pages from the source later.

Therefore, post-copy failures are much more dangerous. In pkg/virt-handler/vm.go, there is formatIrrecoverableErrorMessage, and when a post-copy failure causes the domain to enter a paused state, the message "VMI is irrecoverable due to failed post-copy migration" is generated.

This is a very strong signal. It does not just mean the migration job failed -- it means the running VM state itself may have collapsed into an unrecoverable state.

In other words, post-copy is a powerful tool for moving busy workloads, but the cost of failure is also greater.
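The asymmetry between the two modes can be captured in a check like the following. The message matches the one quoted above; the function itself is a simplification of the logic in pkg/virt-handler/vm.go, reduced to an assumed three-input sketch.

```go
package main

import "fmt"

// irrecoverableMessage flags the combination that virt-handler treats as
// fatal: a failed post-copy migration that leaves the domain paused.
func irrecoverableMessage(mode string, migrationFailed, domainPaused bool) (string, bool) {
	if mode == "PostCopy" && migrationFailed && domainPaused {
		return "VMI is irrecoverable due to failed post-copy migration", true
	}
	// Pre-copy failures leave the source memory intact, so they are retryable.
	return "", false
}

func main() {
	msg, fatal := irrecoverableMessage("PostCopy", true, true)
	fmt.Println(fatal, msg)
}
```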

6. Abort Is Also a State Machine

VirtualMachineInstanceMigrationState has AbortRequested and AbortStatus, and the abort state is separately modeled as Succeeded, Failed, and Aborting.

This design is realistic. Migration abort does not end instantly just by pressing a button.

  • Whether the abort is still possible at this stage
  • Whether the target has already received the handoff
  • How far storage or network side effects have progressed

These factors affect the outcome. KubeVirt treats abort not as a simple API cancellation but as a separate state machine.
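The abort lifecycle can be sketched as a tiny state machine. The status values (Aborting, Succeeded, Failed) mirror the API constants; the single transition rule below is illustrative, standing in for the real checks on handoff progress and side effects.

```go
package main

import "fmt"

// AbortStatus values mirror the KubeVirt API constants.
type AbortStatus string

const (
	AbortInProgress AbortStatus = "Aborting"
	AbortSucceeded  AbortStatus = "Succeeded"
	AbortFailed     AbortStatus = "Failed"
)

// resolveAbort decides the terminal abort status: once the target has
// taken over execution, the migration can no longer be rolled back.
func resolveAbort(handoffDone bool) AbortStatus {
	if handoffDone {
		return AbortFailed // too late: the target already owns the VM
	}
	return AbortSucceeded
}

func main() {
	fmt.Println(resolveAbort(false), resolveAbort(true))
}
```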

7. Migration Failures Occur in Both the Control Plane and Data Plane

Failure causes can be broadly divided into two categories.

Control plane failures

  • Target Pod fails to schedule
  • Cannot satisfy the appropriate node selector
  • Blocked by quota or policy
  • Blocked due to utility volumes or backups

Data plane failures

  • Dirty page rate is too high for pre-copy to converge
  • Source-target synchronization breaks after post-copy transition
  • Migration socket or proxy path issues
  • Domain preparation on the target is delayed

In other words, a single "migration failed" message is insufficient -- you need to distinguish which plane the failure occurred in.
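A first-pass triage of that distinction can be automated by keyword. The matched substrings below are examples chosen for this sketch, not KubeVirt's actual error strings.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyFailure sorts a failure reason into one of the two planes
// described above, defaulting to the data plane when no scheduling or
// policy keyword matches.
func classifyFailure(reason string) string {
	controlPlane := []string{"unschedulable", "quota", "node selector", "policy"}
	for _, s := range controlPlane {
		if strings.Contains(strings.ToLower(reason), s) {
			return "control-plane"
		}
	}
	return "data-plane"
}

func main() {
	fmt.Println(classifyFailure("target pod unschedulable"))
	fmt.Println(classifyFailure("dirty page rate too high, pre-copy did not converge"))
}
```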

8. Secure Features Often Conflict with Migration Flexibility

As already apparent at the API level, features like SEV, Secure Execution, TDX, and host device passthrough frequently conflict with migration constraints.

This is not coincidental. These features generally require one of the following:

  • Strong coupling with specific host hardware
  • Special protection of guest memory
  • Use of device state that is difficult to reproduce externally

In other words, the more you strengthen security or maximize hardware performance, the easier it becomes to conflict with the characteristic of "zero-downtime movement to any node."

9. Multiple Pod Existence During Migration Makes Failure Analysis Harder

The importance of ActivePods becomes clearer in failure modes. During migration, the source and target launcher Pods briefly coexist, making it easy to confuse which Pod is the actual current source when examining logs and status.

When analyzing failures, you should cross-reference at least the following:

  • VMI's activePods
  • Migration CR phase
  • Target pod name
  • Source pod name
  • VMI migrationState

Without cross-referencing this information, it is easy to mistake logs from an already cleaned up Pod as the current problem.
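The cross-referencing step can be sketched like this. In the real API, activePods maps pod UID to node name; here we key by pod name for readability, and the node names are illustrative.

```go
package main

import "fmt"

// currentSourcePod picks, out of the coexisting launcher pods, the one
// running on the migration's source node.
func currentSourcePod(activePods map[string]string, sourceNode string) (string, bool) {
	for pod, node := range activePods {
		if node == sourceNode {
			return pod, true
		}
	}
	return "", false
}

func main() {
	active := map[string]string{
		"virt-launcher-vm1-abc": "node-a", // source
		"virt-launcher-vm1-def": "node-b", // migration target
	}
	pod, ok := currentSourcePod(active, "node-a")
	fmt.Println(ok, pod)
}
```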

10. Drain Strategy Should Vary Based on Workload Characteristics

The same eviction strategy should not be applied to all VMs. For example:

  • Nearly stateless test VMs
  • Performance-sensitive VMs using SR-IOV and host devices
  • General workload VMs with RWX volumes and live migration capability
  • Memory write-intensive VMs where post-copy allowance is important

These have entirely different failure costs and acceptable responses.

In other words, drain strategy should not be an infrastructure default but an operational policy tailored to VM characteristics.
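One way to make that operational policy explicit is a per-class mapping. The class names and strategy assignments below are illustrative assumptions; only the strategy strings mirror the API values.

```go
package main

import "fmt"

// drainPolicy encodes drain strategy per workload class instead of one
// cluster-wide default. Class names here are hypothetical.
var drainPolicy = map[string]string{
	"stateless-test":   "None",        // cheap to lose, just evict
	"sriov-hostdevice": "External",    // an external controller decides
	"general-rwx":      "LiveMigrate", // standard zero-downtime move
	"memory-intensive": "LiveMigrate", // pair with post-copy allowance in a migration policy
}

// strategyFor looks up the class, with an illustrative fallback.
func strategyFor(class string) string {
	if s, ok := drainPolicy[class]; ok {
		return s
	}
	return "LiveMigrateIfPossible"
}

func main() {
	fmt.Println(strategyFor("sriov-hostdevice"))
}
```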

Key Points for Operators

  • Drain is a VM evacuation strategy issue, not Pod removal.
  • You must look at EvictionRequested, EvacuationNodeName, and MigrationState together.
  • Non-migratable reasons are pre-judgment signals, not post-failure logs.
  • Post-copy failures can lead to irrecoverable states and require much greater caution.

Conclusion

KubeVirt does not hide failures but structures them as state machines. Drain is connected to eviction strategies, live migration feasibility is revealed in advance via condition reasons, and actual migration progress and failure reasons accumulate in MigrationState. The fact that post-copy failures are separated as irrecoverable demonstrates that KubeVirt views VM failures differently from simple Pod restart issues.

In the next post, we will wrap up the series by organizing a source code reading map that shows the order in which to read the KubeVirt source code to understand the entire structure most quickly.