Initially we had a number of guests running on compute-2 (which runs
qemu-kvm-ev 2.3.0). We then live-migrated them one at a time to
compute-0 (which runs qemu-kvm-ev 2.6.0). Three of them migrated
successfully. The fourth (essentially identical in configuration
to the first three) failed, as shown by the following entries in
/var/log/libvirt/qemu/instance-0000000e.log:
2017-03-29T06:38:37.886940Z qemu-kvm: VQ 2 size 0x80 < last_avail_idx 0x47b - used_idx 0x47c
2017-03-29T06:38:37.886974Z qemu-kvm: error while loading state for instance 0x0 of device '0000:00:07.0/virtio-balloon'
2017-03-29T06:38:37.888684Z qemu-kvm: load of migration failed: Operation not permitted
2017-03-29 06:38:37.896+0000: shutting down
Does anyone know of an existing bug report covering this issue? (I took a
look and didn't see anything obviously related.)
The failing device is virtio-balloon. If you remove that device, live
migration should work reliably.
Alternatively, you can temporarily unload the balloon driver inside the
guest (rmmod virtio_balloon) for the duration of the live migration, then
reload it (modprobe virtio_balloon) once the migration has completed.
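last_avail_idx 0x47b with used_idx 0x47c is an invalid device state: the
device cannot have returned more buffers on the used ring than it has taken
from the available ring, so used_idx should never be ahead of
last_avail_idx.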
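To illustrate why these two values are rejected, here is a minimal Python sketch of the kind of consistency check that produces the "VQ 2 size 0x80 < last_avail_idx ... - used_idx ..." message. It assumes, as in the virtio split-ring design, that both indices are free-running 16-bit counters, so the number of in-flight buffers is their difference modulo 2^16. The names inflight and check_vq_state are hypothetical; this is not QEMU's actual code.

```python
# Hypothetical sketch of a virtqueue load-time sanity check (not QEMU source).
# Assumption: last_avail_idx and used_idx are free-running 16-bit counters.
VRING_IDX_MASK = 0xFFFF


def inflight(last_avail_idx: int, used_idx: int) -> int:
    """Buffers taken from the avail ring but not yet returned on the used ring.

    The subtraction is done modulo 2**16 so that normal counter wraparound
    (e.g. last_avail_idx=0x0002, used_idx=0xFFFE) is handled correctly.
    """
    return (last_avail_idx - used_idx) & VRING_IDX_MASK


def check_vq_state(vq: int, size: int, last_avail_idx: int, used_idx: int) -> None:
    """Reject a migrated virtqueue whose in-flight count exceeds the ring size."""
    if inflight(last_avail_idx, used_idx) > size:
        raise ValueError(
            f"VQ {vq} size {size:#x} < last_avail_idx {last_avail_idx:#x} "
            f"- used_idx {used_idx:#x}"
        )


if __name__ == "__main__":
    # One buffer in flight: a plausible state, accepted.
    check_vq_state(2, 0x80, 0x47C, 0x47B)
    # Values from the log: used_idx is one ahead of last_avail_idx.
    try:
        check_vq_state(2, 0x80, 0x47B, 0x47C)
    except ValueError as err:
        print(err)
```

With the logged values, the difference 0x47b - 0x47c wraps to 0xffff, which is far larger than the queue size of 0x80, so the state is treated as corrupt and the incoming migration is aborted.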
I've diffed qemu-kvm-ev 2.6.0-27.1 hw/virtio/virtio-balloon.c against
qemu.git/master and do not see an obvious bug. I also compared
qemu-kvm-ev 2.3.0-31 with qemu-kvm-ev 2.6.0-27.1.