qemu-devel

From: Yury Kotov
Subject: [RFC PATCH 0/1] Removing RAMBlocks during migration
Date: Mon, 9 Dec 2019 10:41:01 +0300

Hi,

I found that it's possible to remove a RAMBlock during migration, e.g. via
device hot-unplug initiated by the guest (reproduction steps are below).
I want to clarify whether removing (or even adding) a RAMBlock during
migration is a valid operation or a bug.

Currently, it can race with the migration thread and cause migration to fail.
For instance, the vmstate_unregister_ram function, which is called during
PCIe device removal, does the following:
- Memsets idstr -> the target may receive an unknown/zeroed idstr -> migration fails
- Marks the RAMBlock as non-migratable -> migration fails

RAMBlock removal itself seems safe for the migration thread thanks to RCU,
but it seems to me there are other possible race conditions (not tested):
- qemu_put_buffer_async -> saves a pointer to the RAMBlock's memory
   -> the block is freed outside RCU protection (between RAM save iterations)
   -> qemu_fflush -> access to freed memory.

So, I have the following questions:
1. Is removing/adding a RAMBlock OK during migration?
2. If yes, what should we do with vmstate_unregister_ram?
   - Just remove vmstate_unregister_ram (my RFC patch)
   - Refcount the RAMBlock's migratable/non-migratable state
   - Something else?
3. If it must not be possible, should we add
   assert(migration_is_idle()) in qemu_ram_free?

P.S.
I'm working on a fix for the problem below and trying to choose the better
approach: either allow device removal and fix all problems like this one,
or fix the particular device.

--------
How to reproduce device removal during migration:

1. Source QEMU command line (target is similar)
  $ x86_64-softmmu/qemu-system-x86_64 \
    -nodefaults -no-user-config -m 1024 -M q35 \
    -qmp unix:./src.sock,server,nowait \
    -drive file=./image,format=raw,if=virtio \
    -device ioh3420,id=pcie.1 \
    -device virtio-net,bus=pcie.1
2. Start migration with a low speed limit (to simplify reproducing)
3. Power off the device on the hotpluggable pcie.1 bus (inside the guest):
  $ echo 0 > /sys/bus/pci/slots/0/power
4. Increase the migration speed and wait for the failure

Most likely you will get something like this:
  qemu-system-x86_64: get_pci_config_device: Bad config data:
          i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
  qemu-system-x86_64: Failed to load PCIDevice:config
  qemu-system-x86_64: Failed to load
          ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
  qemu-system-x86_64: error while loading state for instance 0x0 of device
          '0000:00:03.0/ioh-3240-express-root-port'
  qemu-system-x86_64: load of migration failed: Invalid argument

This error merely illustrates that device removal during migration is
possible; it does not itself illustrate the race conditions around RAMBlock
removal.

Regards,
Yury

Yury Kotov (1):
  migration: Remove vmstate_unregister_ram

 hw/block/pflash_cfi01.c     | 1 -
 hw/block/pflash_cfi02.c     | 1 -
 hw/mem/pc-dimm.c            | 5 -----
 hw/misc/ivshmem.c           | 2 --
 hw/pci/pci.c                | 1 -
 include/migration/vmstate.h | 1 -
 migration/savevm.c          | 6 ------
 7 files changed, 17 deletions(-)

-- 
2.24.0



