qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v1 05/13] migrate/ram: Handle RAM block resizes during precop


From: David Hildenbrand
Subject: Re: [PATCH v1 05/13] migrate/ram: Handle RAM block resizes during precopy
Date: Fri, 21 Feb 2020 16:46:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0

On 21.02.20 16:40, Dr. David Alan Gilbert wrote:
> * David Hildenbrand (address@hidden) wrote:
>> On 21.02.20 16:14, Dr. David Alan Gilbert wrote:
>>> * David Hildenbrand (address@hidden) wrote:
>>>> Resizing while migrating is dangerous and does not work as expected.
>>>> The whole migration code works on the usable_length of ram blocks and does
>>>> not expect this to change at random points in time.
>>>>
>>>> In the case of precopy, the ram block size must not change on the source,
>>>> after syncing the RAM block list in ram_save_setup(), so as long as the
>>>> guest is still running on the source.
>>>>
>>>> Resizing can be trigger *after* (but not during) a reset in
>>>> ACPI code by the guest
>>>> - hw/arm/virt-acpi-build.c:acpi_ram_update()
>>>> - hw/i386/acpi-build.c:acpi_ram_update()
>>>>
>>>> Use the ram block notifier to get notified about resizes. Let's simply
>>>> cancel migration and indicate the reason. We'll continue running on the
>>>> source. No harm done.
>>>>
>>>> Update the documentation. Postcopy will be handled separately.
>>>>
>>>> Cc: "Dr. David Alan Gilbert" <address@hidden>
>>>> Cc: Juan Quintela <address@hidden>
>>>> Cc: Eduardo Habkost <address@hidden>
>>>> Cc: Paolo Bonzini <address@hidden>
>>>> Cc: Igor Mammedov <address@hidden>
>>>> Cc: "Michael S. Tsirkin" <address@hidden>
>>>> Cc: Richard Henderson <address@hidden>
>>>> Cc: Shannon Zhao <address@hidden>
>>>> Cc: Alex Bennée <address@hidden>
>>>> Cc: Peter Xu <address@hidden>
>>>> Signed-off-by: David Hildenbrand <address@hidden>
>>>> ---
>>>>  exec.c                |  5 +++--
>>>>  include/exec/memory.h | 10 ++++++----
>>>>  migration/migration.c |  9 +++++++--
>>>>  migration/migration.h |  1 +
>>>>  migration/ram.c       | 41 +++++++++++++++++++++++++++++++++++++++++
>>>>  5 files changed, 58 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/exec.c b/exec.c
>>>> index b75250e773..8b015821d6 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -2120,8 +2120,9 @@ static int memory_try_enable_merging(void *addr, 
>>>> size_t len)
>>>>      return qemu_madvise(addr, len, QEMU_MADV_MERGEABLE);
>>>>  }
>>>>  
>>>> -/* Only legal before guest might have detected the memory size: e.g. on
>>>> - * incoming migration, or right after reset.
>>>> +/*
>>>> + * Resizing RAM while migrating can result in the migration being 
>>>> canceled.
>>>> + * Care has to be taken if the guest might have already detected the 
>>>> memory.
>>>>   *
>>>>   * As memory core doesn't know how is memory accessed, it is up to
>>>>   * resize callback to update device state and/or add assertions to detect
>>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>>> index e85b7de99a..de111347e8 100644
>>>> --- a/include/exec/memory.h
>>>> +++ b/include/exec/memory.h
>>>> @@ -113,7 +113,7 @@ typedef struct IOMMUNotifier IOMMUNotifier;
>>>>  #define RAM_SHARED     (1 << 1)
>>>>  
>>>>  /* Only a portion of RAM (used_length) is actually used, and migrated.
>>>> - * This used_length size can change across reboots.
>>>> + * Resizing RAM while migrating can result in the migration being 
>>>> canceled.
>>>>   */
>>>>  #define RAM_RESIZEABLE (1 << 2)
>>>>  
>>>> @@ -843,7 +843,9 @@ void 
>>>> memory_region_init_ram_shared_nomigrate(MemoryRegion *mr,
>>>>   *                                     RAM.  Accesses into the region will
>>>>   *                                     modify memory directly.  Only an 
>>>> initial
>>>>   *                                     portion of this RAM is actually 
>>>> used.
>>>> - *                                     The used size can change across 
>>>> reboots.
>>>> + *                                     Changing the size while migrating
>>>> + *                                     can result in the migration being
>>>> + *                                     canceled.
>>>>   *
>>>>   * @mr: the #MemoryRegion to be initialized.
>>>>   * @owner: the object that tracks the region's reference count
>>>> @@ -1464,8 +1466,8 @@ void *memory_region_get_ram_ptr(MemoryRegion *mr);
>>>>  
>>>>  /* memory_region_ram_resize: Resize a RAM region.
>>>>   *
>>>> - * Only legal before guest might have detected the memory size: e.g. on
>>>> - * incoming migration, or right after reset.
>>>> + * Resizing RAM while migrating can result in the migration being 
>>>> canceled.
>>>> + * Care has to be taken if the guest might have already detected the 
>>>> memory.
>>>>   *
>>>>   * @mr: a memory region created with @memory_region_init_resizeable_ram.
>>>>   * @newsize: the new size the region
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 8fb68795dc..ac9751dbe5 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -175,13 +175,18 @@ void migration_object_init(void)
>>>>      }
>>>>  }
>>>>  
>>>> +void migration_cancel(void)
>>>> +{
>>>> +    migrate_fd_cancel(current_migration);
>>>> +}
>>>> +
>>>>  void migration_shutdown(void)
>>>>  {
>>>>      /*
>>>>       * Cancel the current migration - that will (eventually)
>>>>       * stop the migration using this structure
>>>>       */
>>>> -    migrate_fd_cancel(current_migration);
>>>> +    migration_cancel();
>>>>      object_unref(OBJECT(current_migration));
>>>>  }
>>>>  
>>>> @@ -2019,7 +2024,7 @@ void qmp_migrate(const char *uri, bool has_blk, bool 
>>>> blk,
>>>>  
>>>>  void qmp_migrate_cancel(Error **errp)
>>>>  {
>>>> -    migrate_fd_cancel(migrate_get_current());
>>>> +    migration_cancel();
>>>>  }
>>>>  
>>>>  void qmp_migrate_continue(MigrationStatus state, Error **errp)
>>>> diff --git a/migration/migration.h b/migration/migration.h
>>>> index 8473ddfc88..79fd74afa5 100644
>>>> --- a/migration/migration.h
>>>> +++ b/migration/migration.h
>>>> @@ -343,5 +343,6 @@ int foreach_not_ignored_block(RAMBlockIterFunc func, 
>>>> void *opaque);
>>>>  void migration_make_urgent_request(void);
>>>>  void migration_consume_urgent_request(void);
>>>>  bool migration_rate_limit(void);
>>>> +void migration_cancel(void);
>>>>  
>>>>  #endif
>>>> diff --git a/migration/ram.c b/migration/ram.c
>>>> index ed23ed1c7c..57f32011a3 100644
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -52,6 +52,7 @@
>>>>  #include "migration/colo.h"
>>>>  #include "block.h"
>>>>  #include "sysemu/sysemu.h"
>>>> +#include "sysemu/runstate.h"
>>>>  #include "savevm.h"
>>>>  #include "qemu/iov.h"
>>>>  #include "multifd.h"
>>>> @@ -3710,8 +3711,48 @@ static SaveVMHandlers savevm_ram_handlers = {
>>>>      .resume_prepare = ram_resume_prepare,
>>>>  };
>>>>  
>>>> +static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
>>>> +                                      size_t old_size, size_t new_size)
>>>> +{
>>>> +    ram_addr_t offset;
>>>> +    Error *err = NULL;
>>>> +    RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
>>>> +
>>>> +    if (ramblock_is_ignored(rb)) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Some resizes are triggered on the migration target by precopy code,
>>>> +     * when synchronizing RAM block sizes. In these cases, the VM is not
>>>> +     * running and migration is not idle. We have to ignore these resizes,
>>>> +     * as we only care about resizes during precopy on the migration 
>>>> source.
>>>> +     * This handler is always registered, so ignore when migration is 
>>>> idle.
>>>> +     */
>>>> +    if (migration_is_idle() || !runstate_is_running() ||
>>>> +        postcopy_is_running()) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    /*
>>>> +     * Precopy code cannot deal with the size of ram blocks changing at
>>>> +     * random points in time. We're still running on the source, abort
>>>> +     * the migration and continue running here. Make sure to wait until
>>>> +     * migration was canceled.
>>>> +     */
>>>> +    error_setg(&err, "RAM block '%s' resized during precopy.", rb->idstr);
>>>> +    migrate_set_error(migrate_get_current(), err);
>>>> +    error_free(err);
>>>> +    migration_cancel();
>>>> +}
>>>> +
>>>> +static RAMBlockNotifier ram_mig_ram_notifier = {
>>>> +    .ram_block_resized = ram_mig_ram_block_resized,
>>>> +};
>>>> +
>>>>  void ram_mig_init(void)
>>>>  {
>>>>      qemu_mutex_init(&XBZRLE.lock);
>>>>      register_savevm_live("ram", 0, 4, &savevm_ram_handlers, &ram_state);
>>>> +    ram_block_notifier_add(&ram_mig_ram_notifier);
>>>
>>> Can we avoid the question of the 'is_idle' checks by doing this
>>> registration in save_setup/load_setup and unregistering in
>>> save_cleanup/load_cleanup?
>>
>> Well, I figured it out now :)
>>
>> migration_is_idle() is enough to handle it in this patch.
>>
>>>
>>> That means if we land in the handler we know we're in either an incoming
>>> or outgoing migration and then you just have to check which?
>>
>> Can save/load race with other QEMU code that might register/unregister
>> notifiers? I want to avoid having to introduce locking just for that.
> 
> <looks at code> It looks like it can; qemu_savevm_state_setup
> is called from near the top of migration_thread, so I don't
> think it's holding locks at that point; and all the existing notifier
> changes on this notifier seem to be done from either init or vfio_open
> which I guess is under the big lock (?)

Yes, same as actual resizes AFAIK (which could otherwise also race with
(un)registration when traversing the list). So I'd prefer to not
register the notifier dynamically for now. (at least makes it easier for
me :) )


-- 
Thanks,

David / dhildenb




reply via email to

[Prev in Thread] Current Thread [Next in Thread]