Subject: Re: [RFC PATCH v1 00/26] migration: File based migration with multifd and fixed-ram
From: Claudio Fontana
Date: Tue, 4 Apr 2023 17:10:52 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0
On 4/4/23 16:53, Peter Xu wrote:
> On Tue, Apr 04, 2023 at 10:00:16AM +0200, Claudio Fontana wrote:
>> Hi Peter,
>
> Hi, Claudio,
>
>>
>> On 4/3/23 21:26, Peter Xu wrote:
>>> Hi, Claudio,
>>>
>>> Thanks for the context.
>>>
>>> On Mon, Apr 03, 2023 at 09:47:26AM +0200, Claudio Fontana wrote:
>>>> Hi, I am not sure whether what is being asked for here is context on the
>>>> previous upstream discussions, or on the specific requirement we are
>>>> trying to bring upstream.
>>>>
>>>> In terms of the specific requirement we are trying to bring upstream, we
>>>> need to get libvirt+QEMU VM save and restore functionality to be able to
>>>> transfer VM sizes of ~30 GB (4/8 vcpus) in roughly 5 seconds. When an
>>>> event trigger happens, the VM needs to be quickly paused and saved to
>>>> disk safely, including datasync, and another VM needs to be restored,
>>>> also in ~5 secs. For our specific requirement, the VM is never running
>>>> when its data (mostly consisting of RAM) is saved.
>>>>
>>>> I understand that the need to also handle the "live" case comes from
>>>> upstream discussions about solving the "general case", where someone
>>>> might want to do this for "live" VMs, but if helpful I want to highlight
>>>> that it is not part of the specific requirement we are trying to address,
>>>> and for this specific case it won't be in the future either, as the whole
>>>> point of the trigger is to replace the running VM with another VM, so it
>>>> cannot be kept running.
>>>
>>> From what I read so far, that scenario suits exactly what live snapshot
>>> would do with current QEMU - that at least should involve a snapshot on the
>>> disks being used, otherwise I can't see how that can be live. So it looks
>>> like a separate request.
>>>
>>>> The reason we are using "migrate" here likely stems from the fact that
>>>> existing libvirt code currently uses QMP migrate to implement the save
>>>> and restore commands. And in my personal view, I think that reusing the
>>>> existing building blocks (migration, multifd) would be preferable, to
>>>> avoid having to maintain two separate ways to do the same thing. That
>>>> said, it could be done in a different way, if the performance can keep
>>>> up. Just thinking of reducing the overall effort and also maintenance
>>>> surface.
>>>
>>> I would vaguely guess the performance can not only keep up but be better
>>> than what the current solution provides, due to the possibility of (1)
>>> batch handling of contiguous guest pages, and (2) completely avoiding the
>>> dirty tracking overhead.
>>>
>>> For (2), it's not about wr-protect page faults or vmexits due to PML being
>>> full (because vcpus will be stopped anyway..), it's about enabling the
>>> dirty tracking (which already contains overhead, especially when huge pages
>>> are enabled, to split huge pages in EPT pgtables) and all the bitmap
>>> operations QEMU does during live migration even if the VM is not live.
>>
>> something we could profile for, I do not remember it being really an
>> important source of overhead in my previous profile runs,
>> but maybe worthwhile redoing the profiling with Fabiano's patchset.
>
> Yes, I don't know the detailed numbers either; it should depend on the guest
> configuration (mem size, mem type, kernel version etc). It could be less of a
> concern compared to the time used elsewhere. More on this below.
>
>>
>>>
>>> IMHO reusing multifd may or may not be a good idea here, because it'll of
>>> course also complicate the multifd code, hence make multifd harder to
>>> maintain, and not in a good way, because as I mentioned I don't think it
>>> can use much of what multifd provides.
>>
>>
>> The main advantage we get is the automatic multithreading of the
>> qemu_savevm_state_iterate code in my view.
>>
>> Reimplementing the same thing again has the potential to cause bitrot for
>> this use case, and using multiple fds for the transfer is exactly what is
>> needed here,
>> and in my understanding the same exact reason multifd exists: to take
>> advantage of high bandwidth migration channels.
>>
>> The only adjustment needed to multifd is the ability to work with block
>> devices (file fds) as the migration channels instead of just sockets,
>> so it seems a very natural extension of multifd to me.
>
> Yes; since I haven't looked at the multifd patches at all, I don't have a
> solid clue on how much it'll affect multifd. I'll leave that to Juan.
>
>>
>>>
>>> I don't have a strong opinion on the impl (even though I do have a
>>> preference..), but I think at least we should still check on two things:
>>>
>>> - Being crystal clear on the use case above, and double check whether "VM
>>> stop" should be the default operation at the start of the new cmd - we
>>> shouldn't assume the user will be aware of doing this, neither should
>>> we assume the user is aware of the performance implications.
>>
>>
>> Not sure I can identify what you are asking specifically: the use case is to
>> stop executing the currently running VM as soon as possible, save it to
>> disk, then restore another VM as soon as possible.
>> Probably I missed something there.
>
> Yes, then IMHO as mentioned we should make "vm stop" part of the command
> procedure if the VM was still running when invoked. Then we can already
> optimize the dirty logging from (2) above within the current framework.
> E.g., we already optimized live snapshot to not enable dirty logging:
>
>     if (!migrate_background_snapshot()) {
>         memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
>         migration_bitmap_sync_precopy(rs);
>     }
>
> Maybe that can also be done for fixed-ram migration, so no matter how much
> overhead there will be, that can be avoided.
Understood, agreed.
Would it make sense to check for something like if (!runstate_is_running())
instead of checking for the specific multifd + fixed-ram feature?
From a high-level perspective, there should be no dirtying if the vcpus are
not running, right?
This could also be a bit more future proof, avoiding checks for many individual
features, if they all happen to share the property that the vcpus are not running.
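As a rough sketch of what I mean (just adapting the snippet you quoted above;
untested, and whether runstate_is_running() is the right predicate at that
point is exactly my question):

    /*
     * Sketch only, not an actual patch: gate dirty tracking on the vcpus
     * actually running, rather than on a specific capability.
     */
    if (!migrate_background_snapshot() && runstate_is_running()) {
        memory_global_dirty_log_start(GLOBAL_DIRTY_MIGRATION);
        migration_bitmap_sync_precopy(rs);
    }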
>
> PS: I think similar optimizations can be done too in ram_save_complete() or
> ram_state_pending_exact().. maybe we should move the check into
> migration_bitmap_sync_precopy() so it can be skipped as a whole when it can.
makes sense, interesting.
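Something along these lines, perhaps? Purely illustrative: the early-out
condition and the simplified body are my assumptions, not an actual patch
(the real function does more than a plain migration_bitmap_sync() call):

    static void migration_bitmap_sync_precopy(RAMState *rs)
    {
        /*
         * Sketch of the "skip as a whole" idea: return early when dirty
         * tracking was never enabled (vcpus stopped, logging skipped), so
         * ram_save_complete()/ram_state_pending_exact() need no separate
         * checks.  The rest of the real body is elided here.
         */
        if (!global_dirty_tracking) {
            return;
        }
        migration_bitmap_sync(rs);
    }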
I wonder if ramblock_is_ignored() could be optimized a bit too, since it seems
to consume roughly the same amount of CPU as the dirty bitmap handling, even
when "ignore-shared" is not used (a sketch follows the commit reference below).
this feature was added by:
commit fbd162e629aaf8a7e464af44d2f73d06b26428ad
Author: Yury Kotov <yury-kotov@yandex-team.ru>
Date:   Fri Feb 15 20:45:46 2019 +0300

    migration: Add an ability to ignore shared RAM blocks

    If ignore-shared capability is set then skip shared RAMBlocks during the
    RAM migration.

    Also, move qemu_ram_foreach_migratable_block (and rename) to the
    migration code, because it requires access to the migration capabilities.

    Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
    Message-Id: <20190215174548.2630-4-yury-kotov@yandex-team.ru>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
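To make the ramblock_is_ignored() thought a bit more concrete, one direction
could be to avoid re-evaluating the capability on every call, e.g. by latching
it once into RAMState. A sketch only: the rs->ignore_shared field is made up
for illustration and does not exist today, and the signature change would of
course ripple through the callers:

    /*
     * Sketch only: rs->ignore_shared would be a hypothetical bool, filled in
     * once from migrate_ignore_shared() at setup time, so the per-page hot
     * path only tests plain flags instead of re-doing the capability lookup.
     */
    static bool ramblock_is_ignored(RAMState *rs, RAMBlock *block)
    {
        return !qemu_ram_is_migratable(block) ||
               (rs->ignore_shared && qemu_ram_is_shared(block));
    }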
Probably not that important, just mentioning it since we were thinking about
possible small optimizations.
I would like to share the complete callgrind data from my previous runs, but I
cannot find a way to export it in a readable state; I could export the call
graph as a PDF though, if that would be helpful.
Likely we'd need a new round of measurements with perf...
Ciao,
Claudio
>
> Thanks,
>
>>
>>>
>>> - Making sure the image layout is well defined, so:
>>>
>>> - It'll be extensible in the future, and,
>>>
>>> - If someone would like to refactor it to not use the migration thread
>>> model anymore, the image format, hopefully, can be easy to keep
>>> untouched so it can be compatible with the current approach.
>>>
>>> Just my two cents. I think Juan should have the best grasp on this.
>>>
>>> Thanks,
>>>
>>
>> Ciao,
>>
>> Claudio
>>
>