qemu-devel
Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate


From: Claudio Fontana
Subject: Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate
Date: Wed, 15 Feb 2023 18:06:40 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.4.0

On 1/17/23 12:55, Chuang Xu wrote:
> In this version:
> 
> - rename rcu_read_locked() to rcu_read_is_locked().
> - adjust the sanity check in address_space_to_flatview().
> - improve some comments.
> 
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
> 
> This patch packs all memory region changes made while loading
> non-iterable vmstate into a single memory transaction. The more devices
> a VM has, the larger the improvement.
> 
> Here are the test1 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 16-queue vhost-net devices
>   - 16 4-queue vhost-user-blk devices
> 
>           time of loading non-iterable vmstate     downtime
> before    about 150 ms                             740+ ms
> after     about 30 ms                              630+ ms
> 
> (These results differ from those of v1, possibly because something
> changed on my host, but the optimization effect is still clearly
> visible.)
> 
> 
> In test2, we keep the number of devices the same as in test1 and reduce
> the number of queues per device:
> 
> Here are the test2 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 1-queue vhost-net devices
>   - 16 1-queue vhost-user-blk devices
> 
>           time of loading non-iterable vmstate     downtime
> before    about 90 ms                              about 250 ms
> after     about 25 ms                              about 160 ms
> 
> 
> 
> In test3, we keep the number of queues per device the same as in test1
> and reduce the number of devices:
> 
> Here are the test3 results:
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8260 CPU
>   - NVIDIA Mellanox ConnectX-5
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 1 16-queue vhost-net device
>   - 1 4-queue vhost-user-blk device.
> 
>           time of loading non-iterable vmstate     downtime
> before    about 20 ms                              about 70 ms
> after     about 11 ms                              about 60 ms
> 
> 
> As the test results above show, both the number of queues and the number
> of devices have a large impact on the time of loading non-iterable
> vmstate. More devices and more queues lead to more memory region (mr)
> commits, so the time spent on flatview reconstruction grows as well.
> 
> Please review, Chuang.
> 
> [v4]
> 
> - attach more information in the cover letter.
> - remove changes on virtio_load.
> - add rcu_read_locked() to detect holding of rcu lock.
> 
> [v3]
> 
> - move virtio_load_check_delay() from virtio_memory_listener_commit() to 
>   virtio_vmstate_change().
> - add delay_check flag to VirtIODevice to make sure virtio_load_check_delay() 
>   will be called when delay_check is true.
> 
> [v2]
> 
> - rebase to latest upstream.
> - add sanity check to address_space_to_flatview().
> - postpone the init of the vring cache until migration's loading completes. 
> 
> [v1]
> 
> The duration of loading non-iterable vmstate accounts for a significant
> portion of downtime (starting with the timestamp of source qemu stop and
> ending with the timestamp of target qemu start). Most of the time is spent
> committing memory region changes repeatedly.
> 
> This patch packs all memory region changes made while loading
> non-iterable vmstate into a single memory transaction. The more devices
> a VM has, the larger the improvement.
> 
> Here are the test results:
> test vm info:
> - 32 CPUs 128GB RAM
> - 8 16-queue vhost-net devices
> - 16 4-queue vhost-user-blk devices
> 
>           time of loading non-iterable vmstate
> before    about 210 ms
> after     about 40 ms
> 
> 

Great improvements on the load times, congrats!

Claudio


