qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable


From: Peter Xu
Subject: Re: [PATCH RESEND v6 0/5] migration: reduce time of loading non-iterable vmstate
Date: Mon, 6 Mar 2023 15:48:31 -0500

On Mon, Mar 06, 2023 at 08:48:05PM +0800, Chuang Xu wrote:
> Hi, Peter,
> 
> On 2023/3/6 上午6:05, Peter Xu wrote:
> > Hi, Chuang,
> > 
> > On Fri, Mar 03, 2023 at 06:56:50PM +0800, Chuang Xu wrote:
> > > Sorry to forget to update the test results in the last patch of v6.
> > > 
> > > In this version:
> > > 
> > > - add peter's patch.
> > > - split mr_do_commit() from mr_commit().
> > > - adjust the sanity check in address_space_to_flatview().
> > > - rebase to latest upstream.
> > > - replace 8260 with 8362 as testing host.
> > > - update the latest test results.
> > > 
> > > Here I list some cases which will trigger do_commit() in 
> > > address_space_to_flatview():
> 
> I ran qtest cases and used systemtap to trace those do_commit().
> 
> > > 
> > > 1.virtio_load->virtio_init_region_cache
> > > 2.virtio_load->virtio_set_features_nocheck
> > What is this one specifically?  I failed to see quickly why this needs to
> > reference the flatview.
> 
> 0x560f3bb26510 : memory_region_transaction_do_commit+0x0/0x1c0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bb26e45 : address_space_get_flatview+0x95/0x100 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bb296b6 : memory_listener_register+0xf6/0x300 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3baec59f : virtio_device_realize+0x12f/0x1a0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb3b1f : device_set_realized+0x2ff/0x660 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb6ec6 : property_set_bool+0x46/0x70 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb9173 : object_property_set+0x73/0x100 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbbc1ef : object_property_set_qobject+0x2f/0x50 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb93e4 : object_property_set_bool+0x34/0xa0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3b9830d9 : virtio_pci_realize+0x299/0x4e0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3b901204 : pci_qdev_realize+0x7e4/0x1090 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb3b1f : device_set_realized+0x2ff/0x660 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb6ec6 : property_set_bool+0x46/0x70 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb9173 : object_property_set+0x73/0x100 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbbc1ef : object_property_set_qobject+0x2f/0x50 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bbb93e4 : object_property_set_bool+0x34/0xa0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3b99e4a0 : qdev_device_add_from_qdict+0xb00/0xc40 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3bac0c75 : virtio_net_set_features+0x385/0x490 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3baec238 : virtio_set_features_nocheck+0x58/0x70 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x560f3baf1e9e : virtio_load+0x33e/0x820 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]

I think this one can also be avoided.  Basically any memory listener
related op can avoid referencing the latest flatview because even if it's
during depth>0 it'll be synced again when depth==0.

> 
> > 
> > > 3.vapic_post_load
> > Same confusion to this one..
> 
> 0x557a57b0e510 : memory_region_transaction_do_commit+0x0/0x1c0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x557a57b0e9b5 : memory_region_find_rcu+0x2e5/0x310 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x557a57b11165 : memory_region_find+0x55/0xf0 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x557a57a07638 : vapic_prepare+0x58/0x250 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]
> 0x557a57a07a7b : vapic_post_load+0x1b/0x80 
> [/data00/migration/qemu-open/build/qemu-system-x86_64]

AFAIU this one is the other one that truly need to reference the latest
flatview; the other one is (5) on AHCI.  It's a pity that it uses
memory_region_find_rcu() even if it must already have BQL so it's kind of
unclear (and enforced commit should never need to happen with RCU
logically..).

> 
> > 
> > > 4.tcg_commit
> > This is not enabled if kvm is on, right?
> 
> Yes.
> 
> I obtained these results from qtests. And some qtests enabled tcg.😁
> 
> > 
> > > 5.ahci_state_post_load
> > > 
> > > During my test, virtio_init_region_cache() will frequently trigger
> > > do_commit() in address_space_to_flatview(), which will reduce the
> > > optimization  effect of v6 compared with v1.
> > IIU above 1 & 4 could leverage one address_space_to_flatview_rcu() which
> > can keep the old semantics of address_space_to_flatview() by just assert
> > rcu read lock and do qatomic_rcu_read() (e.g., tcg_commit() will run again
> > at last stage of vm load).  Not sure how much it'll help.
> 
> This can really improve the performance of the existing patch, which is
> basically the same as that of v1, but it needs to add 
> address_space_to_flatview_rcu()
> and address_space_get_flatview_rcu(). I have also considered this, but will
> others find it confusing? Because there will be to_flatview(), 
> to_flatview_raw()
> and to_flatview_rcu()..

Why do we need address_space_get_flatview_rcu()?  I'm not sure whether you
mean the two calls in memory listener add/del, if so would you consider a
fixup that I attached in this reply and squash it into patch 1 of mine?  I
assume that'll also remove case (2) above, and also remove the need to have
address_space_get_flatview_rcu().

Having address_space_to_flatview_rcu() alone is fine to me (which is
actually the original address_space_to_flatview).  Again IMHO it should
only be called in the case where a stall flatview doesn't hurt.

> 
> > 
> > You may also want to have a look at the other patch to optimize ioeventfd
> > commit here; I think that'll also speed up vm load but not sure how much
> > your series can further do upon:
> > 
> > https://lore.kernel.org/all/20230228142514.2582-1-longpeng2@huawei.com/
> > 
> > Thanks,
> 
> Here are the latest test results:
> 
> test info:
> - Host
>   - Intel(R) Xeon(R) Platinum 8362 CPU
>   - Mellanox Technologies MT28841
> - VM
>   - 32 CPUs 128GB RAM VM
>   - 8 16-queue vhost-net device
>   - 16 4-queue vhost-user-blk device.
> 
>                       time of loading non-iterable vmstate
> before                                        112 ms
> long's patch applied                  103 ms
> my patch applied                       44 ms
> both applied                           39 ms
> add as_to_flat_rcu                     19 ms

The result is useful.  Thanks a lot.

-- 
Peter Xu

Attachment: 0001-fixup-memory-Reference-as-current_map-directly-in-me.patch
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]