qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate


From: Chuang Xu
Subject: Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate
Date: Mon, 27 Feb 2023 21:19:39 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.7.2

Hi, Peter

On 2023/2/25 δΈ‹εˆ11:32, Peter Xu wrote:
On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
Hi, Peter
Hi, Chuang,

On 2023/2/22 δΈ‹εˆ11:57, Peter Xu wrote:
I don't see why it's wrong, and that's exactly what I wanted to guarantee,
that if memory_region_update_pending==true when updating ioeventfd, we may
have some serios problem.

For this, I split my patch into two parts and I put this change into the
last patch.  See the attachment.  If you worry about this, you can e.g. try
applying patch 1 only later, but I still don't see why it could.

Sorry, I made some mistakes in my previous understanding of the code. However,
if this assertion is added, it will indeed trigger some coredump in qtest with
my patches. Here is the coredump(This is the only one I found):

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fa5e7b17535 in __GI_abort () at abort.c:79
#2  0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: 
%s%sAssertion `%s' failed.\n%n",
    assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && !memory_region_update_pending", 
file=0x56305fc028cb "../softmmu/memory.c", line=855, function=<optimized out>) at assert.c:92
#3  0x00007fa5e7b251a2 in __GI___assert_fail (assertion=assertion@entry=0x56305fc02d60 
"qemu_mutex_iothread_locked() && !memory_region_update_pending",
    file=file@entry=0x56305fc028cb "../softmmu/memory.c", line=line@entry=855, 
function=function@entry=0x56305fc03cc0 <__PRETTY_FUNCTION__.38596> 
"address_space_update_ioeventfds")
    at assert.c:101
#4  0x000056305f8a9194 in address_space_update_ioeventfds 
(as=as@entry=0x563061293648) at ../softmmu/memory.c:855
#5  0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648, 
root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 
"virtio-net-pci") at ../softmmu/memory.c:3172
#6  0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, devfn=<optimized 
out>, name=0x563061011c40 "virtio-net-pci", pci_dev=0x563061293410) at 
../hw/pci/pci.c:1145
#7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at 
../hw/pci/pci.c:2036
#8  0x000056305f939daf in device_set_realized (obj=<optimized out>, value=true, 
errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
#9  0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized out>, 
name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at 
../qom/object.c:2285
#10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410, 
name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, 
errp=errp@entry=0x7fa5e4f39ae0)
    at ../qom/object.c:1420
#11 0x000056305f94247f in object_property_set_qobject (obj=obj@entry=0x563061293410, 
name=name@entry=0x56305fba6bc3 "realized", value=value@entry=0x563061e61cb0,
    errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
#12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410, 
name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, 
errp=errp@entry=0x7fa5e4f39ae0)
    at ../qom/object.c:1489
#13 0x000056305f93959c in qdev_realize (dev=<optimized out>, 
bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at ../hw/core/qdev.c:292
#14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, 
from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0)
    at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, 
n=0x563062043530) at ../hw/net/virtio-net.c:933
#16 virtio_net_set_features (vdev=<optimized out>, 
features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#17 0x000056305f872238 in virtio_set_features_nocheck 
(vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at 
../hw/virtio/virtio.c:2851
#18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, 
version_id=11) at ../hw/virtio/virtio.c:3027
#19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, 
vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, version_id=11) 
at ../migration/vmstate.c:137
#20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) at 
../migration/savevm.c:919
#21 0x000056305f757905 in qemu_loadvm_section_start_full 
(f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
#22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, 
mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
#23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at 
../migration/savevm.c:2809
#24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized 
out>) at ../migration/migration.c:606
#25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, i1=<optimized 
out>) at ../util/coroutine-ucontext.c:177

I really think changing depth is a hack... :(

Here IMHO the problem is we have other missing calls to
address_space_to_flatview() during commit() and that caused the issue.
There aren't a lot of those, and sorry to miss yet another one.

So let me try one more time on this (with patch 1; I think I've got two
places missed in the previous attempt).  Let's see how it goes.

We may want to add a tracepoint or have some way to know when enfornced
commit() is triggered during the vm load phase.  I just noticed when you
worried about having enforced commit() to often then it beats the original
purpose and I think yes that's something to worry.

I still believe AHCI condition is rare (since e.g. you've passed all Juan's
tests even before we have this "do_commit" stuff), but in short: I think it
would still be interesting if you can capture all the users of enforced
commit() like the AHCI case you quoted before, and list them in the cover
letter in your next post (along with a new perf measurements, to make sure
your worry is not true on having too much enforced commit will invalid the
whole idea).

When I was digging this out, I also found some RCU issue when using
address_space_to_flatview() (when BQL was not taken), only in the
memory_region_clear_dirty_bitmap() path.  I hope the new assertion
(rcu_read_is_locked()) won't trigger on some of the use cases for you
already, but anyway feel free to ignore this whole paragraph for now until
if you see some assert(rcu_read_is_locked()) being triggered.  I plan to
post some RFC for fixing RCU and I'll have you copied just for reference
(may be separate issue as what you're working on).

Thanks,

Unfortunately, there is another case of stack overflow...

#8  memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
#9  0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820 
<address_space_memory>) at 
/data00/migration/qemu-open/include/exec/memory.h:1119
#10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at 
../softmmu/memory.c:818
#11 0x00005610e96dfa14 in address_space_cache_init (cache=cache@entry=0x56111cdee410, 
as=<optimized out>, addr=addr@entry=1048576, len=len@entry=4096, is_write=false)
    at ../softmmu/physmem.c:3350
#12 0x00005610e96a0928 in virtio_init_region_cache 
(vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
#13 0x00005610e96a0db4 in virtio_memory_listener_commit 
(listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
#14 0x00005610e96d2e5e in address_space_update_flatviews_all () at 
../softmmu/memory.c:1126
#15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145

Fortunately, this is probably the last one (at least according to the qtest 
results)😁.

BTW, Perhaps you can define do_commit as a non-static function? Because I'll 
use it in
address_space_to_flatview later.

My v6 patches are attached. If necessary, you can apply them and make some test.

Thanks!

Attachment: v6-0001-rcu-introduce-rcu_read_is_locked.patch
Description: Text document

Attachment: v6-0002-memory-add-sanity-check-in-address_space_to_flatv.patch
Description: Text document

Attachment: v6-0003-migration-reduce-time-of-loading-non-iterable-vms.patch
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]