qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate


From: Peter Xu
Subject: Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate
Date: Mon, 27 Feb 2023 15:56:59 -0500

On Mon, Feb 27, 2023 at 09:19:39PM +0800, Chuang Xu wrote:
> Hi, Peter

Hi, Chuang,

> 
> On 2023/2/25 δΈ‹εˆ11:32, Peter Xu wrote:
> > On Thu, Feb 23, 2023 at 11:28:46AM +0800, Chuang Xu wrote:
> > > Hi, Peter
> > Hi, Chuang,
> > 
> > > On 2023/2/22 δΈ‹εˆ11:57, Peter Xu wrote:
> > I don't see why it's wrong, and that's exactly what I wanted to guarantee,
> > that if memory_region_update_pending==true when updating ioeventfd, we may
> > have some serios problem.
> > 
> > For this, I split my patch into two parts and I put this change into the
> > last patch.  See the attachment.  If you worry about this, you can e.g. try
> > applying patch 1 only later, but I still don't see why it could.
> 
> Sorry, I made some mistakes in my previous understanding of the code. However,
> if this assertion is added, it will indeed trigger some coredump in qtest with
> my patches. Here is the coredump(This is the only one I found):
> 
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007fa5e7b17535 in __GI_abort () at abort.c:79
> #2  0x00007fa5e7b1740f in __assert_fail_base (fmt=0x7fa5e7c78ef0 "%s%s%s:%u: 
> %s%sAssertion `%s' failed.\n%n",
>     assertion=0x56305fc02d60 "qemu_mutex_iothread_locked() && 
> !memory_region_update_pending", file=0x56305fc028cb "../softmmu/memory.c", 
> line=855, function=<optimized out>) at assert.c:92
> #3  0x00007fa5e7b251a2 in __GI___assert_fail 
> (assertion=assertion@entry=0x56305fc02d60 "qemu_mutex_iothread_locked() && 
> !memory_region_update_pending",
>     file=file@entry=0x56305fc028cb "../softmmu/memory.c", 
> line=line@entry=855, function=function@entry=0x56305fc03cc0 
> <__PRETTY_FUNCTION__.38596> "address_space_update_ioeventfds")
>     at assert.c:101
> #4  0x000056305f8a9194 in address_space_update_ioeventfds 
> (as=as@entry=0x563061293648) at ../softmmu/memory.c:855
> #5  0x000056305f8afe4f in address_space_init (as=as@entry=0x563061293648, 
> root=root@entry=0x5630612936a0, name=name@entry=0x5630612934f0 
> "virtio-net-pci") at ../softmmu/memory.c:3172
> #6  0x000056305f686e43 in do_pci_register_device (errp=0x7fa5e4f39850, 
> devfn=<optimized out>, name=0x563061011c40 "virtio-net-pci", 
> pci_dev=0x563061293410) at ../hw/pci/pci.c:1145
> #7  pci_qdev_realize (qdev=0x563061293410, errp=0x7fa5e4f39850) at 
> ../hw/pci/pci.c:2036
> #8  0x000056305f939daf in device_set_realized (obj=<optimized out>, 
> value=true, errp=0x7fa5e4f39ae0) at ../hw/core/qdev.c:510
> #9  0x000056305f93d156 in property_set_bool (obj=0x563061293410, v=<optimized 
> out>, name=<optimized out>, opaque=0x5630610221d0, errp=0x7fa5e4f39ae0) at 
> ../qom/object.c:2285
> #10 0x000056305f93f403 in object_property_set (obj=obj@entry=0x563061293410, 
> name=name@entry=0x56305fba6bc3 "realized", v=v@entry=0x563061e52a00, 
> errp=errp@entry=0x7fa5e4f39ae0)
>     at ../qom/object.c:1420
> #11 0x000056305f94247f in object_property_set_qobject 
> (obj=obj@entry=0x563061293410, name=name@entry=0x56305fba6bc3 "realized", 
> value=value@entry=0x563061e61cb0,
>     errp=errp@entry=0x7fa5e4f39ae0) at ../qom/qom-qobject.c:28
> #12 0x000056305f93f674 in object_property_set_bool (obj=0x563061293410, 
> name=name@entry=0x56305fba6bc3 "realized", value=value@entry=true, 
> errp=errp@entry=0x7fa5e4f39ae0)
>     at ../qom/object.c:1489
> #13 0x000056305f93959c in qdev_realize (dev=<optimized out>, 
> bus=bus@entry=0x563061c9c400, errp=errp@entry=0x7fa5e4f39ae0) at 
> ../hw/core/qdev.c:292
> #14 0x000056305f7244a0 in qdev_device_add_from_qdict (opts=0x563061e64c00, 
> from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7fa5e4f39ae0)
>     at /data00/migration/qemu-open/include/hw/qdev-core.h:17
> #15 0x000056305f846c75 in failover_add_primary (errp=0x7fa5e4f39ad8, 
> n=0x563062043530) at ../hw/net/virtio-net.c:933
> #16 virtio_net_set_features (vdev=<optimized out>, 
> features=4611687122246533156) at ../hw/net/virtio-net.c:1004
> #17 0x000056305f872238 in virtio_set_features_nocheck 
> (vdev=vdev@entry=0x563062043530, val=val@entry=4611687122246533156) at 
> ../hw/virtio/virtio.c:2851
> #18 0x000056305f877e9e in virtio_load (vdev=0x563062043530, f=0x56306125bde0, 
> version_id=11) at ../hw/virtio/virtio.c:3027
> #19 0x000056305f73c601 in vmstate_load_state (f=f@entry=0x56306125bde0, 
> vmsd=0x56305fef16c0 <vmstate_virtio_net>, opaque=0x563062043530, 
> version_id=11) at ../migration/vmstate.c:137
> #20 0x000056305f757672 in vmstate_load (f=0x56306125bde0, se=0x563062176700) 
> at ../migration/savevm.c:919
> #21 0x000056305f757905 in qemu_loadvm_section_start_full 
> (f=f@entry=0x56306125bde0, mis=0x56306101d3e0) at ../migration/savevm.c:2503
> #22 0x000056305f75aca8 in qemu_loadvm_state_main (f=f@entry=0x56306125bde0, 
> mis=mis@entry=0x56306101d3e0) at ../migration/savevm.c:2719
> #23 0x000056305f75c17a in qemu_loadvm_state (f=0x56306125bde0) at 
> ../migration/savevm.c:2809
> #24 0x000056305f74980e in process_incoming_migration_co (opaque=<optimized 
> out>) at ../migration/migration.c:606
> #25 0x000056305fab25cb in coroutine_trampoline (i0=<optimized out>, 
> i1=<optimized out>) at ../util/coroutine-ucontext.c:177

I thought about something like this one and assumed it was fine, but I
forgot this can trigger after your other patch applied..  It means we need
to drop the 2nd patch that I provided in the last reply.

> 
> > I really think changing depth is a hack... :(
> > 
> > Here IMHO the problem is we have other missing calls to
> > address_space_to_flatview() during commit() and that caused the issue.
> > There aren't a lot of those, and sorry to miss yet another one.
> > 
> > So let me try one more time on this (with patch 1; I think I've got two
> > places missed in the previous attempt).  Let's see how it goes.
> > 
> > We may want to add a tracepoint or have some way to know when enfornced
> > commit() is triggered during the vm load phase.  I just noticed when you
> > worried about having enforced commit() to often then it beats the original
> > purpose and I think yes that's something to worry.
> > 
> > I still believe AHCI condition is rare (since e.g. you've passed all Juan's
> > tests even before we have this "do_commit" stuff), but in short: I think it
> > would still be interesting if you can capture all the users of enforced
> > commit() like the AHCI case you quoted before, and list them in the cover
> > letter in your next post (along with a new perf measurements, to make sure
> > your worry is not true on having too much enforced commit will invalid the
> > whole idea).
> > 
> > When I was digging this out, I also found some RCU issue when using
> > address_space_to_flatview() (when BQL was not taken), only in the
> > memory_region_clear_dirty_bitmap() path.  I hope the new assertion
> > (rcu_read_is_locked()) won't trigger on some of the use cases for you
> > already, but anyway feel free to ignore this whole paragraph for now until
> > if you see some assert(rcu_read_is_locked()) being triggered.  I plan to
> > post some RFC for fixing RCU and I'll have you copied just for reference
> > (may be separate issue as what you're working on).
> > 
> > Thanks,
> > 
> Unfortunately, there is another case of stack overflow...
> 
> #8  memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
> #9  0x00005610e96d3665 in address_space_to_flatview (as=0x5610e9ece820 
> <address_space_memory>) at 
> /data00/migration/qemu-open/include/exec/memory.h:1119
> #10 address_space_get_flatview (as=0x5610e9ece820 <address_space_memory>) at 
> ../softmmu/memory.c:818
> #11 0x00005610e96dfa14 in address_space_cache_init 
> (cache=cache@entry=0x56111cdee410, as=<optimized out>, 
> addr=addr@entry=1048576, len=len@entry=4096, is_write=false)
>     at ../softmmu/physmem.c:3350
> #12 0x00005610e96a0928 in virtio_init_region_cache 
> (vdev=vdev@entry=0x5610eb72bf10, n=n@entry=0) at ../hw/virtio/virtio.c:247
> #13 0x00005610e96a0db4 in virtio_memory_listener_commit 
> (listener=0x5610eb72bff8) at ../hw/virtio/virtio.c:3592
> #14 0x00005610e96d2e5e in address_space_update_flatviews_all () at 
> ../softmmu/memory.c:1126
> #15 memory_region_transaction_do_commit () at ../softmmu/memory.c:1145
> 
> Fortunately, this is probably the last one (at least according to the qtest 
> results)😁.

I think this issue will also go away if you drop my previous patch 2,
because that patch contains a very tiny little functional change, where we
will reset memory_region_update_pending only after the global commit().

        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
        memory_region_update_pending = false;

While I think we need at least for this stack to not trigger:

        memory_region_update_pending = false;
        MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);

I do think this is tricky, though, e.g. if someone calls _to_flatview() in
either begin() or region_add() we can loop again, even though I really,
really don't think we should do that.

Today I also went back and had a look at the AHCI issue, it seems it's not
because of mr->ram not updated (FIR address is a random buffer on guest
RAM), but because we have "bus master as" for PCI devices.  That explains
why it happened before, and unfortunately I still can't think of any better
way than this do_commit() thing even if I tried one more time.

As a summary: please drop patch 2 and retry (with a rewritten do_commit()
by yourself; note the ordering I mentioned above!).  I hope this is the
last piece of puzzle, or we should revisit.

> 
> BTW, Perhaps you can define do_commit as a non-static function? Because I'll 
> use it in
> address_space_to_flatview later.

Feel free to modify whatever piece ofcode piece I offered in this thread to
suite your need; as long as it works for you. :)

Thanks,

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]