qemu-devel

Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate


From: Chuang Xu
Subject: Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate
Date: Fri, 17 Feb 2023 16:11:19 +0800
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.6.1

Hi, Peter!

In my last email to Juan, I mentioned two errors.
Now I want to discuss them with you.

On 2023/2/16 11:41 PM, Chuang Xu wrote:
I ran qtest in an environment modeled on yours, and it ended up reporting two errors.

Error 1(the same as yours):

QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=87 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/virtio-net-failover --tap -k
stderr:
qemu-system-x86_64: /data00/migration/qemu-open/include/exec/memory.h:1114: address_space_to_flatview: Assertion `(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()' failed.
Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

TAP parsing error: Too few tests run (expected 23, got 12)

Coredump backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f3af64a8535 in __GI_abort () at abort.c:79
#2  0x00007f3af64a840f in __assert_fail_base (fmt=0x7f3af6609ef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=1114, function=<optimized out>) at assert.c:92
#3  0x00007f3af64b61a2 in __GI___assert_fail (assertion=assertion@entry=0x55d9425f48a8 "(!memory_region_transaction_in_progress() && qemu_mutex_iothread_locked()) || rcu_read_is_locked()", file=file@entry=0x55d9425f4870 "/data00/migration/qemu-open/include/exec/memory.h", line=line@entry=1114, function=function@entry=0x55d9426cdce0 <__PRETTY_FUNCTION__.20039> "address_space_to_flatview") at assert.c:101
#4  0x000055d942373853 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/exec/memory.h:1112
#5  0x000055d9423746f5 in address_space_to_flatview (as=0x55d944738648) at /data00/migration/qemu-open/include/qemu/rcu.h:126
#6  address_space_set_flatview (as=as@entry=0x55d944738648) at ../softmmu/memory.c:1029
#7  0x000055d94237ace3 in address_space_update_topology (as=0x55d944738648) at ../softmmu/memory.c:1080
#8  address_space_init (as=as@entry=0x55d944738648, root=root@entry=0x55d9447386a0, name=name@entry=0x55d9447384f0 "virtio-net-pci") at ../softmmu/memory.c:3082
#9  0x000055d942151e43 in do_pci_register_device (errp=0x7f3aef7fe850, devfn=<optimized out>, name=0x55d9444b6c40 "virtio-net-pci", pci_dev=0x55d944738410) at ../hw/pci/pci.c:1145
#10 pci_qdev_realize (qdev=0x55d944738410, errp=0x7f3aef7fe850) at ../hw/pci/pci.c:2036
#11 0x000055d942404a8f in device_set_realized (obj=<optimized out>, value=true, errp=0x7f3aef7feae0) at ../hw/core/qdev.c:510
#12 0x000055d942407e36 in property_set_bool (obj=0x55d944738410, v=<optimized out>, name=<optimized out>, opaque=0x55d9444c71d0, errp=0x7f3aef7feae0) at ../qom/object.c:2285
#13 0x000055d94240a0e3 in object_property_set (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", v=v@entry=0x55d9452f7a00, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1420
#14 0x000055d94240d15f in object_property_set_qobject (obj=obj@entry=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=0x55d945306cb0, errp=errp@entry=0x7f3aef7feae0) at ../qom/qom-qobject.c:28
#15 0x000055d94240a354 in object_property_set_bool (obj=0x55d944738410, name=name@entry=0x55d942670c23 "realized", value=value@entry=true, errp=errp@entry=0x7f3aef7feae0) at ../qom/object.c:1489
#16 0x000055d94240427c in qdev_realize (dev=<optimized out>, bus=bus@entry=0x55d945141400, errp=errp@entry=0x7f3aef7feae0) at ../hw/core/qdev.c:292
#17 0x000055d9421ef4a0 in qdev_device_add_from_qdict (opts=0x55d945309c00, from_json=<optimized out>, errp=<optimized out>, errp@entry=0x7f3aef7feae0) at /data00/migration/qemu-open/include/hw/qdev-core.h:17
#18 0x000055d942311c85 in failover_add_primary (errp=0x7f3aef7fead8, n=0x55d9454e8530) at ../hw/net/virtio-net.c:933
#19 virtio_net_set_features (vdev=<optimized out>, features=4611687122246533156) at ../hw/net/virtio-net.c:1004
#20 0x000055d94233d248 in virtio_set_features_nocheck (vdev=vdev@entry=0x55d9454e8530, val=val@entry=4611687122246533156) at ../hw/virtio/virtio.c:2851
#21 0x000055d942342eae in virtio_load (vdev=0x55d9454e8530, f=0x55d944700de0, version_id=11) at ../hw/virtio/virtio.c:3027
#22 0x000055d942207601 in vmstate_load_state (f=f@entry=0x55d944700de0, vmsd=0x55d9429baba0 <vmstate_virtio_net>, opaque=0x55d9454e8530, version_id=11) at ../migration/vmstate.c:137
#23 0x000055d942222672 in vmstate_load (f=0x55d944700de0, se=0x55d94561b700) at ../migration/savevm.c:919
#24 0x000055d942222927 in qemu_loadvm_section_start_full (f=f@entry=0x55d944700de0, mis=0x55d9444c23e0) at ../migration/savevm.c:2503
#25 0x000055d942225cc8 in qemu_loadvm_state_main (f=f@entry=0x55d944700de0, mis=mis@entry=0x55d9444c23e0) at ../migration/savevm.c:2729
#26 0x000055d942227195 in qemu_loadvm_state (f=0x55d944700de0) at ../migration/savevm.c:2816
#27 0x000055d94221480e in process_incoming_migration_co (opaque=<optimized out>) at ../migration/migration.c:606
#28 0x000055d94257d2ab in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
#29 0x00007f3af64d2c80 in __correctly_grouped_prefixwc (begin=0x2 <error: Cannot access memory at address 0x2>, end=0x0, thousands=0 L'\000', grouping=0x7f3af64bd8eb <__GI_raise+267> "H\213\214$\b\001") at grouping.c:171
#30 0x0000000000000000 in ?? ()


It seems that when address_space_to_flatview() is called, a memory region (mr)
transaction is in progress and the RCU read lock is not held.

I need to think further about the conditions of the sanity check, or about whether
we can take the RCU read lock before address_space_init() to solve the problem.


Error 2:

ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
ERROR 180/180 qemu:qtest+qtest-x86_64 / qtest-x86_64/migration-test ERROR 146.32s killed by signal 6 SIGABRT
QTEST_QEMU_BINARY=./qemu-system-x86_64 MALLOC_PERTURB_=250 G_TEST_DBUS_DAEMON=/data00/migration/qemu-open/tests/dbus-vmstate-daemon.sh /data00/migration/qemu-open/build/tests/qtest/migration-test --tap -k
qemu-system-x86_64: ../softmmu/memory.c:1094: memory_region_transaction_commit: Assertion `qemu_mutex_iothread_locked()' failed.
**
ERROR:../tests/qtest/migration-helpers.c:205:wait_for_migration_status: assertion failed: (g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT)
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 6 (Aborted) (core dumped)

(test program exited with status code -6)

Coredump backtrace:
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fed5c14d535 in __GI_abort () at abort.c:79
#2  0x00007fed5c14d40f in __assert_fail_base (fmt=0x7fed5c2aeef0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x561bc4152424 "qemu_mutex_iothread_locked()", file=0x561bc41ae94b "../softmmu/memory.c", line=1094, function=<optimized out>) at assert.c:92
#3  0x00007fed5c15b1a2 in __GI___assert_fail (assertion=assertion@entry=0x561bc4152424 "qemu_mutex_iothread_locked()", file=file@entry=0x561bc41ae94b "../softmmu/memory.c", line=line@entry=1094, function=function@entry=0x561bc41afca0 <__PRETTY_FUNCTION__.38746> "memory_region_transaction_commit") at assert.c:101
#4  0x0000561bc3e5a053 in memory_region_transaction_commit () at ../softmmu/memory.c:1094
#5  0x0000561bc3d07b55 in qemu_loadvm_state_main (f=f@entry=0x561bc6443aa0, mis=mis@entry=0x561bc62028a0) at ../migration/savevm.c:2789
#6  0x0000561bc3d08e46 in postcopy_ram_listen_thread (opaque=opaque@entry=0x561bc62028a0) at ../migration/savevm.c:1922
#7  0x0000561bc404b3da in qemu_thread_start (args=<optimized out>) at ../util/qemu-thread-posix.c:505
#8  0x00007fed5c2f2fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#9  0x00007fed5c22406f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95


Error 2 is related to postcopy. I am not very familiar with the postcopy code,
so I need some time to look at that part.

I will send another email later to discuss it with Peter.

Cc'ing Peter.

Thanks!

Error 1 was triggered by our sanity check. I tried adding RCU_READ_LOCK_GUARD()
in address_space_init() and it works, but I'm not sure whether this change is
appropriate. If it is not, we may need to consider a different sanity check.
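
To make the condition concrete, here is a minimal standalone sketch of the
assertion from the Error 1 trace and of why taking the RCU read lock satisfies
it. This is a toy model, not QEMU code: the three state variables and every
model_* function are hypothetical stand-ins, and it assumes the BQL is held
while the device state is loaded.

```c
/* Toy model, NOT QEMU code: plain variables stand in for QEMU's real state. */
#include <assert.h>
#include <stdbool.h>

static unsigned mr_transaction_depth; /* models memory_region_transaction_in_progress() */
static bool bql_held;                 /* models qemu_mutex_iothread_locked() */
static unsigned rcu_read_depth;       /* models rcu_read_is_locked() */

/* Same boolean shape as the assertion reported in address_space_to_flatview(). */
bool flatview_access_allowed(void)
{
    return (mr_transaction_depth == 0 && bql_held) || rcu_read_depth > 0;
}

/*
 * Hypothetical stand-in for address_space_init(). with_rcu_guard models
 * adding RCU_READ_LOCK_GUARD() at the top of the function: the read lock
 * is taken on entry and dropped on exit.
 */
bool model_address_space_init(bool with_rcu_guard)
{
    if (with_rcu_guard) {
        rcu_read_depth++;
    }
    bool ok = flatview_access_allowed();
    if (with_rcu_guard) {
        rcu_read_depth--;
    }
    return ok;
}

/*
 * Models a device being realized while vmstate is loaded: the BQL is held
 * (assumption) and a memory region transaction is still open.
 */
bool model_load_with_open_transaction(bool with_rcu_guard)
{
    bql_held = true;
    mr_transaction_depth++;
    bool ok = model_address_space_init(with_rcu_guard);
    mr_transaction_depth--;
    return ok;
}
```

In this model, model_load_with_open_transaction(false) returns false (the case
where the real assertion aborts) and model_load_with_open_transaction(true)
returns true. It says nothing about whether the guard is the right fix in QEMU
itself.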

Error 2 is related to postcopy. I read the official postcopy documentation
(I hope it is the latest) and learned that two threads call
qemu_loadvm_state_main() during postcopy. The call made by the main thread
holds the BQL; the call made by the ram_listen thread does not.
The latter reaches memory_region_transaction_commit(), which asserts that the
BQL is held, and so triggers the assertion. Creating a new function
qemu_loadvm_state_ram_listen() without memory_region_transaction_commit()
will solve this error.
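
The split described above can be sketched as a standalone toy model. It is not
the actual patch: every model_* name and all state variables are hypothetical,
and the only behavior taken from the trace is that the commit step requires
the BQL.

```c
/* Toy model, NOT QEMU code: all names below are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>

static bool bql_held;                 /* models qemu_mutex_iothread_locked() */
static unsigned mr_transaction_depth; /* open memory region transactions */
static unsigned commits_done;         /* successful commits so far */
static unsigned sections_loaded;      /* how many times the load loop ran */

void model_transaction_begin(void)
{
    mr_transaction_depth++;
}

/*
 * Models memory_region_transaction_commit(). Returns false where the real
 * function would abort on assert(qemu_mutex_iothread_locked()).
 */
bool model_transaction_commit(void)
{
    if (!bql_held) {
        return false;
    }
    assert(mr_transaction_depth > 0);
    mr_transaction_depth--;
    commits_done++;
    return true;
}

/* Placeholder for the section-loading loop shared by both paths. */
void model_load_sections(void)
{
    sections_loaded++;
}

/* Main-thread path: loads sections, then commits the open transaction. */
bool model_loadvm_state_main(void)
{
    model_load_sections();
    return model_transaction_commit();
}

/* ram_listen path as proposed: the same loading loop, but no commit. */
bool model_loadvm_state_ram_listen(void)
{
    model_load_sections();
    return true;
}
```

With bql_held set to false, model_loadvm_state_main() fails at the commit
step while model_loadvm_state_ram_listen() succeeds. The model leaves open who
commits the transaction on the postcopy path.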

I don't know whether you would suggest using this patch with postcopy. If the
patch is applicable to postcopy, do we need to consider more details, given the
differences in how postcopy and precopy load device state?

Looking forward to your reply.

Thanks.






