qemu-devel

Re: Question on memory commit during MR finalize()


From: Peter Xu
Subject: Re: Question on memory commit during MR finalize()
Date: Fri, 16 Jul 2021 10:18:43 -0400

On Fri, Jul 16, 2021 at 11:42:02AM +0000, Thanos Makatos wrote:
> > -----Original Message-----
> > From: Peter Xu <peterx@redhat.com>
> > Sent: 15 July 2021 19:35
> > To: Thanos Makatos <thanos.makatos@nutanix.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>; Markus Armbruster
> > <armbru@redhat.com>; QEMU Devel Mailing List <qemu-
> > devel@nongnu.org>; John Levon <john.levon@nutanix.com>; John G
> > Johnson <john.g.johnson@oracle.com>
> > Subject: Re: Question on memory commit during MR finalize()
> > 
> > On Thu, Jul 15, 2021 at 02:27:48PM +0000, Thanos Makatos wrote:
> > > Hi Peter,
> > 
> > Hi, Thanos,
> > 
> > > We're hitting this issue using a QEMU branch where JJ is using vfio-user
> > > as the transport for multiprocess-qemu
> > > (https://github.com/oracle/qemu/issues/9). We can reproduce it fairly
> > > reliably by migrating a virtual SPDK NVMe controller (the NVMf/vfio-user
> > > target with experimental migration support,
> > > https://review.spdk.io/gerrit/c/spdk/spdk/+/7617/14). I can provide
> > > detailed repro instructions but first I want to make sure we're not
> > > missing any patches.
> > 
> > I don't think you missed any bug fix patches, as the issue I mentioned could
> > only be triggered with my own branch at that time, and that was fixed when
> > my patchset got merged.
> > 
> > However, if you encountered the same issue, it's possible that there's an
> > incorrect use of the QEMU memory/CPU API somewhere there too, so that a
> > similar issue is triggered.  For example, in my case run_on_cpu() was
> > called incorrectly in the middle of a memory layout change, so the BQL was
> > released without being noticed.
> > 
> > I've got a series that tries to expose these hard to debug issues:
> > 
> > https://lore.kernel.org/qemu-devel/20200421162108.594796-1-peterx@redhat.com/
> > 
> > Obviously the series didn't attract enough interest, so it didn't get merged.
> > However, maybe it's also useful for what you're debugging, so you can apply
> > those patches onto your branch and see the stack when it reproduces again.
> > Logically, with these sanity patches it could fail earlier than what you've
> > hit right now (which I believe should be within the RCU thread; btw it would
> > be interesting to share your stack too when it's hit) and it could provide
> > more useful information.
> > 
> > I saw that the old series won't apply onto master any more, so I rebased it
> > and pushed it here (with one patch dropped since someone wrote a similar
> > patch that got merged, so there are only 7 patches in the new tree):
> > 
> > https://github.com/xzpeter/qemu/tree/memory-sanity
> > 
> > No guarantee it'll help, but IMHO worth trying.
> 
> The memory-sanity branch fails to build:
> 
> ./configure --prefix=/opt/qemu-xzpeter --target-list=x86_64-linux-user --enable-debug
> make -j 8
> ...
> [697/973] Linking target qemu-x86_64
> FAILED: qemu-x86_64
> c++  -o qemu-x86_64 libcommon.fa.p/cpus-common.c.o 
> libcommon.fa.p/page-vary-common.c.o libcommon.fa.p/disas_i386.c.o 
> libcommon.fa.p/disas_capstone.c.o libcommon.fa.p/hw_core_cpu-common.c.o 
> libcommon.fa.p/ebpf_ebpf_rss-stub.c.o libcommon.fa.p/accel_accel-user.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_user_excp_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_user_seg_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_x86_64_signal.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_x86_64_cpu_loop.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_cpu.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_gdbstub.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_xsave_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_cpu-dump.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_sev-stub.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_kvm_kvm-stub.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_bpt_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_cc_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_excp_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_fpu_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_int_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_mem_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_misc_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_mpx_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_seg_helper.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_tcg-cpu.c.o 
> libqemu-x86_64-linux-user.fa.p/target_i386_tcg_translate.c.o 
> libqemu-x86_64-linux-user.fa.p/trace_control-target.c.o 
> libqemu-x86_64-linux-user.fa.p/cpu.c.o 
> libqemu-x86_64-linux-user.fa.p/disas.c.o 
> libqemu-x86_64-linux-user.fa.p/gdbstub.c.o 
> libqemu-x86_64-linux-user.fa.p/page-vary.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_optimize.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_region.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_tcg.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_tcg-common.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_tcg-op.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_tcg-op-gvec.c.o 
> libqemu-x86_64-linux-user.fa.p/tcg_tcg-op-vec.c.o 
> libqemu-x86_64-linux-user.fa.p/fpu_softfloat.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_accel-common.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-all.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_cpu-exec-common.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_cpu-exec.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-runtime-gvec.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_tcg-runtime.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_translate-all.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_translator.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_user-exec.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_user-exec-stub.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_tcg_plugin-gen.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_stubs_hax-stub.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_stubs_xen-stub.c.o 
> libqemu-x86_64-linux-user.fa.p/accel_stubs_kvm-stub.c.o 
> libqemu-x86_64-linux-user.fa.p/plugins_loader.c.o 
> libqemu-x86_64-linux-user.fa.p/plugins_core.c.o 
> libqemu-x86_64-linux-user.fa.p/plugins_api.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_elfload.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_exit.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_fd-trans.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_linuxload.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_main.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_mmap.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_safe-syscall.S.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_signal.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_strace.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_syscall.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_uaccess.c.o 
> libqemu-x86_64-linux-user.fa.p/linux-user_uname.c.o 
> libqemu-x86_64-linux-user.fa.p/thunk.c.o 
> libqemu-x86_64-linux-user.fa.p/meson-generated_.._x86_64-linux-user-gdbstub-xml.c.o
>  
> libqemu-x86_64-linux-user.fa.p/meson-generated_.._trace_generated-helpers.c.o 
> -Wl,--as-needed -Wl,--no-undefined -pie -Wl,--whole-archive libhwcore.fa 
> libqom.fa -Wl,--no-whole-archive -Wl,--warn-common -Wl,-z,relro -Wl,-z,now 
> -m64 -fstack-protector-strong -Wl,--start-group libcapstone.a libqemuutil.a 
> libhwcore.fa libqom.fa -ldl 
> -Wl,--dynamic-list=/root/src/qemu/build/qemu-plugins-ld.symbols -lrt -lutil 
> -lm -pthread -Wl,--export-dynamic -lgmodule-2.0 -lglib-2.0 -lstdc++ 
> -Wl,--end-group
> /usr/bin/ld: libcommon.fa.p/cpus-common.c.o: in function `do_run_on_cpu':
> /root/src/qemu/build/../cpus-common.c:153: undefined reference to `qemu_cond_wait_iothread'
> collect2: error: ld returned 1 exit status
> [698/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_ui64_r_minMag.c.o
> [699/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i32_r_minMag.c.o
> [700/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f16.c.o
> [701/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_f64.c.o
> [702/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_i64_r_minMag.c.o
> [703/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80M.c.o
> [704/973] Compiling C object 
> tests/fp/libsoftfloat.a.p/berkeley-softfloat-3_source_f32_to_extF80.c.o
> ninja: build stopped: subcommand failed.
> make[1]: *** [Makefile:154: run-ninja] Error 1
> make[1]: Leaving directory '/root/src/qemu/build'
> make: *** [GNUmakefile:11: all] Error 2

So it fails on linux-user...  I can fix the compilation, but it should build
fine for x86_64-softmmu. More importantly: are you using linux-user binaries?
My branch is only helpful for debugging BQL-related issues, and linux-user
shouldn't be using the BQL at all, so if that's the case please ignore the
branch; it won't help.
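
To make "BQL-related issues" concrete, here is a toy model of the kind of
sanity check meant here: complain whenever the big lock is dropped while a
memory transaction is still open. It is only a sketch, not the code in the
branch. The names big_lock_*, txn_* and wait_for_other_thread() are all
hypothetical, and the lock is modelled as a plain flag so the example stays
single-threaded and self-contained.

```c
/* Toy model only: none of these names are QEMU APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool big_lock_held;   /* stand-in for the BQL */
static unsigned txn_depth;   /* open memory transactions */
static unsigned violations;  /* times the lock was dropped mid-transaction */

static void big_lock_acquire(void)
{
    assert(!big_lock_held);
    big_lock_held = true;
}

/* Every path that drops the lock goes through here, so a helper that
 * unlocks behind the caller's back cannot bypass the check. */
static void big_lock_release(void)
{
    assert(big_lock_held);
    if (txn_depth != 0) {
        violations++;
        fprintf(stderr, "big lock released with %u open transaction(s)\n",
                txn_depth);
    }
    big_lock_held = false;
}

static void txn_begin(void)
{
    txn_depth++;
}

static void txn_commit(void)
{
    assert(txn_depth > 0);
    txn_depth--;
}

/* Stand-in for a helper that waits for another thread and therefore has
 * to drop the lock temporarily (the role run_on_cpu() played in my case). */
static void wait_for_other_thread(void)
{
    big_lock_release();
    /* another thread could now observe a half-updated memory layout */
    big_lock_acquire();
}

/* Runs one correct and one buggy transaction; returns the violation count. */
unsigned demo(void)
{
    big_lock_acquire();

    txn_begin();              /* correct: lock held for the whole transaction */
    txn_commit();

    txn_begin();              /* buggy: helper drops the lock mid-transaction */
    wait_for_other_thread();
    txn_commit();

    big_lock_release();       /* depth is back to 0, so this release is fine */
    return violations;
}
```

Calling demo() from a main() prints one warning on stderr for the buggy
transaction. A real check would more likely abort at that point, so that the
stack shows exactly who dropped the lock.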

> 
> Regarding the stack trace, I can very easily reproduce it on our branch; I
> know exactly where to set the breakpoint:
> 
> (gdb) r
> Starting prThread 0x7fffeffff7 In: __pthread_cond_waitu host -enable-kvm -smp 
> 4 -nographic -m 2G -object 
> memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on,prealloc=yes,
>  -numa node,memdev=mem0 -L88   PC: 0x7ffff772700cuThread 8 "qemu-system-x86" 
> received signal SIGUSR1, User defined signal 1.
> #0  0x00007ffff758f7bb in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007ffff757a535 in __GI_abort () at abort.c:79
> #2  0x0000555555c9301e in kvm_set_phys_mem (kml=0x5555568ee830, section=0x7ffff58c05e0, add=true) at ../accel/kvm/kvm-all.c:1194
> #3  0x0000555555c930cd in kvm_region_add (listener=0x5555568ee830, section=0x7ffff58c05e0) at ../accel/kvm/kvm-all.c:1211
> #4  0x0000555555bd6c9e in address_space_update_topology_pass (as=0x555556648420 <address_space_memory>, old_view=0x555556f21730, new_view=0x7ffff0001cb0, adding=true) at ../softmmu/memory.c:971
> #5  0x0000555555bd6f98 in address_space_set_flatview (as=0x555556648420 <address_space_memory>) at ../softmmu/memory.c:1047
> #6  0x0000555555bd713f in memory_region_transaction_commit () at ../softmmu/memory.c:1099
> #7  0x0000555555bd89a5 in memory_region_finalize (obj=0x555556e21800) at ../softmmu/memory.c:1751
> #8  0x0000555555cca132 in object_deinit (obj=0x555556e21800, type=0x5555566a8f80) at ../qom/object.c:673
> #9  0x0000555555cca1a4 in object_finalize (data=0x555556e21800) at ../qom/object.c:687
> #10 0x0000555555ccb196 in object_unref (objptr=0x555556e21800) at ../qom/object.c:1186
> #11 0x0000555555bb11f0 in phys_section_destroy (mr=0x555556e21800) at ../softmmu/physmem.c:1171
> #12 0x0000555555bb124a in phys_sections_free (map=0x5555572cf9a0) at ../softmmu/physmem.c:1180
> #13 0x0000555555bb4632 in address_space_dispatch_free (d=0x5555572cf990) at ../softmmu/physmem.c:2562
> #14 0x0000555555bd4485 in flatview_destroy (view=0x5555572cf950) at ../softmmu/memory.c:291
> #15 0x0000555555e367e8 in call_rcu_thread (opaque=0x0) at ../util/rcu.c:281
> #16 0x0000555555e68e57 in qemu_thread_start (args=0x555556665e30) at ../util/qemu-thread-posix.c:521
> #17 0x00007ffff7720fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> lot=10, start=0xfebd1000, size=0x1000: File exists
> #18 0x00007ffff76514cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Yes, indeed it looks similar.

-- 
Peter Xu



