Re: [Qemu-devel] Question about wrong ram-node0 reference
From: liujunjie (A)
Subject: Re: [Qemu-devel] Question about wrong ram-node0 reference
Date: Mon, 27 May 2019 12:51:00 +0000
We found only one VM that aborted among at least 20 VMs with the same
configuration, and the problem has not reproduced since. (We are aware that a
reproducer is important for finding the cause. We have already tried running
more VMs to reproduce it, but without results so far.)
The qemu cmdline is as follows:
/usr/bin/qemu-kvm -name guest=instance-00025bf8,debug-threads=on -S -object
secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-118-instance-00025bf8/master-key.aes
-machine
pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off,max-ram-below-4g=2G -cpu
host,host-cache-info=on -m 131072 -realtime min_guarantee=131072,mlock=off -smp
16,sockets=2,cores=4,threads=2 -object iothread,id=iothread1 -object
iothread,id=iothread2 -object iothread,id=iothread3 -object
iothread,id=iothread4 -object iothread,id=iothread5 -object
iothread,id=iothread6 -object iothread,id=iothread7 -object
iothread,id=iothread8 -object iothread,id=iothread9 -object
iothread,id=iothread10 -object iothread,id=iothread11 -object
iothread,id=iothread12 -object iothread,id=iothread13 -object
iothread,id=iothread14 -object iothread,id=iothread15 -object
iothread,id=iothread16 -object iothread,id=iothread17 -object
iothread,id=iothread18 -object iothread,id=iothread19 -object
iothread,id=iothread20 -object iothread,id=iothread21 -object
iothread,id=iothread22 -object iothread,id=iothread23 -object
iothread,id=iothread24 -object iothread,id=iothread25 -object
iothread,id=iothread26 -object iothread,id=iothread27 -object
iothread,id=iothread28 -object iothread,id=iothread29 -object
memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/118-instance-00025bf8,share=yes,size=68719476736,host-nodes=0,policy=bind
-numa node,nodeid=0,cpus=0-7,memdev=ram-node0 -object
memory-backend-file,id=ram-node1,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/118-instance-00025bf8,share=yes,size=68719476736,host-nodes=1,policy=bind
-numa node,nodeid=1,cpus=8-15,memdev=ram-node1 -uuid
6952c043-4e0c-4267-80c1-fac2e302443f -smbios type=1,manufacturer=OpenStack
Foundation,product=OpenStack
Nova,version=13.2.1-20181119144459,serial=c5cc21e6-1d3b-4587-8c1e-208a1d19a47e,uuid=6952c043-4e0c-4267-80c1-fac2e302443f,family=Virtual
Machine -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-118-instance-00025bf8/monitor.sock,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=2019-01-21T06:59:37,clock=vm,driftfix=slew -global
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -boot strict=on -device
pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x3 -device
pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x4 -device
pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0x5 -device
pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0x6 -device
pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0x7 -device
pci-bridge,chassis_nr=6,id=pci.6,bus=pci.0,addr=0x8 -device
pci-bridge,chassis_nr=7,id=pci.7,bus=pci.0,addr=0x9 -device
pci-bridge,chassis_nr=8,id=pci.8,bus=pci.0,addr=0xa -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0xb -drive
file=/dev/mapper/648d06e72e68404a9401854e21409f3d-dm,format=raw,if=none,id=drive-virtio-disk0,serial=648d06e7-2e68-404a-9401-854e21409f3d,cache=none,aio=native
-device
virtio-blk-pci,scsi=off,bus=pci.2,addr=0x1,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-chardev socket,id=charnet0,path=/var/run/vhost-user/tap4ba9f4eb-19 -netdev
vhost-user,chardev=charnet0,queues=4,id=hostnet0 -device
virtio-net-pci,mq=on,vectors=10,netdev=hostnet0,id=net0,mac=fa:16:3e:0f:ed:94,bus=pci.4,addr=0x3,bootindex=2
-add-fd set=0,fd=45 -chardev file,id=charserial0,path=/dev/fdset/0,append=on
-device isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/run/libvirt/qemu/instance-00025bf8.extend,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1
-chardev
socket,id=charchannel1,path=/var/run/libvirt/qemu/instance-00025bf8.agent,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=org.qemu.guest_agent.0
-chardev
socket,id=charchannel2,path=/var/run/libvirt/qemu/instance-00025bf8.hostd,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=3,chardev=charchannel2,id=channel2,name=org.qemu.guest_agent.2
-chardev
socket,id=charchannel3,path=/var/run/libvirt/qemu/instance-00025bf8.upgraded,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=4,chardev=charchannel3,id=channel3,name=org.qemu.guest_agent.3
-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 172.28.5.246:3,password -k
en-us -device cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device
vfio-pci,host=95:00.0,id=hostdev0,bus=pci.5,addr=0x1 -device
vfio-pci,host=99:00.0,id=hostdev1,bus=pci.5,addr=0x2 -device
vfio-pci,host=35:00.0,id=hostdev2,peer-clique-id=0,iomem=0x98000000-0x98ffffff:0x3e800000000-0x3ebffffffff:0x3ec00000000-0x3ec01ffffff,bus=pci.0,addr=0xc
-device
vfio-pci,host=39:00.0,id=hostdev3,peer-clique-id=0,iomem=0x92000000-0x92ffffff:0x3e000000000-0x3e3ffffffff:0x3e400000000-0x3e401ffffff,bus=pci.0,addr=0xd
-global p2p.downstream_ports=28:10.0 28:14.0 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xe -NetInterruptAutobind
-chardev
file,id=seabios,path=/var/log/libvirt/qemu/instance-00025bf8.seabios,mux=off,append=on
-device isa-debugcon,iobase=0x402,chardev=seabios -msg timestamp=on
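For reference, the memory and CPU sizes in the command line above are consistent with each other. The backend size is given in bytes, the -m value in MiB, and the 1G huge page size is taken from the original report below:

```latex
\begin{aligned}
\texttt{size} = 68719476736\ \text{B} &= 2^{36}\ \text{B} = 64\ \text{GiB per NUMA node},\\
2 \times 64\ \text{GiB} &= 128\ \text{GiB} = 131072\ \text{MiB} \quad (\texttt{-m 131072}),\\
64\ \text{GiB} \div 1\ \text{GiB} &= 64\ \text{huge pages per node},\\
2 \times 4 \times 2 &= 16\ \text{vCPUs} \quad (\texttt{cpus=0-7} \text{ and } \texttt{cpus=8-15}).
\end{aligned}
```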
> -----Original Message-----
> From: Igor Mammedov [mailto:address@hidden
> Sent: Monday, May 27, 2019 3:57 PM
> To: liujunjie (A) <address@hidden>
> Cc: address@hidden; address@hidden; address@hidden;
> address@hidden; Zhoujian (jay) <address@hidden>; fangying
> <address@hidden>; wangxin (U) <address@hidden>;
> Huangweidong (C) <address@hidden>
> Subject: Re: Question about wrong ram-node0 reference
>
> On Sat, 25 May 2019 03:35:20 +0000
> "liujunjie (A)" <address@hidden> wrote:
>
> > Hi, I have run into a problem:
> >
> > The QEMU version is 2.8.1. The virtual machine is configured with 1G huge
> > pages, two NUMA nodes and four pass-through NVMe SSDs.
> >
> > After we started the VM, nothing was done to it apart from some QMP
> > queries, yet QEMU aborted several months later.
> > The VM was then restarted, and the problem has not reproduced since.
> > The backtrace of the RCU thread is as follows:
> > (gdb) bt
> > #0 0x00007fd2695f0197 in raise () from /usr/lib64/libc.so.6
> > #1 0x00007fd2695f1888 in abort () from /usr/lib64/libc.so.6
> > #2 0x00007fd2695e9206 in __assert_fail_base () from /usr/lib64/libc.so.6
> > #3 0x00007fd2695e92b2 in __assert_fail () from /usr/lib64/libc.so.6
> > #4 0x0000000000476a84 in memory_region_finalize (obj=<optimized out>)
> > at /home/abuild/rpmbuild/BUILD/qemu-kvm-2.8.1/memory.c:1512
> > #5 0x0000000000763105 in object_deinit (address@hidden,
> > address@hidden) at qom/object.c:448
> > #6 0x0000000000763153 in object_finalize (data=0x1dc1fd0) at qom/object.c:462
> > #7 0x00000000007627cc in object_property_del_all (address@hidden)
> > at qom/object.c:399
> > #8 0x0000000000763148 in object_finalize (data=0x1dc1f70) at qom/object.c:461
> > #9 0x0000000000764426 in object_unref (obj=<optimized out>) at qom/object.c:897
> > #10 0x0000000000473b6b in memory_region_unref (mr=<optimized out>)
> > at /home/abuild/rpmbuild/BUILD/qemu-kvm-2.8.1/memory.c:1560
> > #11 0x0000000000473bc7 in flatview_destroy (view=0x7fc188b9cb90)
> > at /home/abuild/rpmbuild/BUILD/qemu-kvm-2.8.1/memory.c:289
> > #12 0x0000000000843be0 in call_rcu_thread (opaque=<optimized out>)
> > at util/rcu.c:279
> > #13 0x00000000008325c2 in qemu_thread_start (address@hidden)
> > at util/qemu_thread_posix.c:496
> > #14 0x00007fd269983dc5 in start_thread () from /usr/lib64/libpthread.so.0
> > #15 0x00007fd2696b27bd in clone () from /usr/lib64/libc.so.6
> >
> > In this core, I found that the reference count of "/objects/ram-node0" (the
> > type of ram-node0 is struct "HostMemoryBackendFile") is 0, while the
> > reference count of "/objects/ram-node1" is 129. More details can be seen at
> > the end of this email.
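A minimal sketch of how we read the backtrace, under stated assumptions. In QEMU's memory.c, memory_region_ref() and memory_region_unref() take and drop a reference on the region's owner object (mr->owner), not on the region itself, so the count dropped in frame #10 is presumably that of /objects/ram-node0. When that count reaches zero, object_finalize() deletes the owner's child properties (frames #8 to #4). This finalizes the RAM MemoryRegion while, we assume, it is still mapped, and an assertion in memory_region_finalize() fires. The toy model below only illustrates that chain. ToyObject, ToyRegion and all toy_* functions are hypothetical names, not QEMU APIs, and the real types carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types: simplified stand-ins for a QOM Object (the
 * memory backend "ram-node0") and the RAM MemoryRegion it owns. */
typedef struct ToyRegion ToyRegion;

typedef struct ToyObject {
    unsigned ref;       /* QOM-style reference count */
    bool finalized;     /* set once ref has dropped to 0 */
    ToyRegion *child;   /* child region, finalized together with the owner */
} ToyObject;

struct ToyRegion {
    ToyObject *owner;   /* object pinned by toy_region_ref() */
    bool in_container;  /* region is still mapped into the guest */
    bool assert_hit;    /* models the assertion in memory_region_finalize */
};

static void toy_region_finalize(ToyRegion *mr)
{
    /* Tearing down a region that is still mapped is a bug. QEMU would
     * abort() in an assertion here; the model only records it. */
    if (mr->in_container) {
        mr->assert_hit = true;
    }
}

static void toy_object_ref(ToyObject *obj)
{
    obj->ref++;
}

static void toy_object_unref(ToyObject *obj)
{
    assert(obj->ref > 0); /* an unref past zero is itself a bug */
    if (--obj->ref == 0) {
        obj->finalized = true;
        /* Finalizing the owner also finalizes its children, similar to
         * object_property_del_all() in frame #7 of the backtrace. */
        if (obj->child) {
            toy_region_finalize(obj->child);
        }
    }
}

/* Like memory_region_ref()/memory_region_unref(): pin the owner. */
static void toy_region_ref(ToyRegion *mr)
{
    if (mr && mr->owner) {
        toy_object_ref(mr->owner);
    }
}

static void toy_region_unref(ToyRegion *mr)
{
    if (mr && mr->owner) {
        toy_object_unref(mr->owner);
    }
}
```

Starting the owner with one reference, a matched toy_region_ref()/toy_region_unref() pair leaves it alive. A single extra toy_region_unref() finalizes it and sets ram.assert_hit, which has the same shape as the crash above. In other words, a count of 0 in the core points to either a missing reference or one unbalanced unref somewhere.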
> >
> > I searched the community archives and found a case with the same error
> > report:
> > https://mail.coreboot.org/pipermail/seabios/2017-September/011799.html
> > However, I did not configure pcie_pci_bridge, and in that case QEMU aborted
> > during device initialization.
> That case doesn't seem relevant.
>
> >
> > I also tried to find out what takes references to "/objects/ram-node0", in
> > order to find a caller that might drop a reference improperly. Most of them
> > are in "render_memory_region" or "phys_section_add", which run when the
> > memory topology changes.
> > Later, the temporary FlatViews are destroyed by the RCU thread, which drops
> > those references; the backtrace of that path is similar to the one shown
> > above.
> > However, I am not familiar with the details of this process, and it is hard
> > to keep track of these memory topology changes.
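The reference lifecycle described above can be modelled roughly as follows. As we read QEMU 2.8's memory.c, flatview_insert() (reached from render_memory_region()) calls memory_region_ref() for each range, and flatview_destroy() calls memory_region_unref() for every range, but only when the RCU thread later runs it for a retired view. Under that assumption, the owner's count should equal its other holders plus one per range in every view not yet destroyed. ToyOwner, ToyFlatView and the toy_* functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one owner refcount and FlatView-like snapshots
 * that each pin the owner once per range they contain. */
typedef struct {
    unsigned owner_ref;   /* reference count of e.g. "ram-node0" */
} ToyOwner;

#define MAX_RANGES 8
#define MAX_PENDING 8

typedef struct {
    ToyOwner *ranges[MAX_RANGES]; /* each entry holds one owner ref */
    size_t nr;
} ToyFlatView;

/* Pending destructions, standing in for callbacks queued with call_rcu(). */
static ToyFlatView *pending[MAX_PENDING];
static size_t nr_pending;

/* Like inserting a rendered range into a new view: take a reference. */
static void toy_view_add_range(ToyFlatView *view, ToyOwner *owner)
{
    assert(view->nr < MAX_RANGES);
    owner->owner_ref++;
    view->ranges[view->nr++] = owner;
}

/* Retiring the old view only queues it; readers may still be using it. */
static void toy_view_retire(ToyFlatView *view)
{
    assert(nr_pending < MAX_PENDING);
    pending[nr_pending++] = view;
}

/* Like the RCU thread running flatview_destroy(): drop every range ref. */
static void toy_rcu_thread_run(void)
{
    for (size_t i = 0; i < nr_pending; i++) {
        ToyFlatView *view = pending[i];
        for (size_t j = 0; j < view->nr; j++) {
            assert(view->ranges[j]->owner_ref > 0);
            view->ranges[j]->owner_ref--;
        }
        view->nr = 0;
    }
    nr_pending = 0;
}
```

After a topology change, the old view keeps the owner pinned until the RCU thread runs, so the count temporarily includes both views. Any path that drops a reference without a matching take, or destroys the same view twice, breaks this balance and can eventually drive the count to zero while the guest is still using the memory.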
> >
> > My question is:
> > How can ram-node0's reference count drop to 0 while the virtual machine is
> > still running?
> >
> > Perhaps someone familiar with memory_region_ref or memory-backend-file can
> > help me figure this out.
> > Any ideas are appreciated.
>
> Could you provide steps to reproduce (incl. command line)?
>
> [...]
> > Thanks,
> > Junjie Liu
> >