qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] assert during internal snapshot


From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] assert during internal snapshot
Date: Tue, 10 Nov 2015 11:00:25 +0000
User-agent: Mutt/1.5.23 (2015-06-09)

On Mon, Nov 09, 2015 at 03:29:13AM +0000, Li, Liang Z wrote:
> > -----Original Message-----
> > From: Denis V. Lunev [mailto:address@hidden
> > Sent: Saturday, November 07, 2015 11:20 PM
> > To: Li, Liang Z; Paolo Bonzini; Juan Quintela; Amit Shah
> > Cc: QEMU
> > Subject: assert during internal snapshot
> > 
> > Hello, All!
> > 
> > This commit
> > 
> > commit 94f5a43704129ca4995aa3385303c5ae225bde42
> > Author: Liang Li <address@hidden>
> > Date:   Mon Nov 2 15:37:00 2015 +0800
> > 
> >      migration: defer migration_end & blk_mig_cleanup
> > 
> >      Because of the patch 3ea3b7fa9af067982f34b of kvm, which introduces a
> >      lazy collapsing of small sptes into large sptes mechanism, now
> >      migration_end() is a time consuming operation because it calls
> >      memroy_global_dirty_log_stop(), which will trigger the dropping of 
> > small
> >      sptes operation and takes about dozens of milliseconds, so call
> >      migration_end() before all the vmsate data has already been transferred
> >      to the destination will prolong VM downtime. This operation should be
> >      deferred after all the data has been transferred to the destination.
> > 
> >      blk_mig_cleanup() can be deferred too.
> > 
> >      For a VM with 8G RAM, this patch can reduce the VM downtime about
> > 30 ms.
> > 
> >      Signed-off-by: Liang Li <address@hidden>
> >      Reviewed-by: Paolo Bonzini <address@hidden>
> >      Reviewed-by: Juan Quintela <address@hidden>al3
> >      Reviewed-by: Amit Shah <address@hidden>al3
> >      Signed-off-by: Juan Quintela <address@hidden>al3
> > 
> > introduces the following regression
> > 
> > (gdb) bt
> > #0  0x00007fd5d314a267 in __GI_raise (address@hidden)
> >      at ../sysdeps/unix/sysv/linux/raise.c:55
> > #1  0x00007fd5d314beca in __GI_abort () at abort.c:89
> > #2  0x00007fd5d314303d in __assert_fail_base (
> >      fmt=0x7fd5d32a5028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
> >      address@hidden "i != mr->ioeventfd_nb",
> >      address@hidden "/home/den/src/qemu/memory.c",
> >      address@hidden,
> >      address@hidden <__PRETTY_FUNCTION__.32545>
> > "memory_region_del_eventfd") at assert.c:92
> > #3  0x00007fd5d31430f2 in __GI___assert_fail (
> >      assertion=0x557288ed5b69 "i != mr->ioeventfd_nb",
> >      file=0x557288ed5a36 "/home/den/src/qemu/memory.c", line=1731,
> >      function=0x557288ed5fb0 <__PRETTY_FUNCTION__.32545>
> > "memory_region_del_eventfd") at assert.c:101
> > #4  0x0000557288b108fa in memory_region_del_eventfd
> > (mr=0x55728ad83700,
> >      addr=16, size=2, match_data=true, data=0, e=0x55728b21ff40)
> >      at /home/den/src/qemu/memory.c:1731
> > #5  0x0000557288d9fc18 in virtio_pci_set_host_notifier_internal (
> >      proxy=0x55728ad82e80, n=0, assign=false, set_handler=false)
> >      at hw/virtio/virtio-pci.c:178
> > #6  0x0000557288da19a9 in virtio_pci_set_host_notifier (d=0x55728ad82e80,
> > n=0,
> >      assign=false) at hw/virtio/virtio-pci.c:984
> > #7  0x0000557288b523df in virtio_scsi_dataplane_start (s=0x55728ad8afa0)
> >      at /home/den/src/qemu/hw/scsi/virtio-scsi-dataplane.c:268
> > #8  0x0000557288b50210 in virtio_scsi_handle_cmd (vdev=0x55728ad8afa0,
> >      vq=0x55728b21ffc0) at /home/den/src/qemu/hw/scsi/virtio-scsi.c:574
> > #9  0x0000557288b65cb7 in virtio_queue_notify_vq (vq=0x55728b21ffc0)
> >      at /home/den/src/qemu/hw/virtio/virtio.c:966
> > #10 0x0000557288b67bbf in virtio_queue_host_notifier_read
> > (n=0x55728b220010)
> >      at /home/den/src/qemu/hw/virtio/virtio.c:1643
> > #11 0x0000557288e12a2b in aio_dispatch (ctx=0x55728acaeab0) at
> > aio-posix.c:160
> > #12 0x0000557288e03194 in aio_ctx_dispatch (source=0x55728acaeab0,
> >      callback=0x0, user_data=0x0) at async.c:226
> > #13 0x00007fd5d409fff7 in g_main_context_dispatch ()
> >     from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > ---Type <return> to continue, or q <return> to quit---
> > #14 0x0000557288e1110d in glib_pollfds_poll () at main-loop.c:211
> > #15 0x0000557288e111e8 in os_host_main_loop_wait (timeout=0) at
> > main-loop.c:256
> > #16 0x0000557288e11295 in main_loop_wait (nonblocking=0) at main-
> > loop.c:504
> > #17 0x0000557288c1c31c in main_loop () at vl.c:1890
> > #18 0x0000557288c23dec in main (argc=105, argv=0x7ffca9a6fa08,
> >      envp=0x7ffca9a6fd58) at vl.c:4644
> > (gdb)
> > 
> > during 'virsh create-snapshot' operation over alive VM.
> > It happens 100% of time when VM is run using the following command line:
> > 
> >   7498 ?        tl     0:37 qemu-system-x86_64 -enable-kvm -name rhel7
> > -S -machine pc-i440fx-2.2,accel=kvm,usb=off -cpu SandyBridge -m 1024 -
> > realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object
> > iothread,id=iothread1 -uuid 456af4d3-5d67-41c6-a229-c55ded6098e9
> > -no-user-config -nodefaults -chardev
> > socket,id=charmonitor,path=/var/lib/libvirt/qemu/rhel7.monitor,server,nowait
> > -mon chardev=charmonitor,id=monitor,mode=control -rtc
> > base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet 
> > -no-
> > shutdown -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot
> > strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x6.0x7
> > -device
> > ich9-usb-
> > uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x6
> > -device
> > ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x6.0x1
> > -device
> > ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x6.0x2
> > -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -device
> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive
> > file=/var/lib/libvirt/images/rhel7.qcow2,if=none,id=drive-scsi0-0-0-
> > 0,format=qcow2,cache=none,aio=native
> > -device
> > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-
> > 0-0,bootindex=1
> > -drive
> > file=/home/den/tmp/CentOS-7.0-1406-x86_64-DVD.iso,if=none,id=drive-ide0-0-
> > 0,readonly=on,format=raw
> > -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev
> > tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device
> > virtio-net-
> > pci,netdev=hostnet0,id=net0,mac=52:54:00:97:2f:1d,bus=pci.0,addr=0x3
> > -chardev
> > socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/rhel7.org.qe
> > mu.guest_agent.0,server,nowait
> > -device
> > virtserialport,bus=virtio-
> > serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agen
> > t.0
> > -chardev spicevmc,id=charchannel1,name=vdagent -device
> > virtserialport,bus=virtio-
> > serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
> > -chardev pty,id=charconsole0 -device
> > virtconsole,chardev=charconsole0,id=console0 -device
> > usb-tablet,id=input0 -spice
> > port=5900,addr=127.0.0.1,disable-ticketing,seamless-migration=on -device
> > qxl-
> > vga,id=video0,ram_size=67108864,vram_size=67108864,vgamem_mb=16,bus=
> > pci.0,addr=0x2
> > -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device
> > hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev
> > spicevmc,id=charredir0,name=usbredir -device
> > usb-redir,chardev=charredir0,id=redir0 -chardev
> > spicevmc,id=charredir1,name=usbredir -device
> > usb-redir,chardev=charredir1,id=redir1 -chardev
> > spicevmc,id=charredir2,name=usbredir -device
> > usb-redir,chardev=charredir2,id=redir2 -chardev
> > spicevmc,id=charredir3,name=usbredir -device
> > usb-redir,chardev=charredir3,id=redir3 -device
> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -set
> > device.scsi0.iothread=iothread1 -msg timestamp=on
> > 
> > Minimal dumb (!!!) fix is the following:
> > 
> > hades ~/src/qemu $ cat 1.diff
> > diff --git a/migration/ram.c b/migration/ram.c index 25e9eeb..25ebf0d 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -1100,7 +1100,8 @@ static void migration_bitmap_free(struct BitmapRcu
> > *bmap)
> >       g_free(bmap);
> >   }
> > 
> > -static void migration_end(void)
> > +extern void migration_end(void);
> > +void migration_end(void)
> >   {
> >       /* caller have hold iothread lock or is in a bh, so there is
> >        * no writing race against this migration_bitmap diff --git
> > a/migration/savevm.c b/migration/savevm.c index dbcc39a..01da865 100644
> > --- a/migration/savevm.c
> > +++ b/migration/savevm.c
> > @@ -914,6 +914,7 @@ void qemu_savevm_state_cancel(void)
> >       }
> >   }
> > 
> > +extern void migration_end(void);
> >   static int qemu_savevm_state(QEMUFile *f, Error **errp)
> >   {
> >       int ret;
> > @@ -942,6 +943,8 @@ static int qemu_savevm_state(QEMUFile *f, Error
> > **errp)
> >           qemu_savevm_state_complete(f);
> >           ret = qemu_file_get_error(f);
> >       }
> > +    migration_end();
> > +
> >       if (ret != 0) {
> >           qemu_savevm_state_cancel();
> >           error_setg_errno(errp, -ret, "Error while writing VM state"); 
> > hades
> > ~/src/qemu $
> > 
> > (patch attached to start a discussion).
> > 
> > For me it looks like commit is wrong and should be reverted and reworked.
> > 
> > Den
> 
> Cc to : Stefan
> 
> It seems the  'bdrv_drain_all()'  in ' blk_mig_cleanup ()' should not be 
> deferred.  Stefan, is that right?

Denis, do you still experience this crash with your own savevm patch
series applied?  Since that series does AioContext acquire/release where
missing in savevm, it might fix this bug.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]