From: Serge Hallyn
Subject: [Qemu-devel] [Bug 1297218] Re: guest hangs after live migration due to tsc jump
Date: Mon, 04 Jul 2016 17:35:47 -0000

No, I'm afraid not.  But if you can test when this package is accepted
into trusty-proposed that'll be great.

** Description changed:

+ =====================================
+ SRU Justification:
+ 1. Impact: guests hang after live migration with 100% cpu
+ 2. Upstream fix: a set of four patches fixes this upstream
+ 3. Stable fix: we have a backport of the four patches into a single patch.
+ 4. Test case: try a set of migrations of different VMs (it is unfortunately not 100% reproducible)
+ 5. Regression potential: the patch is not trivial; however, the lp:qa-regression-tests testsuite passed 100% with this package.
+ =====================================
+ 
  We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing a
  Gluster filesystem. Guests can be live migrated between them. However,
  live migration often leads to the guest being stuck at 100% CPU for a while.
  In that case, the dmesg output for such a guest will show (once it
  recovers): Clocksource tsc unstable (delta = 662463064082 ns). In this
  particular example, a guest was migrated and only after 11 minutes (662
  seconds) did it become responsive again.
  
  It seems that newly booted guests do not suffer from this problem;
  these can be migrated back and forth at will. After a day or so, the
  problem becomes apparent. It also seems that migrating from server A to
  server B causes many more problems than going from B back to A. If
  necessary, I can do more measurements to qualify these observations.
  
  The VM servers run Ubuntu 13.04 with these packages:
  Kernel: 3.8.0-35-generic x86_64
  Libvirt: 1.0.2
  Qemu: 1.4.0
  Gluster-fs: 3.4.2 (libvirt accesses the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi).
- The interconnect between both machines (both for migration and gluster) is 
10GbE. 
+ The interconnect between both machines (both for migration and gluster) is 
10GbE.
  Both servers are synced to NTP and well within 1ms from one another.
  
  Guests are either Ubuntu 13.04 or 13.10.
  
  On the guests, the current_clocksource is kvm-clock.
- The XML definition of the guests only contains:  <clock offset='utc'/> 
+ The XML definition of the guests only contains:  <clock offset='utc'/>
  
  Now as far as I've read in the documentation of kvm-clock, it
  specifically supports live migrations, so I'm a bit surprised at these
  problems. There isn't all that much information to find on this issue,
  although I have found postings by others that seem to have run into
  the same issues, but without a solution.
- --- 
+ ---
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  DistroRelease: Ubuntu 14.04
  Package: libvirt (not installed)
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8
  ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
  Tags:  trusty apparmor apparmor apparmor apparmor apparmor
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:
-  
+ 
  _MarkForUpload: True
  modified.conffile..etc.default.libvirt.bin: [modified]
  modified.conffile..etc.libvirt.libvirtd.conf: [modified]
  modified.conffile..etc.libvirt.qemu.conf: [modified]
  modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted]
  mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662
  mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837
  mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1297218

Title:
  guest hangs after live migration due to tsc jump

Status in QEMU:
  New
Status in glusterfs package in Ubuntu:
  Invalid
Status in qemu package in Ubuntu:
  Fix Released
Status in glusterfs source package in Trusty:
  Confirmed
Status in qemu source package in Trusty:
  Confirmed

Bug description:
  =====================================
  SRU Justification:
  1. Impact: guests hang after live migration with 100% cpu
  2. Upstream fix: a set of four patches fixes this upstream
  3. Stable fix: we have a backport of the four patches into a single patch.
  4. Test case: try a set of migrations of different VMs (it is unfortunately not 100% reproducible)
  5. Regression potential: the patch is not trivial; however, the lp:qa-regression-tests testsuite passed 100% with this package.
  =====================================
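
Because the test case is not 100% reproducible, repeating the migration back and forth raises the chance of hitting the hang. The following is only a sketch of such a loop, not the procedure used for this SRU: the host names and domain name are hypothetical, it assumes `virsh` is installed and both hosts are reachable over `qemu+ssh`, and it defaults to a dry run that only builds the commands.

```python
import subprocess


def migrate_cmd(domain, src_host, dest_host):
    """Build a virsh live-migration command from src_host to dest_host."""
    return [
        "virsh", "-c", f"qemu+ssh://{src_host}/system",
        "migrate", "--live", domain,
        f"qemu+ssh://{dest_host}/system",
    ]


def ping_pong(domain, host_a, host_b, rounds, dry_run=True):
    """Migrate `domain` A->B, B->A, ... assuming it starts on host_a.

    Returns the list of commands; runs them only when dry_run is False.
    """
    cmds = []
    src, dest = host_a, host_b
    for _ in range(rounds):
        cmd = migrate_cmd(domain, src, dest)
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
        src, dest = dest, src  # the guest now lives on the former destination
    return cmds


# Hypothetical names, dry run only: print what would be executed.
for cmd in ping_pong("guest1", "serverA", "serverB", rounds=4):
    print(" ".join(cmd))
```

After each round, the guest's responsiveness and its dmesg output would still need to be checked separately.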

  We have two identical Ubuntu servers running libvirt/kvm/qemu, sharing
  a Gluster filesystem. Guests can be live migrated between them.
  However, live migration often leads to the guest being stuck at 100% CPU
  for a while. In that case, the dmesg output for such a guest will show
  (once it recovers): Clocksource tsc unstable (delta = 662463064082
  ns). In this particular example, a guest was migrated and only after
  11 minutes (662 seconds) did it become responsive again.
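
As a quick check of the arithmetic above, the reported delta can be converted from nanoseconds to seconds and minutes. This is a minimal sketch; the helper name `parse_tsc_delta` is made up for illustration, and the regular expression assumes the message has exactly the form quoted in this report.

```python
import re


def parse_tsc_delta(line):
    """Return the clocksource delta in seconds, or None if the line has none."""
    match = re.search(r"delta = (-?\d+) ns", line)
    if match is None:
        return None
    return int(match.group(1)) / 1e9  # nanoseconds -> seconds


line = "Clocksource tsc unstable (delta = 662463064082 ns)"
seconds = parse_tsc_delta(line)
print(f"{seconds:.1f} s, about {seconds / 60:.0f} minutes")
```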

  It seems that newly booted guests do not suffer from this problem;
  these can be migrated back and forth at will. After a day or so, the
  problem becomes apparent. It also seems that migrating from server A
  to server B causes many more problems than going from B back to A. If
  necessary, I can do more measurements to qualify these observations.

  The VM servers run Ubuntu 13.04 with these packages:
  Kernel: 3.8.0-35-generic x86_64
  Libvirt: 1.0.2
  Qemu: 1.4.0
  Gluster-fs: 3.4.2 (libvirt accesses the images via the filesystem, not using libgfapi yet as the Ubuntu libvirt is not linked against libgfapi).
  The interconnect between both machines (both for migration and gluster) is 
10GbE.
  Both servers are synced to NTP and well within 1ms from one another.

  Guests are either Ubuntu 13.04 or 13.10.

  On the guests, the current_clocksource is kvm-clock.
  The XML definition of the guests only contains:  <clock offset='utc'/>
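
For anyone wanting to confirm the guest clocksource, Linux exposes it through sysfs. The sketch below reads the current and available clocksources; the function name is hypothetical, and it assumes a Linux guest with the usual `clocksource0` sysfs directory.

```python
from pathlib import Path

# Standard Linux sysfs location for the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


# Only query the real sysfs tree when it exists (i.e. on Linux).
if CLOCKSOURCE_DIR.is_dir():
    current, available = read_clocksource()
    print(f"current: {current}; available: {', '.join(available)}")
```

On the guests described here, the current value would be expected to read `kvm-clock`.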

  Now as far as I've read in the documentation of kvm-clock, it
  specifically supports live migrations, so I'm a bit surprised at these
  problems. There isn't all that much information to find on this issue,
  although I have found postings by others that seem to have run into
  the same issues, but without a solution.
  ---
  ApportVersion: 2.14.1-0ubuntu3
  Architecture: amd64
  DistroRelease: Ubuntu 14.04
  Package: libvirt (not installed)
  ProcCmdline: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=1b0c3c6d-a9b8-4e84-b076-117ae267d178 ro console=ttyS1,115200n8 BOOTIF=01-00-25-90-75-b5-c8
  ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
  Tags:  trusty apparmor apparmor apparmor apparmor apparmor
  Uname: Linux 3.13.0-24-generic x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups:

  _MarkForUpload: True
  modified.conffile..etc.default.libvirt.bin: [modified]
  modified.conffile..etc.libvirt.libvirtd.conf: [modified]
  modified.conffile..etc.libvirt.qemu.conf: [modified]
  modified.conffile..etc.libvirt.qemu.networks.default.xml: [deleted]
  mtime.conffile..etc.default.libvirt.bin: 2014-05-12T19:07:40.020662
  mtime.conffile..etc.libvirt.libvirtd.conf: 2014-05-13T14:40:25.894837
  mtime.conffile..etc.libvirt.qemu.conf: 2014-05-12T18:58:27.885506

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1297218/+subscriptions


