We are running a cluster of virtual machines with Ganeti 2.15.2 on top of qemu-kvm 2.1.2. The hosts run Debian jessie (kernel 3.16). We have serious trouble with timekeeping on the guests, regardless of whether they run lenny, squeeze, wheezy or jessie. The clock drifts heavily on some guests, while others are fine. There is no obvious pattern to which guests are affected, but the drift seems to be worse on guests that are idle. Some more facts:
* The clocksource on all hosts is set to tsc
* The clocksource on all guests is set to kvm-clock
* CPUs on the hosts are Intel Xeons, all have these related flags: tsc, constant_tsc, nonstop_tsc. One host has tsc_deadline_timer as an extra, because it's a newer Xeon, but it doesn't seem to make a difference in drifting.
* Most guests complain about "Clocksource tsc unstable" in dmesg after boot (even with clocksource=kvm-clock set)
* The hosts don't drift at all; we only see the issue in guests
* We don't use CPU pinning for KVM guests
* We run OpenNTPD 5.7p4 on both the hosts and the guests (we build it ourselves, so the version is the same in every guest across every Debian version)
We tried different things to debug the cause of this issue, but we're out of ideas now. This is what we have tested so far:
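For anyone who wants to compare their own setup against the clocksource facts above, this is a minimal sketch of how the values can be read on a host or guest. It assumes the standard Linux sysfs location for clocksource information; the function name `read_clocksource` is only illustrative.

```python
from pathlib import Path

# Standard sysfs directory for clocksource information on Linux.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base=SYSFS_DIR):
    """Return (current, available) clocksources found under `base`."""
    base = Path(base)
    # current_clocksource holds a single name, e.g. "tsc" or "kvm-clock".
    current = (base / "current_clocksource").read_text().strip()
    # available_clocksource is a space-separated list of names.
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    current, available = read_clocksource()
    print(f"current:   {current}")
    print(f"available: {' '.join(available)}")
```

Run on each host and guest, this should report tsc on the hosts and kvm-clock on the guests if the settings listed above are in effect.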
* Newer kernel from jessie-backports
* Newer qemu-kvm from jessie-backports
* Set clocksource=hpet on the hosts; this made the drift even worse (not tested yet: acpi_pm)
* Tried to run guests without kvm_steal_time setting
* Tried without running OpenNTPD on the guests
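To compare results between these test runs, a drift rate in ppm is easier to compare than raw offsets. The following is only a sketch: it assumes two pairs of timestamps in seconds, one pair taken from a trusted reference (for example a host, since the hosts don't drift) and one from the guest, sampled at the same two moments while OpenNTPD is stopped on the guest. The function name `drift_ppm` is hypothetical, and how the samples are collected is left open.

```python
def drift_ppm(ref_start, guest_start, ref_end, guest_end):
    """Guest clock drift relative to a reference clock, in parts per million.

    All arguments are timestamps in seconds. A positive result means the
    guest clock runs fast, a negative result means it runs slow.
    """
    ref_elapsed = ref_end - ref_start
    if ref_elapsed <= 0:
        raise ValueError("reference interval must be positive")
    guest_elapsed = guest_end - guest_start
    # Difference in elapsed time, scaled to the reference interval.
    return (guest_elapsed - ref_elapsed) / ref_elapsed * 1e6


if __name__ == "__main__":
    # Illustrative numbers only: the guest gained 0.5 s over 1000 s.
    print(f"{drift_ppm(0.0, 0.0, 1000.0, 1000.5):.1f} ppm")
```

Sampling over a longer interval reduces the influence of the latency involved in reading the two clocks.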
We came across some hints during our investigation. Novell, for example, says that CPU pinning may help with drifting guests: https://www.novell.com/support/kb/doc.php?id=7008698
Others suggested using a "failover clocksource" on the guests: https://blog.laimbock.com/tag/clocksource_failover/
However, before we continue to poke around in the dark, we wanted to hear from virtualization experts who may have stumbled across something like this before. Any input would be greatly appreciated!
Please reply directly, as I'm not subscribed to the list. Sorry for cross-posting this to several lists at once; we just hope to get as much feedback as possible.
Thanks in advance,