From: cvbkf
Subject: [Qemu-devel] [Bug 1307225] Re: Running a virtual machine on a Haswell system produces machine check events
Date: Fri, 25 Jul 2014 10:17:16 -0000
I can confirm this.
I am running three virtual machines under qemu-kvm on Ubuntu 14.04 LTS, on an
Intel i7-4770 (Haswell) server.
dmesg:
[63429.847437] mce: [Hardware Error]: Machine check events logged
[65996.795630] mce: [Hardware Error]: Machine check events logged
mcelog:
Hardware event. This is not a software error.
MCE 0
CPU 2 BANK 0
TIME 1406265172 Fri Jul 25 07:12:52 2014
MCG status:
MCi status:
Corrected error
Error enabled
MCA: Internal parity error
STATUS 90000040000f0005 MCGSTATUS 0
MCGCAP c09 APICID 4 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60
It's the same error every time; only the APICID and CPU numbers differ.
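
For anyone who wants to read the STATUS value by hand, here is a minimal sketch that splits it into fields. It assumes the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63-57, corrected error count in bits 52:38, model-specific code in bits 31:16, MCA error code in bits 15:0). The function name decode_mci_status is made up for illustration; this is not how mcelog itself is implemented.

```python
# Sketch only: field positions assume the architectural IA32_MCi_STATUS
# layout described in the Intel SDM. decode_mci_status is a hypothetical
# helper, not part of mcelog or any other tool.
STATUS_FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error (clear means corrected)
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into its main fields."""
    flags = [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]
    return {
        "flags": flags,
        # Bits 52:38; only meaningful when MCG_CAP reports CMCI support
        # (bit 10), which is set in the MCGCAP value c09 shown above.
        "corrected_error_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_error_code": status & 0xFFFF,
    }


decoded = decode_mci_status(0x90000040000F0005)
print(decoded)
```

For the value reported here, only VAL and EN are set and UC is clear, which matches the "Corrected error" and "Error enabled" lines. The low 16 bits are 0x0005, the MCA code that mcelog prints as "Internal parity error".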
The MCE errors did not start until I migrated the virtual machines from
another system. The Haswell server had been up for three days without any
incidents; now, while running qemu-kvm, there is an MCE error every 6-12 hours.
After the first errors, I called my server provider's support. They first
exchanged the RAM, upgraded the BIOS...
Then they replaced the whole server, moving only my hard disks to the new
one. Even that didn't help; I still got MCE errors. The hard disks were
replaced too, one at a time (to resync the RAID). The hardware has now been
completely swapped, but the MCE errors are still popping up.
System information attached.
--
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1307225
Title:
Running a virtual machine on a Haswell system produces machine check
events
Status in QEMU:
New
Bug description:
I'm running a virtual Windows SBS 2003 installation on a Xeon E3
Haswell system running Gentoo Linux. First, I used Qemu 1.5.3 (the
latest stable version on Gentoo). I got a lot of machine check events
("mce: [Hardware Error]: Machine check events logged") in dmesg that
always looked like (using mcelog):
Hardware event. This is not a software error.
MCE 0
CPU 3 BANK 0
TIME 1397455091 Mon Apr 14 07:58:11 2014
MCG status:
MCi status:
Corrected error
Error enabled
MCA: Internal parity error
STATUS 90000040000f0005 MCGSTATUS 0
MCGCAP c09 APICID 6 SOCKETID 0
CPUID Vendor Intel Family 6 Model 60
I found this discussion on the vmware community:
https://communities.vmware.com/thread/452344
It seems that this is (at least partly) caused by the Qemu machine. I
switched to Qemu 1.7.0, the first version to use "pc-i440fx-1.7". With
this version, the errors almost disappeared, but from time to time, I
still get machine check events. Anyway, they do not seem to affect
either the VM or the host.
The Haswell machine has been set up and running for several days
without a single error message. They only appear when the VM is
running, so I think this is actually some problem with the Haswell
architecture (and not a real hardware error).