[Qemu-devel] Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
From: Hidetoshi Seto
Subject: [Qemu-devel] Re: [patch uq/master 7/8] MCE: Relay UCR MCE to guest
Date: Fri, 08 Oct 2010 14:54:07 +0900
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; ja; rv:1.9.2.9) Gecko/20100915 Thunderbird/3.1.4
Hi, Huang-san,
(2010/10/08 12:15), Huang Ying wrote:
> Hi, Seto,
>
> On Thu, 2010-10-07 at 11:41 +0800, Hidetoshi Seto wrote:
>> (2010/10/07 3:10), Dean Nelson wrote:
>>> On 10/06/2010 11:05 AM, Marcelo Tosatti wrote:
>>>> On Wed, Oct 06, 2010 at 10:58:36AM +0900, Hidetoshi Seto wrote:
>>>>> I have some more questions:
>>>>>
>>>>> (2010/10/05 3:54), Marcelo Tosatti wrote:
>>>>>> Index: qemu/target-i386/cpu.h
>>>>>> ===================================================================
>>>>>> --- qemu.orig/target-i386/cpu.h
>>>>>> +++ qemu/target-i386/cpu.h
>>>>>> @@ -250,16 +250,32 @@
>>>>>> #define PG_ERROR_RSVD_MASK 0x08
>>>>>> #define PG_ERROR_I_D_MASK 0x10
>>>>>>
>>>>>> -#define MCG_CTL_P (1UL<<8) /* MCG_CAP register available */
>>>>>> +#define MCG_CTL_P (1ULL<<8) /* MCG_CAP register available */
>>>>>> +#define MCG_SER_P (1ULL<<24) /* MCA recovery/new status bits */
>>>>>>
>>>>>> -#define MCE_CAP_DEF MCG_CTL_P
>>>>>> +#define MCE_CAP_DEF (MCG_CTL_P|MCG_SER_P)
>>>>>> #define MCE_BANKS_DEF 10
>>>>>>
>>>>>
>>>>> It seems that current kvm doesn't support SER_P, so injecting an SRAO
>>>>> into the guest means that the guest receives a VAL|UC|!PCC and RIPV event
>>>>> from a virtual processor that doesn't have SER_P.
>>>>
>>>> Dean also noted this. I don't think it was a deliberate choice not to
>>>> expose SER_P. Huang?
>>>
>>> In my testing, I found that MCG_SER_P was not being set (and I was
>>> running on a Nehalem-EX system). Injecting an MCE resulted in the
>>> guest entering into panic() from mce_panic(). If crash_kexec()
>>> finds a kexec_crash_image the system ends up rebooting, otherwise,
>>> what happens next requires operator intervention.
>>
>> Good to know.
>> What concerns me is that if a memory-scrubbing SRAO event is
>> injected when !SER_P, a Linux guest with a certain mce "tolerant" level
>> might grade it as "UC" severity and continue running without
>> panicking, killing, or poisoning, because of !PCC and RIPV.
>>
>> Could you provide the panic message of the guest in your test?
>> I think it can tell me why the mce handler decided to go panic.
>
> It is a bug that SER_P is not in KVM_MCE_CAP_SUPPORTED in the kernel.
> I will fix it as soon as possible. And an SRAO MCE should not be sent
> when !SER_P; we should add that condition in qemu-kvm.
That makes sense.
I think it is qemu's responsibility to decide what follows the AO-SIGBUS;
the action to take depends on KVM's capabilities.
>>> When I applied a patch to the guest's kernel which forces mce_ser to be
>>> set, as if MCG_SER_P was set (see __mcheck_cpu_cap_init()), I found
>>> that when the memory page was 'owned' by a guest process, the process
>>> would be killed (if the page was dirty), and the guest would stay
>>> running. The HWPoisoned page would be sidelined and not cause any more
>>> issues.
>>
>> Excellent.
>> So as long as the guest kernel knows which page is poisoned, guest
>> processes are kept from touching that page.
>>
>> ... Therefore rebooting the VM and renewing the kernel will lose the
>> information about which pages are poisoned.
>
> Yes. That is an issue. Dean suggests making qemu-kvm refuse to reboot
> the guest if there is a poisoned page, and ask the user to intervene. I
> have another idea: replace the poisoned pages with good pages on
> reboot, that is, recover without user intervention.
Sounds good.
I think it may be worthwhile to reserve pages for the replacement
before a reboot is requested; we certainly don't want a reboot to
fail with 'no memory'.
>>>>> I think most OSes don't expect to receive an MCE with !PCC
>>>>> on a traditional x86 processor without SER_P.
>>>>>
>>>>> Q1: Is it safe to expect that guests can handle such a !PCC event?
>>>
>>> This might be best answered by Huang, but as I mentioned above, without
>>> MCG_SER_P being set, the result was an orderly system panic on the
>>> guest.
>>
>> Though I'll wait for Huang (I think he is on holiday), I believe that
>> system panic is just one possible option for an AO (Action Optional)
>> event, regardless of SER_P.
>
> We should fix this as I said above.
>
>>>>> Q2: What is the expected behavior on the guest?
>>>
>>> I think I answered this above.
>>
>> Yeah, thanks.
>>
>>>
>>>>> Q3: What happen if guest reboots itself in response to the MCE?
>>>
>>> That depends...
>>>
>>> And the following issue also holds for a guest that is rebooted at
>>> some point having successfully sidelined the bad page.
>>>
>>> After the guest has panic'd, a system_reset of the guest or a restart
>>> initiated by crash_kexec() (called by panic() on the guest), usually
>>> results in the guest hanging because the bad page still belongs
>>> to qemu-kvm and is now being referenced by the new guest in some way.
>>
>> Yes. In other words, my concern about reboot is that the new guest kernel,
>> including a kdump kernel, might try to read the bad page. If there is
>> no AR-SIGBUS or similar, we need some trick to inhibit such accesses.
>>
>>> (It actually may not hang, but successfully reboot and be runnable,
>>> with the bad page lurking in the background. It all seems to depend on
>>> where the bad page ends up, and whether it's ever referenced.)
>>
>> I know some tough guys using their PC with buggy DIMMs :-)
>>
>>>
>>> I believe there was an attempt to deal with this in kvm on the host.
>>> See kvm_handle_bad_page(). This function was supposed to result in the
>>> sending of a BUS_MCEERR_AR flavored SIGBUS by do_sigbus() to qemu-kvm
>>> which in theory would result in the right thing happening. But commit
>>> 96054569190bdec375fe824e48ca1f4e3b53dd36 prevents the signal from being
>>> sent. So this mechanism needs to be re-worked, and the issue remains.
>>
>> Definitely.
>> I guess Huang has some plan or hint for reworking this.
>
> Yes. This should be fixed. The SRAR SIGBUS should be sent directly
> instead of being triggered by touching the poisoned virtual address.
Good. It should work.
>>> I would think that if the bad page can't be sidelined, such that
>>> the newly booting guest can't use it, then the new guest shouldn't be
>>> allowed to boot. But perhaps there is some merit in letting it try to
>>> boot and see if one gets 'lucky'.
>>
>> When booting a real machine in the real world, hardware and firmware
>> usually (or often) run a self-test before passing control to the OS.
>> Some platforms can boot the OS in a degraded configuration (for example,
>> with less memory) if one of their components has trouble. Some BIOSes may
>> stop booting and show a message like "please reseat [component]" on the
>> screen. So we could implement (or request) such a mechanism in qemu.
>>
>> I can understand the merit you mentioned here, to some degree. But I
>> think it is hard to tell a business customer they were "unlucky"...
>
> Because the contents of poisoned pages are irrelevant after reboot,
> qemu can replace the poisoned pages with good pages when rebooting the guest.
> Do you think that is good?
Sure.
Of course this trick will not be needed if the user has migrated or
saved/restored the guest before a reboot.
Thank you for answering!
Thanks,
H.Seto