qemu-devel

Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_


From: Maxim Levitsky
Subject: Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit
Date: Fri, 08 Dec 2023 17:20:08 +0200
User-agent: Evolution 3.36.5 (3.36.5-2.fc32)

On Wed, 2023-12-06 at 11:42 -0600, Michael Roth wrote:
> On Wed, Dec 06, 2023 at 07:20:14PM +0200, Maxim Levitsky wrote:
> > On Tue, 2023-12-05 at 16:28 -0600, Michael Roth wrote:
> > > Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > > added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
> > > exposed a long-running bug in current KVM support for SEV-ES where the
> > > kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
> > > kernel, in which case EFER write traps would result in KVM eventually
> > > seeing MSR_EFER_LMA get set and recording it in such a way that it would
> > > be subsequently visible when accessing it via KVM_GET_SREGS/etc.
> > > 
> > > However, guest kernels currently rely on MSR_EFER_LMA getting set
> > > automatically when MSR_EFER_LME is set and paging is enabled via
> > > CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
> > > MSR_EFER_LMA even though it is set internally, and when QEMU
> > > subsequently tries to pass this EFER value back to KVM via
> > > KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
> > > which is now considered fatal due to the aforementioned QEMU commit.
> > > 
> > > This can be addressed by inferring the MSR_EFER_LMA bit being set when
> > > paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
> > > the expected bits are all present in subsequent handling on the host
> > > side.
> > > 
> > > Ultimately, this handling will be implemented in the host kernel, but to
> > > avoid breaking QEMU's SEV-ES support when using older host kernels, the
> > > same handling can be done in QEMU just after fetching the register
> > > values via KVM_GET_SREGS*. Implement that here.
> > > 
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > Cc: kvm@vger.kernel.org
> > > Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > > v2:
> > >   - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2
> > > 
> > >  target/i386/kvm/kvm.c | 14 ++++++++++++++
> > >  1 file changed, 14 insertions(+)
> > > 
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 11b8177eff..8721c1bf8f 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu)
> > >  {
> > >      CPUX86State *env = &cpu->env;
> > >      struct kvm_sregs sregs;
> > > +    target_ulong cr0_old;
> > >      int ret;
> > >  
> > >      ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs);
> > > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
> > >      env->gdt.limit = sregs.gdt.limit;
> > >      env->gdt.base = sregs.gdt.base;
> > >  
> > > +    cr0_old = env->cr[0];
> > >      env->cr[0] = sregs.cr0;
> > >      env->cr[2] = sregs.cr2;
> > >      env->cr[3] = sregs.cr3;
> > >      env->cr[4] = sregs.cr4;
> > >  
> > >      env->efer = sregs.efer;
> > > +    if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > > +        if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > > +            env->efer |= MSR_EFER_LMA;
> > > +        }
> > > +    }
> > 
> > I think that we should not check that CR0_PG has changed, and just blindly assume
> > that if EFER.LME is set and CR0.PG is set, then EFER.LMA must be set as defined in x86 spec.
> > 
> > Otherwise, suppose qemu calls kvm_get_sregs twice: First time it will work,
> > but second time CR0.PG will match one that is stored in the env, and thus the workaround
> > will not be executed, and instead we will revert back to wrong EFER value reported by the kernel.
> > 
> > How about something like that:
> > 
> > 
> > if (sev_es_enabled() && env->efer & MSR_EFER_LME && env->cr[0] & CR0_PG_MASK) {
> >     /*
> >      * Workaround KVM bug, because of which KVM might not be aware of the
> >      * fact that EFER.LMA was toggled by the hardware
> >      */
> >     env->efer |= MSR_EFER_LMA;
> > }
> 
> Hi Maxim,
> 
> I'd already sent a v3 based on a similar suggestion from Paolo:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2023-12/msg00751.html
> 
> Does that one look okay to you?

Yep, thanks!

Best regards,
        Maxim Levitsky
> 
> Thanks,
> 
> Mike
> 
> > 
> > Best regards,
> >     Maxim Levitsky
> > 
> > >  
> > > >      /* changes to apic base and cr8/tpr are read back via kvm_arch_post_run */
> > >      x86_update_hflags(env);
> > > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
> > >  {
> > >      CPUX86State *env = &cpu->env;
> > >      struct kvm_sregs2 sregs;
> > > +    target_ulong cr0_old;
> > >      int i, ret;
> > >  
> > >      ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> > > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
> > >      env->gdt.limit = sregs.gdt.limit;
> > >      env->gdt.base = sregs.gdt.base;
> > >  
> > > +    cr0_old = env->cr[0];
> > >      env->cr[0] = sregs.cr0;
> > >      env->cr[2] = sregs.cr2;
> > >      env->cr[3] = sregs.cr3;
> > >      env->cr[4] = sregs.cr4;
> > >  
> > >      env->efer = sregs.efer;
> > > +    if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > > +        if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > > +            env->efer |= MSR_EFER_LMA;
> > > +        }
> > > +    }
> > >  
> > >      env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
> > >  




