[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_
From: |
Maxim Levitsky |
Subject: |
Re: [PATCH v2 for-8.2?] i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit |
Date: |
Fri, 08 Dec 2023 17:20:08 +0200 |
User-agent: |
Evolution 3.36.5 (3.36.5-2.fc32) |
On Wed, 2023-12-06 at 11:42 -0600, Michael Roth wrote:
> On Wed, Dec 06, 2023 at 07:20:14PM +0200, Maxim Levitsky wrote:
> > On Tue, 2023-12-05 at 16:28 -0600, Michael Roth wrote:
> > > Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > > added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
> > > exposed a long-running bug in current KVM support for SEV-ES where the
> > > kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
> > > kernel, in which case EFER write traps would result in KVM eventually
> > > seeing MSR_EFER_LMA get set and recording it in such a way that it would
> > > be subsequently visible when accessing it via KVM_GET_SREGS/etc.
> > >
> > > However, guests kernels currently rely on MSR_EFER_LMA getting set
> > > automatically when MSR_EFER_LME is set and paging is enabled via
> > > CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
> > > MSR_EFER_LMA even though it is set internally, and when QEMU
> > > subsequently tries to pass this EFER value back to KVM via
> > > KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
> > > which is now considered fatal due to the aforementioned QEMU commit.
> > >
> > > This can be addressed by inferring the MSR_EFER_LMA bit being set when
> > > paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
> > > the expected bits are all present in subsequent handling on the host
> > > side.
> > >
> > > Ultimately, this handling will be implemented in the host kernel, but to
> > > avoid breaking QEMU's SEV-ES support when using older host kernels, the
> > > same handling can be done in QEMU just after fetching the register
> > > values via KVM_GET_SREGS*. Implement that here.
> > >
> > > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > Cc: Tom Lendacky <thomas.lendacky@amd.com>
> > > Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > Cc: kvm@vger.kernel.org
> > > Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
> > > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > > ---
> > > v2:
> > > - Add handling for KVM_GET_SREGS, not just KVM_GET_SREGS2
> > >
> > > target/i386/kvm/kvm.c | 14 ++++++++++++++
> > > 1 file changed, 14 insertions(+)
> > >
> > > diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> > > index 11b8177eff..8721c1bf8f 100644
> > > --- a/target/i386/kvm/kvm.c
> > > +++ b/target/i386/kvm/kvm.c
> > > @@ -3610,6 +3610,7 @@ static int kvm_get_sregs(X86CPU *cpu)
> > > {
> > > CPUX86State *env = &cpu->env;
> > > struct kvm_sregs sregs;
> > > + target_ulong cr0_old;
> > > int ret;
> > >
> > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS, &sregs);
> > > @@ -3637,12 +3638,18 @@ static int kvm_get_sregs(X86CPU *cpu)
> > > env->gdt.limit = sregs.gdt.limit;
> > > env->gdt.base = sregs.gdt.base;
> > >
> > > + cr0_old = env->cr[0];
> > > env->cr[0] = sregs.cr0;
> > > env->cr[2] = sregs.cr2;
> > > env->cr[3] = sregs.cr3;
> > > env->cr[4] = sregs.cr4;
> > >
> > > env->efer = sregs.efer;
> > > + if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > > + if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > > + env->efer |= MSR_EFER_LMA;
> > > + }
> > > + }
> >
> > I think that we should not check that CR0_PG has changed, and just blindly
> > assume
> > that if EFER.LME is set and CR0.PG is set, then EFER.LMA must be set as
> > defined in x86 spec.
> >
> > Otherwise, suppose qemu calls kvm_get_sregs twice: First time it will work,
> > but second time CR0.PG will match one that is stored in the env, and thus
> > the workaround
> > will not be executed, and instead we will revert back to wrong EFER value
> > reported by the kernel.
> >
> > How about something like that:
> >
> >
> > if (sev_es_enabled() && env->efer & MSR_EFER_LME && env->cr[0] &
> > CR0_PG_MASK) {
> > /*
> > * Workaround KVM bug, because of which KVM might not be aware of
> > the
> > * fact that EFER.LMA was toggled by the hardware
> > */
> > env->efer |= MSR_EFER_LMA;
> > }
>
> Hi Maxim,
>
> I'd already sent a v3 based on a similar suggestion from Paolo:
>
> https://lists.gnu.org/archive/html/qemu-devel/2023-12/msg00751.html
>
> Does that one look okay to you?
Yep, thanks!
Best regards,
Maxim Levitsky
>
> Thanks,
>
> Mike
>
> >
> > Best regards,
> > Maxim Levitsky
> >
> > >
> > > /* changes to apic base and cr8/tpr are read back via
> > > kvm_arch_post_run */
> > > x86_update_hflags(env);
> > > @@ -3654,6 +3661,7 @@ static int kvm_get_sregs2(X86CPU *cpu)
> > > {
> > > CPUX86State *env = &cpu->env;
> > > struct kvm_sregs2 sregs;
> > > + target_ulong cr0_old;
> > > int i, ret;
> > >
> > > ret = kvm_vcpu_ioctl(CPU(cpu), KVM_GET_SREGS2, &sregs);
> > > @@ -3676,12 +3684,18 @@ static int kvm_get_sregs2(X86CPU *cpu)
> > > env->gdt.limit = sregs.gdt.limit;
> > > env->gdt.base = sregs.gdt.base;
> > >
> > > + cr0_old = env->cr[0];
> > > env->cr[0] = sregs.cr0;
> > > env->cr[2] = sregs.cr2;
> > > env->cr[3] = sregs.cr3;
> > > env->cr[4] = sregs.cr4;
> > >
> > > env->efer = sregs.efer;
> > > + if (sev_es_enabled() && env->efer & MSR_EFER_LME) {
> > > + if (!(cr0_old & CR0_PG_MASK) && env->cr[0] & CR0_PG_MASK) {
> > > + env->efer |= MSR_EFER_LMA;
> > > + }
> > > + }
> > >
> > > env->pdptrs_valid = sregs.flags & KVM_SREGS2_FLAGS_PDPTRS_VALID;
> > >