Re: [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu

From:	Igor Mammedov
Subject:	Re: [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu_index
Date:	Wed, 14 Jun 2017 17:31:45 +0200

On Wed, 14 Jun 2017 10:35:36 -0300
Eduardo Habkost <address@hidden> wrote:

> On Wed, Jun 14, 2017 at 03:24:50PM +0200, Igor Mammedov wrote:
> > On Wed, 14 Jun 2017 10:00:15 -0300
> > Eduardo Habkost <address@hidden> wrote:
> >   
> > > On Wed, Jun 14, 2017 at 02:25:07PM +0300, Roman Kagan wrote:  
> > > > On Tue, Jun 13, 2017 at 03:57:52PM -0300, Eduardo Habkost wrote:    
> > > > > On Tue, Jun 06, 2017 at 09:19:30PM +0300, Roman Kagan wrote:    
> > > > > > Hyper-V identifies vcpus by the virtual processor (VP) index which 
> > > > > > is
> > > > > > normally queried by the guest via HV_X64_MSR_VP_INDEX msr.
> > > > > > 
> > > > > > It has to be owned by QEMU in order to preserve it across migration.
> > > > > > 
> > > > > > However, the current implementation in KVM doesn't allow to set this
> > > > > > msr, and KVM uses its own notion of VP index.  Fortunately, the way
> > > > > > vcpus are created in QEMU/KVM basically guarantees that the KVM 
> > > > > > value is
> > > > > > equal to QEMU cpu_index.    
> > > > > 
> > > > > This might not be true in the future.  cpu_index is not a good
> > > > > identifier for CPUs, and we would like to remove it in the
> > > > > future.
> > > > > 
> > > > > But it looks like we have no choice, see extra comments below:    
> > > >     
> > > > > > +static void hyperv_set_vp_index(CPUState *cs)
> > > > > > +{
> > > > > > +    struct {
> > > > > > +        struct kvm_msrs info;
> > > > > > +        struct kvm_msr_entry entries[1];
> > > > > > +    } msr_data;
> > > > > > +    int ret;
> > > > > > +
> > > > > > +    msr_data.info.nmsrs = 1;
> > > > > > +    msr_data.entries[0].index = HV_X64_MSR_VP_INDEX;
> > > > > > +
> > > > > > +    /*
> > > > > > +     * Some kernels don't support setting HV_X64_MSR_VP_INDEX.  
> > > > > > However,
> > > > > > +     * they use VP_INDEX equal to cpu_index due to the way vcpus 
> > > > > > are created.
> > > > > > +     * So ignore the errors from SET_MSRS, but verify that 
> > > > > > GET_MSRS returns the
> > > > > > +     * expected value.
> > > > > > +     */    
> > > > > 
> > > > > Oh.  This sounds like a problem.  As reference for others, this
> > > > > is the KVM code in kvm_hv_get_msr():
> > > > > 
> > > > >       case HV_X64_MSR_VP_INDEX: {
> > > > >               int r;
> > > > >               struct kvm_vcpu *v;
> > > > > 
> > > > >               kvm_for_each_vcpu(r, v, vcpu->kvm) {
> > > > >                       if (v == vcpu) {
> > > > >                               data = r;
> > > > >                               break;
> > > > >                       }
> > > > >               }
> > > > >               break;
> > > > >       }
> > > > > 
> > > > > The problem with that is that it will break as soon as we create
> > > > > VCPUs in a different order.  Unsolvable on hosts that don't allow
> > > > > HV_X64_MSR_VP_INDEX to be set, however.    
> > > > 
> > > > Right, thanks for putting together a detailed explanation.
> > > > 
> > > > This was a thinko back then, not to have HV_X64_MSR_VP_INDEX maintained
> > > > by QEMU.  I'm going to post a patch to KVM fixing that.
> > > > 
> > > > Meanwhile QEMU needs a way to maintain its notion of vp_index that is
> > > >   1) in sync with kernel's notion
> > > >   2) also with kernels that don't support setting the msr
> > > >   3) persistent across migrations
> > > > 
> > > > cpu_index looked like a perfect candidate.    
> > > 
> > > I would like to be able to stop using cpu_index as a persistent
> > > VCPU identifier in the future.  I would also like to be able to
> > > create VCPUs in any order without breaking guest ABI.  But it
> > > seems to be impossible to do that without breaking vp_index on
> > > older kernels.
> > > 
> > > So cpu_index looks like the only candidate we have.
> > >   
> > > >     
> > > > > > +    msr_data.entries[0].data = cs->cpu_index;
> > > > > > +    kvm_vcpu_ioctl(cs, KVM_SET_MSRS, &msr_data);
> > > > > > +    ret = kvm_vcpu_ioctl(cs, KVM_GET_MSRS, &msr_data);
> > > > > > +    assert(ret == 1);
> > > > > > +    assert(msr_data.entries[0].data == cs->cpu_index);    
> > > > > 
> > > > > If KVM already initializes the MSR to cpu_index and we will abort
> > > > > if it was not set to anything except cpu_index here, why exactly
> > > > > do we need the KVM_SET_MSRS call?    
> > > > 
> > > > The kernel made no obligations to keep initializing its vp_index
> > > > identically to QEMU's cpu_index.  So we'd better check and abort if that
> > > > got out of sync.  Once KVM gains the ability to learn vp_index from QEMU
> > > > we'll no longer depend on that as we'll do explicit synchronization.    
> > > 
> > > I think the kernel now has an obligation to keep initializing
> > > HV_X64_MSR_VP_INDEX in a compatible way, or it will break older
> > > QEMU versions that don't set the MSR.
> > > 
> > > But if you don't think we can rely on that, then the KVM_SET_MSRS
> > > call won't hurt.  
> > 
> > if we can tell apart old index based vp_index kernel and new that supports
> > MSR setting (add cap check perhaps) then we should be able to
> >  - leave out old vp_index broken as it is now (for old kernels/old QEMU 
> > and/ new QEMU old machine types)
> >      i.e. do not set vp_index MSR in QEMU
> >  - in case of (new QEMU new machine type/new kernel) use APIC ID which 
> > would work
> >    with out of order CPU creation  
> 
> I agree with the proposal, but I don't see the need for a
> capability check:
> 
> Old machine-types can simply not call KVM_SET_MSRS, as you
> suggested.
> 
> New machine-types can just call KVM_SET_MSRS unconditionally, and
> warn the user in case the it fails and the APIC ID really doesn't
> match the index in KVM (which in practice will happen only if the
> user is also configuring a CPU topology explicitly).

I don't know anything about hyperv so here goes question:
   new machine booted on new kernel (using ACPI ID) and
   than migrated to new qemu on old kernel.
   Will it work or break. If it breaks we need cap check and if not
   we probably could do without it.

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu_index, (continued)
- [Qemu-devel] [PATCH 07/23] hyperv_testdev: refactor for readability, Roman Kagan, 2017/06/06
- [Qemu-devel] [PATCH 08/23] hyperv: cosmetic: g_malloc -> g_new, Roman Kagan, 2017/06/06
- [Qemu-devel] [PATCH 09/23] hyperv: synic: only setup ack notifier if there's a callback, Roman Kagan, 2017/06/06
- [Qemu-devel] [PATCH 12/23] hyperv: make HvSintRoute reference-counted, Roman Kagan, 2017/06/06
  - Re: [Qemu-devel] [PATCH 12/23] hyperv: make HvSintRoute reference-counted, Eduardo Habkost, 2017/06/14
    - Re: [Qemu-devel] [PATCH 12/23] hyperv: make HvSintRoute reference-counted, Roman Kagan, 2017/06/14
    - Re: [Qemu-devel] [PATCH 12/23] hyperv: make HvSintRoute reference-counted, Eduardo Habkost, 2017/06/23
- [Qemu-devel] [PATCH 10/23] hyperv: allow passing arbitrary data to sint ack callback, Roman Kagan, 2017/06/06
- [Qemu-devel] [PATCH 14/23] kvm-all: make async_safe_run_on_cpu safe on kvm too, Roman Kagan, 2017/06/06
  - Re: [Qemu-devel] [PATCH 14/23] kvm-all: make async_safe_run_on_cpu safe on kvm too, Paolo Bonzini, 2017/06/08

Prev by Date: [Qemu-devel] [PATCH 4/4] iotests: Add test for non-string option reopening
Next by Date: Re: [Qemu-devel] [PATCH 17/23] hyperv: add synic message delivery
Previous by thread: Re: [Qemu-devel] [PATCH 05/23] hyperv: ensure VP index equal to QEMU cpu_index
Next by thread: [Qemu-devel] [PATCH 07/23] hyperv_testdev: refactor for readability
Index(es):
- Date
- Thread