
From: Nicholas Piggin
Subject: Re: [PATCH 9/9] spapr: implement nested-hv capability for the virtual hypervisor
Date: Wed, 16 Feb 2022 11:16:32 +1000

Excerpts from Cédric Le Goater's message of February 16, 2022 4:21 am:
> On 2/15/22 04:16, Nicholas Piggin wrote:
>> This implements the Nested KVM HV hcall API for spapr under TCG.
>> 
>> The L2 is switched in when the H_ENTER_NESTED hcall is made, and the
>> L1 is switched back in (returning from the hcall) when an HV exception
>> is sent to the vhyp. Register state is copied in and out according to
>> the nested KVM HV hcall API specification.
>> 
>> The hdecr timer is started when the L2 is switched in, and it provides
>> the HDEC / 0x980 return to L1.
>> 
>> The MMU re-uses the bare metal radix 2-level page table walker by
>> using the get_pate method to point the MMU to the nested partition
>> table entry. MMU faults due to partition scope errors raise HV
>> exceptions and accordingly are routed back to the L1.
>> 
>> The MMU does not tag translations for the L1 (direct) vs L2 (nested)
>> guests, so the TLB is flushed on any L1<->L2 transition (hcall entry
>> and exit).
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>   hw/ppc/spapr.c         |  32 +++-
>>   hw/ppc/spapr_caps.c    |  11 +-
>>   hw/ppc/spapr_hcall.c   | 321 +++++++++++++++++++++++++++++++++++++++++
>>   include/hw/ppc/spapr.h |  74 +++++++++-
>>   target/ppc/cpu.h       |   3 +
>>   5 files changed, 431 insertions(+), 10 deletions(-)
>> 
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 3a5cf92c94..6988e3ec76 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1314,11 +1314,32 @@ static bool spapr_get_pate(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu,
>>   {
>>       SpaprMachineState *spapr = SPAPR_MACHINE(vhyp);
>>   
>> -    assert(lpid == 0);
>> +    if (!cpu->in_spapr_nested) {
> 
> Since 'in_spapr_nested' is a spapr CPU characteristic, I don't think
> it belongs in PowerPCCPU. See the end of the patch for a proposal.

SpaprCpuState. Certainly that's a better place, I must have missed it.
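
Roughly, I'd move the two new fields there (sketch only, not tested; the
field names are just what I'd pick):

/* include/hw/ppc/spapr_cpu_core.h */
typedef struct SpaprCpuState {
    /* ... existing fields ... */

    /* nested-HV: true while this vCPU is running an L2 guest */
    bool in_nested;
    /* L1 register state saved across H_ENTER_NESTED / nested exit */
    CPUPPCState *nested_host_state;
} SpaprCpuState;

and then spapr_cpu_in_nested() just returns
spapr_cpu_state(cpu)->in_nested.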

> 
> btw, this helps with the ordering of files:
> 
> [diff]
>       orderFile = /path/to/qemu/scripts/git.orderfile
> 
>> +        assert(lpid == 0);
>>   
>> -    /* Copy PATE1:GR into PATE0:HR */
>> -    entry->dw0 = spapr->patb_entry & PATE0_HR;
>> -    entry->dw1 = spapr->patb_entry;
>> +        /* Copy PATE1:GR into PATE0:HR */
>> +        entry->dw0 = spapr->patb_entry & PATE0_HR;
>> +        entry->dw1 = spapr->patb_entry;
>> +
>> +    } else {
>> +        uint64_t patb, pats;
>> +
>> +        assert(lpid != 0);
>> +
>> +        patb = spapr->nested_ptcr & PTCR_PATB;
>> +        pats = spapr->nested_ptcr & PTCR_PATS;
>> +
>> +        /* Calculate number of entries */
>> +        pats = 1ull << (pats + 12 - 4);
>> +        if (pats <= lpid) {
>> +            return false;
>> +        }
>> +
>> +        /* Grab entry */
>> +        patb += 16 * lpid;
>> +        entry->dw0 = ldq_phys(CPU(cpu)->as, patb);
>> +        entry->dw1 = ldq_phys(CPU(cpu)->as, patb + 8);
>> +    }
>>   
>>       return true;
>>   }
>> @@ -4472,7 +4493,7 @@ PowerPCCPU *spapr_find_cpu(int vcpu_id)
>>   
>>   static bool spapr_cpu_in_nested(PowerPCCPU *cpu)
>>   {
>> -    return false;
>> +    return cpu->in_spapr_nested;
>>   }
>>   
>>   static void spapr_cpu_exec_enter(PPCVirtualHypervisor *vhyp, PowerPCCPU *cpu)
>> @@ -4584,6 +4605,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>>       nc->nmi_monitor_handler = spapr_nmi;
>>       smc->phb_placement = spapr_phb_placement;
>>       vhc->cpu_in_nested = spapr_cpu_in_nested;
>> +    vhc->deliver_hv_excp = spapr_exit_nested;
>>       vhc->hypercall = emulate_spapr_hypercall;
>>       vhc->hpt_mask = spapr_hpt_mask;
>>       vhc->map_hptes = spapr_map_hptes;
>> diff --git a/hw/ppc/spapr_caps.c b/hw/ppc/spapr_caps.c
>> index 5cc80776d0..4d8bb2ad2c 100644
>> --- a/hw/ppc/spapr_caps.c
>> +++ b/hw/ppc/spapr_caps.c
>> @@ -444,19 +444,22 @@ static void cap_nested_kvm_hv_apply(SpaprMachineState *spapr,
>>   {
>>       ERRP_GUARD();
>>       PowerPCCPU *cpu = POWERPC_CPU(first_cpu);
>> +    CPUPPCState *env = &cpu->env;
>>   
>>       if (!val) {
>>           /* capability disabled by default */
>>           return;
>>       }
>>   
>> -    if (tcg_enabled()) {
>> -        error_setg(errp, "No Nested KVM-HV support in TCG");
> 
> I don't like using KVM-HV (which is KVM-over-PowerNV) when talking about
> KVM-over-pseries. I think the platform name is important. Anyhow, this is
> a more global discussion but we should talk about it someday because these
> HV modes are becoming confusing! We have PR also :)

The cap is nested-hv and QEMU describes it as nested KVM HV. Are we stuck
with that? That could make a name change even more confusing.

It's really a new backend for the KVM HV front end. Like how POWER8 /
POWER9 bare metal backends are completely different now.

But I guess that does not help the end user to understand. On the other
hand, the user might not think "HV" means the HV mode of the CPU, and
might just think of it as "hypervisor".

I like paravirt-hv but nested-hv is not too bad. Anyway I'm happy to
change it.

> 
> 
>> +    if (!(env->insns_flags2 & PPC2_ISA300)) {
>> +        error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
>>           error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
> 
> return ?

Yep.
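
So for v2 the error path becomes something like (sketch, paraphrasing the
hunk above):

    if (!(env->insns_flags2 & PPC2_ISA300)) {
        error_setg(errp, "Nested KVM-HV only supported on POWER9 and later");
        error_append_hint(errp, "Try appending -machine cap-nested-hv=off\n");
        return;   /* don't fall through after reporting the error */
    }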

>> +static target_ulong h_enter_nested(PowerPCCPU *cpu,
>> +                                   SpaprMachineState *spapr,
>> +                                   target_ulong opcode,
>> +                                   target_ulong *args)
>> +{
>> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
>> +    CPUState *cs = CPU(cpu);
>> +    CPUPPCState *env = &cpu->env;
>> +    target_ulong hv_ptr = args[0];
>> +    target_ulong regs_ptr = args[1];
>> +    target_ulong hdec, now = cpu_ppc_load_tbl(env);
>> +    target_ulong lpcr, lpcr_mask;
>> +    struct kvmppc_hv_guest_state *hvstate;
>> +    struct kvmppc_hv_guest_state hv_state;
>> +    struct kvmppc_pt_regs *regs;
>> +    hwaddr len;
>> +    uint32_t cr;
>> +    int i;
>> +
>> +    if (cpu->in_spapr_nested) {
>> +        return H_FUNCTION;
> 
> That would be an L3 :)

Well, if the L2 makes the hcall, vhyp won't handle it; rather, it will
cause an L2 exit to the L1, and the L1 will handle the H_ENTER_NESTED
hcall. So we can (and have) run an L3 guest under the L2 of this
machine :)

This is probably more of an assert(!cpu->in_spapr_nested). Actually
that assert could go in the general spapr hypercall handler.
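
i.e. something along these lines in emulate_spapr_hypercall() (sketch only,
the rest of the existing body is elided; the field would become the
SpaprCpuState one discussed above):

static target_ulong emulate_spapr_hypercall(PPCVirtualHypervisor *vhyp,
                                            PowerPCCPU *cpu,
                                            target_ulong opcode,
                                            target_ulong *args)
{
    /*
     * An hcall made by an L2 never reaches the vhyp path (it causes an
     * exit to the L1 instead), so treat this as a programming error
     * rather than returning H_FUNCTION.
     */
    assert(!cpu->in_spapr_nested);

    /* ... rest of the existing handling ... */
    return spapr_hypercall(cpu, opcode, args);
}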

> 
>> +    }
>> +    if (spapr->nested_ptcr == 0) {
>> +        return H_NOT_AVAILABLE;
>> +    }
>> +
>> +    len = sizeof(*hvstate);
>> +    hvstate = cpu_physical_memory_map(hv_ptr, &len, true);
> 
> When a CPU is available, I would prefer :
> 
>      hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len, true,
>                                MEMTXATTRS_UNSPECIFIED);
>      
> like ppc_hash64_map_hptes() does. This is minor.

I'll check it out. Still not entirely sure about read+write access
though.
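
For reference, I think the read side would look roughly like this with
address_space_map() (untested sketch; whether this mapping really needs
write access is the part I'm still unsure about):

    len = sizeof(*hvstate);
    hvstate = address_space_map(CPU(cpu)->as, hv_ptr, &len,
                                false /* is_write */, MEMTXATTRS_UNSPECIFIED);
    if (!hvstate) {
        return H_PARAMETER;
    }
    if (len != sizeof(*hvstate)) {
        /* Short mapping: drop it before failing */
        address_space_unmap(CPU(cpu)->as, hvstate, len, false, 0);
        return H_PARAMETER;
    }

    memcpy(&hv_state, hvstate, len);

    /* Read-only use here, so access_len is 0 on unmap */
    address_space_unmap(CPU(cpu)->as, hvstate, len, false, 0);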

> 
>> +    if (!hvstate || len != sizeof(*hvstate)) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    memcpy(&hv_state, hvstate, len);
>> +
>> +    cpu_physical_memory_unmap(hvstate, len, 0 /* read */, len /* access len */);
> 
> checkpatch will complain about the comments above.

Yeah it did. Turns out I also had a bug where I missed setting write 
access further down.
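
(i.e. on the path that writes register state back to the L1's buffer, the
unmap wants the write flag and the dirtied length, something like:)

    /* regs was mapped writable and has just been filled in from env */
    cpu_physical_memory_unmap(regs, len, 1 /* is_write */,
                              len /* access_len: bytes written back */);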

> 
>> +
>> +    /*
>> +     * We accept versions 1 and 2. Version 2 fields are unused because TCG
>> +     * does not implement DAWR*.
>> +     */
>> +    if (hv_state.version > HV_GUEST_STATE_VERSION) {
>> +        return H_PARAMETER;
>> +    }
>> +
>> +    cpu->nested_host_state = g_try_malloc(sizeof(CPUPPCState));
> 
> I think we could preallocate this buffer once we know nested is supported,
> or, if we keep it, it could be our 'in_spapr_nested' indicator.

That's true. I kind of liked allocating on demand, but for performance
and robustness it might be better to keep it around (we could allocate it
when we see an H_SET_PARTITION_TABLE).

I'll just keep it as is for the first iteration. In fact we would probably
rather make a specific structure for it that holds only what we require,
rather than the entire CPUPPCState, so all this can be optimised a bit in
a later round.
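
i.e. eventually something like a trimmed-down save area holding only what
H_ENTER_NESTED actually swaps (sketch, name and exact field set made up):

struct SpaprNestedHostState {
    target_ulong gpr[32];
    target_ulong nip;
    target_ulong msr;
    target_ulong lr;
    target_ulong ctr;
    target_ulong cfar;
    uint32_t cr;
    /* ... plus the SPRs the hcall API saves/restores (LPCR, PIDR, etc.) ... */
};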

>> +struct kvmppc_hv_guest_state {
>> +    uint64_t version;        /* version of this structure layout, must be first */
>> +    uint32_t lpid;
>> +    uint32_t vcpu_token;
>> +    /* These registers are hypervisor privileged (at least for writing) */
>> +    uint64_t lpcr;
>> +    uint64_t pcr;
>> +    uint64_t amor;
>> +    uint64_t dpdes;
>> +    uint64_t hfscr;
>> +    int64_t tb_offset;
>> +    uint64_t dawr0;
>> +    uint64_t dawrx0;
>> +    uint64_t ciabr;
>> +    uint64_t hdec_expiry;
>> +    uint64_t purr;
>> +    uint64_t spurr;
>> +    uint64_t ic;
>> +    uint64_t vtb;
>> +    uint64_t hdar;
>> +    uint64_t hdsisr;
>> +    uint64_t heir;
>> +    uint64_t asdr;
>> +    /* These are OS privileged but need to be set late in guest entry */
>> +    uint64_t srr0;
>> +    uint64_t srr1;
>> +    uint64_t sprg[4];
>> +    uint64_t pidr;
>> +    uint64_t cfar;
>> +    uint64_t ppr;
>> +    /* Version 1 ends here */
>> +    uint64_t dawr1;
>> +    uint64_t dawrx1;
>> +    /* Version 2 ends here */
>> +};
>> +
>> +/* Latest version of hv_guest_state structure */
>> +#define HV_GUEST_STATE_VERSION  2
>> +
>> +/* Linux 64-bit powerpc pt_regs struct, used by nested HV */
>> +struct kvmppc_pt_regs {
>> +    uint64_t gpr[32];
>> +    uint64_t nip;
>> +    uint64_t msr;
>> +    uint64_t orig_gpr3;    /* Used for restarting system calls */
>> +    uint64_t ctr;
>> +    uint64_t link;
>> +    uint64_t xer;
>> +    uint64_t ccr;
>> +    uint64_t softe;        /* Soft enabled/disabled */
>> +    uint64_t trap;         /* Reason for being here */
>> +    uint64_t dar;          /* Fault registers */
>> +    uint64_t dsisr;        /* on 4xx/Book-E used for ESR */
>> +    uint64_t result;       /* Result of a system call */
>> +};
> 
> The above structs are shared with KVM for this QEMU implementation.
> 
> I don't think they belong to asm-powerpc/kvm.h but how could we keep them
> in sync? The version should be protecting us from unexpected changes.

Not sure how we should do that. How are other PAPR API definitions kept
in sync? I guess they just have a spec document for the upstream. Paul
made a spec document for the nested HV stuff; I'm not sure whether he has
put it up in public anywhere. Maybe we could maintain it in
linux/Documentation/ or similar?

Anyway, for now I guess we keep this?
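
In the meantime a compile-time guard on our copy might at least catch
accidental layout drift, e.g. (sketch; 248 is what the version 2 layout
works out to by my count):

/* Next to the struct definitions in spapr.h */
QEMU_BUILD_BUG_ON(HV_GUEST_STATE_VERSION != 2);
QEMU_BUILD_BUG_ON(sizeof(struct kvmppc_hv_guest_state) != 248);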

>> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
>> index d8cc956c97..65c4401130 100644
>> --- a/target/ppc/cpu.h
>> +++ b/target/ppc/cpu.h
>> @@ -1301,6 +1301,9 @@ struct PowerPCCPU {
>>       bool pre_2_10_migration;
>>       bool pre_3_0_migration;
>>       int32_t mig_slb_nr;
>> +
>> +    bool in_spapr_nested;
>> +    CPUPPCState *nested_host_state;
>>   };   
> 
> These new fields belong in SpaprCpuState. It shouldn't be too hard to adapt.

Thanks for the pointer, that's what I was looking for. Must not have
looked very hard :)

Thanks,
Nick


