qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] spapr: Rework hash<->radix transitions at CAS


From: David Gibson
Subject: Re: [PATCH] spapr: Rework hash<->radix transitions at CAS
Date: Fri, 14 Feb 2020 09:28:35 +1100

On Thu, Feb 13, 2020 at 04:38:38PM +0100, Greg Kurz wrote:
> Until the CAS negotiation is over, an HPT can be allocated on three
> different paths:
> 
> 1) during machine reset if the host doesn't support radix,
> 
> 2) during CAS if the guest wants hash and doesn't support HPT resizing,
>    in which case we pre-emptively resize the HPT to accomodate maxram,
> 
> 3) during CAS if no CAS reboot was requested, the guest wants hash but
>    we're currently configured for radix.
> 
> Depending on the various combinations of host or guest MMU support,
> HPT resizing guest support and the possibility of a CAS reboot, it
> is quite hard to know which of these allocates the HPT that will
> be ultimately used by the guest that wants to do hash. Also, some of
> them have bugs:
> 
> - 2) calls spapr_reallocate_hpt() instead of spapr_setup_hpt_and_vrma()
>   and thus doesn't update the VRMA size, even though we've just extended
>   the HPT. Not sure what issues this can cause,
> 
> - 3) doesn't check for HPT resizing support and will always allocate a
>   small HPT based on the initial RAM size. This caps the total amount of
>   RAM the guest can see, especially if maxram is much higher than the
>   initial ram.
> 
> We only support guests that do CAS and we already assume that the HPT
> isn't being used when we do the pre-emptive resizing at CAS. It thus
> seems reasonable to only allocate the HPT at the end of CAS, when no
> CAS reboot was requested.
> 
> Consolidate the logic so that we only create the HPT during 3), ie.
> when we're done with the CAS reboot cycles, and ensure HPT resizing
> is taken into account. This fixes the radix->hash transition for
> all cases.

Uh.. I'm pretty sure this can't work for KVM on a POWER8 host.  We
need the HPT at all times there, or there's nowhere to put VRMA
entries, so we can't run even in real mode.

> The guest can theoretically call CAS several times, without a CAS
> reboot in between. Linux guests don't do that, but better safe than
> sorry, let's ensure we can also handle the symmetrical hash->radix
> transition correctly: free the HPT and set the GR bit in PATE.
> An helper is introduced for the latter since this is already what
> we do during machine reset when going for radix.
> 
> As a bonus, this removes one user of spapr->cas_reboot, which we
> want to get rid of in the future.
> 
> Signed-off-by: Greg Kurz <address@hidden>
> ---
>  hw/ppc/spapr.c         |   25 +++++++++++++++-----
>  hw/ppc/spapr_hcall.c   |   59 
> ++++++++++++++++++++----------------------------
>  include/hw/ppc/spapr.h |    1 +
>  3 files changed, 44 insertions(+), 41 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 828e2cc1359a..88bc0e4e3ca1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1573,9 +1573,19 @@ void spapr_setup_hpt_and_vrma(SpaprMachineState *spapr)
>  {
>      int hpt_shift;
>  
> +    /*
> +     * HPT resizing is a bit of a special case, because when enabled
> +     * we assume an HPT guest will support it until it says it
> +     * doesn't, instead of assuming it won't support it until it says
> +     * it does.  Strictly speaking that approach could break for
> +     * guests which don't make a CAS call, but those are so old we
> +     * don't care about them.  Without that assumption we'd have to
> +     * make at least a temporary allocation of an HPT sized for max
> +     * memory, which could be impossibly difficult under KVM HV if
> +     * maxram is large.
> +     */
>      if ((spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED)
> -        || (spapr->cas_reboot
> -            && !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE))) {
> +        || !spapr_ovec_test(spapr->ov5_cas, OV5_HPT_RESIZE)) {
>          hpt_shift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
>      } else {
>          uint64_t current_ram_size;
> @@ -1604,6 +1614,12 @@ static int spapr_reset_drcs(Object *child, void 
> *opaque)
>      return 0;
>  }
>  
> +void spapr_reset_patb_entry(SpaprMachineState *spapr)
> +{
> +    spapr->patb_entry = PATE1_GR;
> +    spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> +}
> +
>  static void spapr_machine_reset(MachineState *machine)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> @@ -1624,10 +1640,7 @@ static void spapr_machine_reset(MachineState *machine)
>           * without a HPT because KVM will start them in radix mode.
>           * Set the GR bit in PATE so that we know there is no HPT.
>           */
> -        spapr->patb_entry = PATE1_GR;
> -        spapr_set_all_lpcrs(LPCR_HR | LPCR_UPRT, LPCR_HR | LPCR_UPRT);
> -    } else {
> -        spapr_setup_hpt_and_vrma(spapr);
> +        spapr_reset_patb_entry(spapr);
>      }
>  
>      qemu_devices_reset();
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index b8bb66b5c0d4..57ddf0fa6d05 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1677,6 +1677,7 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>      bool raw_mode_supported = false;
>      bool guest_xive;
>      CPUState *cs;
> +    int maxshift = spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
>  
>      /* CAS is supposed to be called early when only the boot vCPU is active. 
> */
>      CPU_FOREACH(cs) {
> @@ -1739,36 +1740,6 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>  
>      guest_xive = spapr_ovec_test(ov5_guest, OV5_XIVE_EXPLOIT);
>  
> -    /*
> -     * HPT resizing is a bit of a special case, because when enabled
> -     * we assume an HPT guest will support it until it says it
> -     * doesn't, instead of assuming it won't support it until it says
> -     * it does.  Strictly speaking that approach could break for
> -     * guests which don't make a CAS call, but those are so old we
> -     * don't care about them.  Without that assumption we'd have to
> -     * make at least a temporary allocation of an HPT sized for max
> -     * memory, which could be impossibly difficult under KVM HV if
> -     * maxram is large.
> -     */
> -    if (!guest_radix && !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> -        int maxshift = 
> spapr_hpt_shift_for_ramsize(MACHINE(spapr)->maxram_size);
> -
> -        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED) {
> -            error_report(
> -                "h_client_architecture_support: Guest doesn't support HPT 
> resizing, but resize-hpt=required");
> -            exit(1);
> -        }
> -
> -        if (spapr->htab_shift < maxshift) {
> -            /* Guest doesn't know about HPT resizing, so we
> -             * pre-emptively resize for the maximum permitted RAM.  At
> -             * the point this is called, nothing should have been
> -             * entered into the existing HPT */
> -            spapr_reallocate_hpt(spapr, maxshift, &error_fatal);
> -            push_sregs_to_kvm_pr(spapr);
> -        }
> -    }
> -
>      /* NOTE: there are actually a number of ov5 bits where input from the
>       * guest is always zero, and the platform/QEMU enables them independently
>       * of guest input. To model these properly we'd want some sort of mask,
> @@ -1806,6 +1777,12 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>              error_report("Guest requested unavailable MMU mode (hash).");
>              exit(EXIT_FAILURE);
>          }
> +        if (spapr->resize_hpt == SPAPR_RESIZE_HPT_REQUIRED &&
> +            !spapr_ovec_test(ov5_guest, OV5_HPT_RESIZE)) {
> +            error_report(
> +                "h_client_architecture_support: Guest doesn't support HPT 
> resizing, but resize-hpt=required");
> +            exit(1);
> +        }
>      }
>      spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
>      spapr_ovec_cleanup(ov1_guest);
> @@ -1838,11 +1815,23 @@ static target_ulong 
> h_client_architecture_support(PowerPCCPU *cpu,
>          void *fdt;
>          SpaprDeviceTreeUpdateHeader hdr = { .version_id = 1 };
>  
> -        /* If spapr_machine_reset() did not set up a HPT but one is necessary
> -         * (because the guest isn't going to use radix) then set it up here. 
> */
> -        if ((spapr->patb_entry & PATE1_GR) && !guest_radix) {
> -            /* legacy hash or new hash: */
> -            spapr_setup_hpt_and_vrma(spapr);
> +        if (!guest_radix) {
> +            /*
> +             * Either spapr_machine_reset() did not set up a HPT but one
> +             * is necessary (because the guest isn't going to use radix),
> +             * or the guest doesn't know about HPT resizing and we need to
> +             * pre-emptively resize for the maximum permitted RAM. Set it
> +             * up here. At the point this is called, nothing should have
> +             * been entered into the existing HPT.
> +             */
> +            if (spapr->patb_entry & PATE1_GR || spapr->htab_shift < 
> maxshift) {
> +                /* legacy hash or new hash: */
> +                spapr_setup_hpt_and_vrma(spapr);
> +                push_sregs_to_kvm_pr(spapr);
> +            }
> +        } else {
> +            spapr_free_hpt(spapr);
> +            spapr_reset_patb_entry(spapr);
>          }
>  
>          if (fdt_bufsize < sizeof(hdr)) {
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 09110961a589..9d88b5596481 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -919,4 +919,5 @@ void spapr_check_pagesize(SpaprMachineState *spapr, 
> hwaddr pagesize,
>  
>  void spapr_set_all_lpcrs(target_ulong value, target_ulong mask);
>  hwaddr spapr_get_rtas_addr(void);
> +void spapr_reset_patb_entry(SpaprMachineState *spapr);
>  #endif /* HW_SPAPR_H */
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]