From: David Gibson
Subject: Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
Date: Wed, 17 Feb 2021 13:31:29 +1100

On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> Handling errors in memory hotunplug in the pSeries machine is more complex
> than for any other device type, because there are all the complications that
> other devices have, and more.
>
> For instance, determining a timeout for a DIMM hotunplug must consider if
> it's a Hash-MMU or a Radix-MMU guest, because Hash guests take longer to
> hotunplug DIMMs. The size of the DIMM is also a factor, given that larger
> DIMMs naturally take longer to be hotunplugged from the kernel. And there's
> also the guest memory usage to be considered: if there's a process that is
> consuming memory that would be lost by the DIMM unplug, the kernel will
> postpone the unplug process until the process finishes, and then initiate
> the regular hotunplug process. The first two considerations are manageable,
> but the last one is a deal breaker.
>
> There is no sane way for the pSeries machine to determine the memory load in
> the guest when attempting a DIMM hotunplug - and even if there was a way, the
> guest can start using all the RAM in the middle of the unplug process and
> invalidate our previous assumptions - and as a result we can't even begin to
> calculate a timeout for the operation. This means that we can't implement a
> viable timeout mechanism for memory unplug in pSeries.
>
> Going back to why we would consider an unplug timeout, the reason is that we
> can't know if the kernel is giving up the unplug. Turns out that, sometimes,
> we can. Consider a failed memory hotunplug attempt where the kernel will error
> out with the following message:
>
> 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
>
> This happens when there is a LMB that the kernel gave up on removing, and the
> LMBs marked for removal of the same DIMM are now being added back. This process
> happens
We need to be a little careful about terminology here. From the
guest's point of view, there's no such thing as a DIMM, only LMBs.
What the guest is doing here is essentially rejecting a single "index
+ number" DRC unplug request, which corresponds to one DIMM on the
qemu side.
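To make the correspondence concrete, here is a minimal standalone sketch (not the QEMU code) of how one DIMM maps onto an "id + count" range of LMBs. The 256 MiB block size is assumed to mirror SPAPR_MEMORY_BLOCK_SIZE, and LmbRange, dimm_to_lmb_range() and lmb_in_range() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 256 MiB per LMB, mirroring SPAPR_MEMORY_BLOCK_SIZE. */
#define LMB_SIZE (UINT64_C(1) << 28)

/* Hypothetical description of one "id + count" unplug request. */
typedef struct LmbRange {
    uint32_t first_id;  /* id of the first LMB covered by the DIMM */
    uint32_t count;     /* number of consecutive LMBs in the DIMM */
} LmbRange;

/* Translate a DIMM's guest address and size into the LMBs it covers. */
static LmbRange dimm_to_lmb_range(uint64_t addr, uint64_t size)
{
    LmbRange r;

    /* DIMMs are expected to be aligned to, and a multiple of, the LMB size. */
    assert(addr % LMB_SIZE == 0);
    assert(size % LMB_SIZE == 0);

    r.first_id = (uint32_t)(addr / LMB_SIZE);
    r.count = (uint32_t)(size / LMB_SIZE);
    return r;
}

/* Return non-zero if the LMB with the given id belongs to the range. */
static int lmb_in_range(LmbRange r, uint32_t id)
{
    return id >= r.first_id && id - r.first_id < r.count;
}
```

With these assumptions, a 1 GiB DIMM placed at 4 GiB covers four LMBs starting at id 16, which is the only view of it the guest ever gets.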
> in the pseries kernel in [1], dlpar_memory_remove_by_ic() into
> dlpar_add_lmb(), and after that update_lmb_associativity_index(). In this
> function, the kernel is configuring the LMB DRC connector again. Note that
> this is a valid usage in LOPAR, as stated in section "ibm,configure-connector
> RTAS Call":
>
> 'A subsequent sequence of calls to ibm,configure-connector with the same
> entry from the “ibm,drc-indexes” or “ibm,drc-info” property will restart the
> configuration of devices which were not completely configured.'
>
> We can use this kernel behavior in our favor. If a DRC connector
> reconfiguration for a LMB that we marked as unplug pending happens, this
> indicates that the kernel changed its mind about the unplug and is reasserting
> that it will keep using the DIMM. In this case, it's safe to assume that the
> whole DIMM unplug was cancelled.
>
> This patch hops into rtas_ibm_configure_connector() and, in the scenario
> described above, clears the unplug state for the DIMM device. This will not
> solve all the problems we still have with memory unplug, but it will cover
> this case where the kernel reconfigures LMBs after a failed unplug. We are a
> bit more resilient, without using an unreliable timeout, and we didn't make
> the remaining error cases any worse.
I wonder if we could use this as a beginning of a hotplug failure
reporting mechanism. As noted, this is explicitly allowed by PAPR and
I think in general it makes sense that a configure-connector would
re-assert that the guest is using the resource and we can't unplug it.
Could we extend guests to do an indicative configure-connector on any
unplug they know they can't complete? Or, if configure-connector is too
disruptive could we use an (extra) H_SET_INDICATOR to "UNISOLATE"
state? If I'm reading right, that should be both permitted and a no-op
for existing PAPR implementations, so it should be a pretty safe way
to add that indication.
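As a rough illustration of that idea, here is a toy model (not QEMU code) in which a guest re-asserting any LMB with a pending unplug cancels the request for the whole DIMM and counts as a reported failure. ToyDimm, toy_unplug_request() and toy_guest_reasserts() are hypothetical names, and the failure counter simply stands in for whatever event the host might eventually emit.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LMBS 8

/* Hypothetical host-side view of one DIMM and its LMBs. */
typedef struct ToyDimm {
    bool unplug_requested[TOY_MAX_LMBS]; /* per-LMB pending flag */
    size_t nr_lmbs;                      /* LMBs backing this DIMM */
    bool unplug_pending;                 /* DIMM-wide pending state */
    unsigned failures_reported;          /* stand-in for a failure event */
} ToyDimm;

/* Host asks the guest to release every LMB of the DIMM. */
static void toy_unplug_request(ToyDimm *d)
{
    size_t i;

    d->unplug_pending = true;
    for (i = 0; i < d->nr_lmbs; i++) {
        d->unplug_requested[i] = true;
    }
}

/*
 * Guest re-asserts one LMB (for example via a configure-connector call).
 * Returns true if this cancelled a pending unplug for the whole DIMM.
 */
static bool toy_guest_reasserts(ToyDimm *d, size_t lmb)
{
    size_t i;

    /* The index comes from the guest, so reject bad values quietly. */
    if (lmb >= d->nr_lmbs || !d->unplug_requested[lmb]) {
        return false;
    }

    for (i = 0; i < d->nr_lmbs; i++) {
        d->unplug_requested[i] = false;
    }
    d->unplug_pending = false;
    d->failures_reported++;
    return true;
}
```

The point of the model is only that a single re-assertion is enough to treat the whole request as rejected, and that repeating it afterwards is harmless.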
>
> [1] arch/powerpc/platforms/pseries/hotplug-memory.c
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
> hw/ppc/spapr.c | 30 ++++++++++++++++++++++++++++++
> hw/ppc/spapr_drc.c | 14 ++++++++++++++
> include/hw/ppc/spapr.h | 2 ++
> 3 files changed, 46 insertions(+)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ecce8abf14..4bcded4a1a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>      return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>  }
>
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm)
> +{
> +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
> +    SpaprDrc *drc;
> +    uint32_t nr_lmbs;
> +    uint64_t size, addr_start, addr;
> +    int i;
> +
> +    if (ds) {
> +        spapr_pending_dimm_unplugs_remove(spapr, ds);
> +    }
Hrm... how would !ds arise? Could this just be an assert?
> +
> +    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
> +    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +
> +    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +                                          &error_abort);
> +
> +    addr = addr_start;
> +    for (i = 0; i < nr_lmbs; i++) {
> +        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
> +                              addr / SPAPR_MEMORY_BLOCK_SIZE);
> +        g_assert(drc);
> +
> +        drc->unplug_requested = false;
> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
> +    }
> +}
> +
> /* Callback to be called during DRC release. */
> void spapr_lmb_release(DeviceState *dev)
> {
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c143bfb6d3..eae941233a 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
>
>      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>
> +    /*
> +     * This indicates that the kernel is reconfiguring a LMB due to
> +     * a failed hotunplug. Clear the pending unplug state for the whole
> +     * DIMM.
> +     */
> +    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
> +        drc->unplug_requested) {
> +
> +        /* This really shouldn't happen at this point, but ... */
> +        g_assert(drc->dev);
I'm a little worried that a buggy or malicious guest could trigger
this assert.
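One way to avoid that, sketched here as a standalone toy rather than as the actual QEMU change, is to keep assertions for host-side invariants only and turn the guest-reachable condition into an error return. ToyDrc, toy_configure_connector() and the TOY_RTAS_* codes are hypothetical names for illustration; which status a real implementation should return is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on RTAS return values. */
enum {
    TOY_RTAS_SUCCESS = 0,
    TOY_RTAS_HW_ERROR = -1,
};

/* Hypothetical, heavily simplified connector state. */
typedef struct ToyDrc {
    bool unplug_requested; /* host asked the guest to release this LMB */
    void *dev;             /* attached device, NULL when none is present */
} ToyDrc;

/*
 * Handle a guest-initiated configure request on one connector.
 * Sets *unplug_cancelled when the call re-asserted a pending unplug.
 */
static int toy_configure_connector(ToyDrc *drc, bool *unplug_cancelled)
{
    /* Host-side invariant: callers always pass valid pointers. */
    assert(drc != NULL && unplug_cancelled != NULL);

    *unplug_cancelled = false;

    if (drc->unplug_requested) {
        /*
         * Guest-reachable condition: fail the call instead of asserting,
         * so a buggy or malicious guest cannot abort the host process.
         */
        if (drc->dev == NULL) {
            return TOY_RTAS_HW_ERROR;
        }
        drc->unplug_requested = false;
        *unplug_cancelled = true;
    }

    return TOY_RTAS_SUCCESS;
}
```

In this toy the inconsistent state is left untouched and reported back to the caller, so the guest sees a failed call and the host keeps running.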
> +
> +        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
> +    }
> +
>      if (!drc->fdt) {
>          void *fdt;
>          int fdt_size;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ccbeeca1de..5bcc8f3bb8 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
> int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
> void spapr_clear_pending_events(SpaprMachineState *spapr);
> void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm);
> int spapr_max_server_number(SpaprMachineState *spapr);
> void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
> uint64_t pte0, uint64_t pte1);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson