[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH v2] spapr_drc: don't allow 'empty' DRCs to be unis
From: |
Michael Roth |
Subject: |
Re: [Qemu-ppc] [PATCH v2] spapr_drc: don't allow 'empty' DRCs to be unisolated |
Date: |
Wed, 09 Sep 2015 21:50:01 -0500 |
User-agent: |
alot/0.3.6 |
Quoting David Gibson (2015-09-09 20:18:01)
> On Wed, Sep 09, 2015 at 12:19:22PM -0500, Michael Roth wrote:
> > Quoting David Gibson (2015-09-08 23:10:41)
> > > On Tue, Sep 08, 2015 at 06:44:55PM -0500, Michael Roth wrote:
> > > > Logical resources start with allocation-state:UNUSABLE /
> > > > isolation-state:ISOLATED. During hotplug, guests will transition
> > > > them to allocate-state:USABLE, and then to isolate-state:UNISOLATED.
> > > > The former transition does not seem to have any failure path for
> > > > cases where a DRC does not have any resources associated with it to
> > > > allocate for guest, but instead relies on the subsequent
> > > > isolation-state:UNISOLATED transition to indicate failure in this
> > > > situation.
> > > >
> > > > Currently DRC code does not implement this logic, but instead
> > > > tries to indicate failure by refusing the allocation-state:USABLE
> > > > transition. Unfortunately, since that's not a documented failure
> > > > path, guests continue undeterred, causing undefined behavior in
> > > > QEMU and guest code.
> > > >
> > > > Fix this by handling things as PAPR defines (13.7 and 13.7.3.1).
> > > >
> > > > Cc: address@hidden
> > > > Cc: David Gibson <address@hidden>
> > > > Cc: Bharata B Rao <address@hidden>
> > > > Signed-off-by: Michael Roth <address@hidden>
> > > > ---
> > > > v2:
> > > > - actually include the full changeset in the patch
> > >
> > > Several queries for clarification:
> >
> > Looks like you've already picked this up, but just in case:
> >
> > >
> > > * Is this intended to replace Bharata's patch "spapr_drc:
> > > Return correct state for logical DR in entity_sense()" or to apply
> > > on top of it?
> >
> > Yes this should replace that patch.
> >
> > >
> > > * If I'm understanding correctly, the problem here is that although
> > > the guest is supposed to check for failures to set the allocation
> > > state, it's actually not? This patch is to make qemu gracefully
> > > handle the guest's failure to do this? Is that right?
> >
> > The root issue is that allocation state setting doesn't actually have
> > a documented failure path. Instead, the subsequent isolation state
> > setting is where we surface an error if we're unable to actually
> > provide the device to the guest (due to it not being attached
> > to the DRC in question).
>
> Um.. 13.7.3.1 (step 2) implies checking for failures on setting
> allocation state, and 13.5.3.5 Table 177 says "If the transition to
> the usable state is not possible the -3 (no such indicator
> implemented) status is returned."
>
> How is that not a documented failure path?
Hmm, I think I misunderpreted this statement from 13.7:
"The OS may use the get-sensor-state RTAS call with the dr-entity-sense
token to deter-
mine if a given drc-index refers to a connector that is currently usable
for DR operations. If the connector is not
currently usable the return state is “DR entity unusable” (2). A
set-indicator (isolation state) RTAS call to an unusable
connector or (dr-indicator) to any logical resource connector results in
a “No such indicator implemented” return sta-
tus."
To me it seemed to suggest that isolation-state is where we fail
attempts to bring up an UNUSABLE resource, but the statement doesn't
actually preclude us from returning error on setting allocation-state.
I'm not sure how I misinterpreted 13.7.3.1 to support this assumption,
it does seem to clearly indicate setting allocation-state can return
error (and in fact mentions nothing about accounting for isolation-state
errors in that case), and 13.5.3.5 makes it clear it should also be
-3 in that case.
I'll send a v3 with the set-allocation logic in place. I'll also look
into returning RTAS errors directly. Thanks for the catch!
>
> > We weren't surfacing that error in the isolation state setting, so the
> > guest was continuing on with configuring devices that aren't actually
> > present. This patch should correct that.
>
> So, to be clear, I think the check in set isolation-state is a good
> idea as well, to properly handle a misbehaving guest.
>
> > Personally it seems like allocation state setting is where that failure
> > should occur, since that's the earliest point to surface the error, but
> > that's how PAPR has it documented.
>
> .. but I'm not seeing how PAPR is documenting it as happening in set
> isolation state rather than set allocation-state.
>
> --
> David Gibson | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
> | _way_ _around_!
> http://www.ozlabs.org/~dgibson