qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH RFC] vfio-ccw: forward halt/clear errors


From: Cornelia Huck
Subject: Re: [PATCH RFC] vfio-ccw: forward halt/clear errors
Date: Mon, 17 May 2021 19:31:45 +0200

On Wed, 12 May 2021 16:19:15 -0400
Eric Farman <farman@linux.ibm.com> wrote:

> On Tue, 2021-05-11 at 17:11 +0200, Cornelia Huck wrote:
> > hsch and csch basically have two parts: execute the command,
> > and perform the halt/clear function. For fully emulated
> > subchannels, it is pretty clear how it will work: check the
> > subchannel state, and actually 'perform the halt/clear function'
> > and set cc 0 if everything looks good.
> > 
> > For passthrough subchannels, some of the checking is done
> > within QEMU, but some has to be done within the kernel. QEMU's
> > subchannel state may be such that we can perform the async
> > function, but the kernel may still get a cc != 0 when it is
> > actually executing the instruction. In that case, we need to
> > set the condition actually encountered by the kernel; if we
> > set cc 0 on error, we would actually need to inject an interrupt
> > as well.
> > 
> > Signed-off-by: Cornelia Huck <cohuck@redhat.com>
> > ---
> > 
> > Stumbled over this during the vfio-ccw kernel locking discussions.
> > 
> > This is probably a corner case, and I'm not sure how I can actually
> > get this path excercised, but it passes my smoke tests.  
> 
> I'll see if I can hammer my way into some of this.

Thanks, that would be cool.

> 
> > 
> > Not sure whether this is the way to go.   
> 
> I think it seems reasonable.
> 
> > The unit exceptions in the
> > halt/clear error paths also seem slightly fishy.  
> 
> It is peculiar. Looking at the old POPS references, the unit exception
> states that the --device-- detected something unusual, not really the
> subchannel (which is how vfio-ccw is behaving). But, providing some
> indication that something went seriously wrong is good. Which I guess
> was the point of the UE code, even though it's obviously set up to be
> generated after a failure on the START.

Yes, I had copied it over when I wired up halt/clear.

> 
> I guess at the least, we need to clean up the FCTL based on the
> function that failed, rather than only cleaning up the START function.

Makes sense.

> The UE itself may just be an extra "hey this is busted" indicator.

It still feels a bit odd, but I'm not sure that there's a better
alternative.

> 
> > 
> > ---
> >  hw/s390x/css.c | 34 ++++++++++++++++++++++++++++++----
> >  hw/vfio/ccw.c  |  4 ++--
> >  2 files changed, 32 insertions(+), 6 deletions(-)
> > 
> > diff --git a/hw/s390x/css.c b/hw/s390x/css.c
> > index bed46f5ec3a2..ce2e903ca25a 100644
> > --- a/hw/s390x/css.c
> > +++ b/hw/s390x/css.c
> > @@ -1206,23 +1206,49 @@ static void
> > sch_handle_start_func_virtual(SubchDev *sch)
> >  
> >  }
> >  
> > -static void sch_handle_halt_func_passthrough(SubchDev *sch)
> > +static IOInstEnding sch_handle_halt_func_passthrough(SubchDev *sch)
> >  {
> >      int ret;
> >  
> >      ret = s390_ccw_halt(sch);
> >      if (ret == -ENOSYS) {
> >          sch_handle_halt_func(sch);
> > +        return IOINST_CC_EXPECTED;  
> 
> This is the fallback, so makes sense. You could fold it into the switch
> table, but since that's for "stuff from the kernel" versus -ENOSYS says
> "there's no way to call the kernel" I guess this is fine too.

Yes, that was the reason I kept that separate. (I don't expect it to be
a heavily exercised path nowadays anyway.)

> 
> > +    }
> > +    /*
> > +     * Some conditions may have been detected prior to starting the
> > halt
> > +     * function; map them to the correct cc.
> > +     */
> > +    switch (ret) {
> > +    case -EBUSY:
> > +        return IOINST_CC_BUSY;
> > +    case -ENODEV:
> > +    case -EACCES:
> > +        return IOINST_CC_NOT_OPERATIONAL;
> > +    default:
> > +        return IOINST_CC_EXPECTED;
> >      }
> >  }

(...)

> > diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> > index e752c845e9e4..39275a917bd2 100644
> > --- a/hw/vfio/ccw.c
> > +++ b/hw/vfio/ccw.c
> > @@ -199,7 +199,7 @@ again:  
> 
> // This is for CLEAR
> 
> >      case 0:
> >      case -ENODEV:
> >      case -EACCES:
> > -        return 0;
> > +        return ret;
> >      case -EFAULT:
> >      default:
> >          sch_gen_unit_exception(sch);
> > @@ -240,7 +240,7 @@ again:  
> 
> // This is for HALT
> 
> >      case -EBUSY:
> >      case -ENODEV:
> >      case -EACCES:
> > -        return 0;
> > +        return ret;  
> 
> Aside: How could we get EACCES for either HALT or CLEAR? I only see
> that set in the normal request path, if we got a CC3 on the SSCH.

We should only get -ENODEV, which basically indicates cc3 on the kernel
side. For start, the kernel distinguishes between "you didn't specify a
path in the orb that's actually available" and "device not
operational", but we map everything to cc 3 in QEMU (and I don't think
there's anything else we could do with that information anyway.)

> 
> Can we scrub them, or do we need to update kernel
> Documentation/s390/vfio-ccw.rst ? :)

We could remove them, or log something if they come up unexpectedly;
but their presence does not really hurt, either.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]