qemu-s390x
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] s390x/css: revert SCSW ctrl/flag bits on error


From: Thomas Huth
Subject: Re: [PATCH v2] s390x/css: revert SCSW ctrl/flag bits on error
Date: Sun, 6 Nov 2022 11:13:29 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.13.0

On 28/10/2022 22.22, Eric Farman wrote:
On Thu, 2022-10-27 at 23:23 +0200, Peter Jin wrote:
Revert the control and flag bits in the subchannel status word in
case
the SSCH operation fails with non-zero CC (ditto for CSCH and HSCH).
According to POPS, the control and flag bits are only changed if
SSCH,
CSCH, and HSCH return CC 0, and no other action should be taken
otherwise.
In order to simulate that after the fact, the bits need to be
reverted on
non-zero CC.


I'm okay to this point...

This change is necessary due to the fact that the pwrite() in vfio-
ccw
which triggers the SSCH can fail at any time. Previously, there was
only virtio-ccw, whose do_subchannel_work function was only able to
return CC0. However, once vfio-ccw went into the mix, it has become
necessary to handle errors in code paths that were previously assumed
to always return success.

In our case, we found that in case of pwrite() failure (which was
discovered by strace injection), the subchannel could be stuck in
start
pending state, which could be problematic if the pwrite() call
returns
CC2. Experimentation shows that the guest tries to retry the SSCH
call as
normal for CC2, but it actually continously fails due to the fact
that
the subchannel is stuck in start pending state even though no start
function is actually taking place.

...but the two paragraphs above are a bit cumbersome to digest. Maybe
it's just too late in the week for me. What about something like this?

"""
While the do_subchannel_work logic for virtual (virtio) devices will
return condition code 0, passthrough (vfio) devices may encounter
errors from either the host kernel or real hardware that need to be
accounted for after this point. This includes restoring the state of
the Subchannel Status Word to reflect the subchannel, as these bits
would not be set in the event of a non-zero condition code from the
affected instructions.

Experimentation has shown that a failure on a START SUBCHANNEL (SSCH)
to a passthrough device would leave the subchannel with the START
PENDING activity control bit set, thus blocking subsequent SSCH
operations in css_do_ssch() until some form of error recovery was
undertaken since no interrupt would be expected.
"""


Signed-off-by: Peter Jin <pjin@linux.ibm.com>

We've talked previously about clearing this within the
do_subchannel_work_passthrough routine in order to keep the _virtual
paths untouched, but this seems like a reasonable approach to me.

The commit message is probably fine either way, but as far as the code
goes:

Reviewed-by: Eric Farman <farman@linux.ibm.com>

Thanks, I've queued the patch now to my s390x-next branch with the updated commit message. Please double-check whether that looks OK now:

 https://gitlab.com/thuth/qemu/-/commits/s390x-next/

 Thomas




reply via email to

[Prev in Thread] Current Thread [Next in Thread]