[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH
From: |
Halil Pasic |
Subject: |
Re: [Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH |
Date: |
Thu, 31 Aug 2017 12:41:05 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 |
On 08/31/2017 11:19 AM, Cornelia Huck wrote:
> On Wed, 30 Aug 2017 18:36:02 +0200
> Halil Pasic <address@hidden> wrote:
>
>> According to the POP a start subchannel instruction (SSCH) returning with
>> cc 1 implies that the subchannel was status pending when SSCH executed.
>>
>> Due to a somewhat confusing error handling, where error codes are mapped
>> to cc value, sane looking error codes result in non AR compliant
>> behavior.
>>
>> Let's fix this! Instead of cc 1 we use cc 3 which means device not
>> operational, and is much closer to the truth in the given cases.
>>
>> Signed-off-by: Halil Pasic <address@hidden>
>> Acked-by: Pierre Morel<address@hidden>
>> ---
>>
>> This patch turned out quite controversial. We did not reach a consensus
>> during the internal review.
>>
>> The most of the discussion revolved around the ORB flag which
>> architecturally must be supported, but are currently not supported by
>> vfio-ccw (not yet, or can't be). The idea showing the most promise for
>> consensus was to handle this via device status (along the lines better a
>> strange acting device than a non-conform machine) but since it's a
>> radical change we decided to first discuss upstream and then do whatever
>> needs to be done.
>> ---
>> hw/s390x/css.c | 15 ++++++---------
>> hw/s390x/s390-ccw.c | 2 +-
>> 2 files changed, 7 insertions(+), 10 deletions(-)
>>
>> diff --git a/hw/s390x/css.c b/hw/s390x/css.c
>> index a50fb0727e..0822538cde 100644
>> --- a/hw/s390x/css.c
>> +++ b/hw/s390x/css.c
>> @@ -1034,7 +1034,7 @@ static int sch_handle_start_func_passthrough(SubchDev
>> *sch)
>> */
>> if (!(orb->ctrl0 & ORB_CTRL0_MASK_PFCH) ||
>> !(orb->ctrl0 & ORB_CTRL0_MASK_C64)) {
>> - return -EINVAL;
>> + return -ENODEV;
>
> This feels wrong. If we don't support this yet, doing something like a
> channel-program check or an operand exception feels closer to the
> architecture than indicating a gone device.
I disagree, a channel-program check or an operand exception, or cc 1
(current solution) makes the machine obviously non-conform.
My train of thought was that architecturally you can loose connection
to the device at any time (you can't prohibit admins pulling cables or
smashing equipment with a 10kg hammer).
Also from the guest OS perspective I think saying device not operational
could provoke a proper reaction form the guest OS: that is just give up
on the device. The things you propose would in my opinion put the blame
on the guest OS driver (making non-conform requests) so in that case
it would make sense to give up on the driver (but the same driver could
wonderfully work with let's say a fully emulated device).
As I have stated in the cover letter of this patch, I would find
setting device status even better, but I wanted to discuss first
before going from setting cc to something else.
Setting cc was not my idea in the first place (AFAIK the -EINVAL
here effectively triggers cc 1).
>
>> }
>>
>> ret = s390_ccw_cmd_request(orb, s, sch->driver_data);
>> @@ -1046,16 +1046,13 @@ static int
>> sch_handle_start_func_passthrough(SubchDev *sch)
>> break;
>> case -ENODEV:
>> break;
>> + case -EFAULT:
>> + break;
>> case -EACCES:
>> /* Let's reflect an inaccessible host device by cc 3. */
>> - ret = -ENODEV;
>> - break;
>> default:
>> - /*
>> - * All other return codes will trigger a program check,
>> - * or set cc to 1.
>> - */
>> - break;
>> + /* Let's make all other return codes map to cc 3. */
>> + ret = -ENODEV;
>
> Why? This feels wrong. For those cases where we want to signal an error
> but cc 1 is conceptually wrong, either an operand exception (for very
> few cases) or a channel-program check feels more in line with the
> architecture.
You mean the original code feels wrong, or? I keep the program check for
-EFAULT (that's why it's added) and just change cc 1 to cc 3 for the
not explicitly handled error codes (reason stated in the commit message).
>
> That's a general problem with doing stuff in the hypervisor: We have
> sets of internal problems that obviously don't show up in the
> architecture, and we can either handle them internally or use what the
> architecture offers for problem signaling. z/VM has probably faced the
> same problems :)
I agree.
>
>> };
>>
>> return ret;
>> @@ -1115,7 +1112,7 @@ static int do_subchannel_work(SubchDev *sch)
>> if (sch->do_subchannel_work) {
>> return sch->do_subchannel_work(sch);
>> } else {
>> - return -EINVAL;
>> + return -ENODEV;
>
> This rather seems like a job for an assert? If we don't have a function
> for the 'asynchronous' handling of the various functions assigned for a
> subchannel, that looks like an internal error.
>
IMHO it depends. Aborting qemu is heavy handed, and as an user I would not
be happy about it. But certainly it is an assert situation. We can look for
an even better solution, but I think this is an improvement. The logic behind
is that the device is broken and can't be talked to properly.
>> }
>> }
>>
>> diff --git a/hw/s390x/s390-ccw.c b/hw/s390x/s390-ccw.c
>> index 8614dda6f8..2b0741741c 100644
>> --- a/hw/s390x/s390-ccw.c
>> +++ b/hw/s390x/s390-ccw.c
>> @@ -25,7 +25,7 @@ int s390_ccw_cmd_request(ORB *orb, SCSW *scsw, void *data)
>> if (cdc->handle_request) {
>> return cdc->handle_request(orb, scsw, data);
>> } else {
>> - return -ENOSYS;
>> + return -ENODEV;
>
> If we get here, it means that we called a request handler (which is
> only done for the passthrough variety) without having assigned a
> request handler beforehand. This also looks like an internal error to
> me...
>
Certainly. Again I was not the one who wrote or accepted the original
code. My previous comment about whether assert or not applies here as
well.
>> }
>> }
>>
>
>
[Qemu-devel] [PATCH 2/9] s390x: fix invalid use of cc 1 for SSCH, Halil Pasic, 2017/08/30
[Qemu-devel] [PATCH 3/9] s390x/css: be more consistent if broken beyond repair, Halil Pasic, 2017/08/30
[Qemu-devel] [PATCH 6/9] s390x: refactor error handling for CSCH handler, Halil Pasic, 2017/08/30
[Qemu-devel] [PATCH 4/9] s390x: refactor error handling for SSCH and RSCH, Halil Pasic, 2017/08/30
[Qemu-devel] [PATCH 5/9] s390x: refactor error handling for XSCH handler, Halil Pasic, 2017/08/30
[Qemu-devel] [PATCH 7/9] s390x: refactor error handling for HSCH handler, Halil Pasic, 2017/08/30
[Qemu-devel] [PATCH 8/9] s390x: refactor error handling for MSCH handler, Halil Pasic, 2017/08/30