[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error
From: |
Markus Armbruster |
Subject: |
Re: [Qemu-devel] [PATCH v2 1/5] msix_init: assert programming error |
Date: |
Thu, 29 Sep 2016 15:11:27 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux) |
Alex Williamson <address@hidden> writes:
> On Tue, 13 Sep 2016 08:16:20 +0200
> Markus Armbruster <address@hidden> wrote:
>
>> Cc: Alex for device assignment expertise.
>>
>> Cao jin <address@hidden> writes:
>>
>> > On 09/12/2016 09:29 PM, Markus Armbruster wrote:
>> >> Cao jin <address@hidden> writes:
>> >>
>> >>> The input parameters is used for creating the msix capable device, so
>> >>> they must obey the PCI spec, or else, it should be programming error.
>> >>
>> >> True when the the parameters come from a device model attempting to
>> >> define a PCI device violating the spec. But what if the parameters come
>> >> from an actual PCI device violating the spec, via device assignment?
>> >
>> > Before the patch, on invalid param, the vfio behaviour is:
>> > error_report("vfio: msix_init failed");
>> > then, device create fail.
>> >
>> > After the patch, its behaviour is:
>> > asserted.
>> >
>> > Do you mean we should still report some useful info to user on invalid
>> > params?
>>
>> In the normal case, asking msix_init() to create MSI-X that are out of
>> spec is a programming error: the code that does it is broken and needs
>> fixing.
>>
>> Device assignment might be the exception: there, the parameters for
>> msix_init() come from the assigned device, not the program. If they
>> violate the spec, the device is broken. This wouldn't be a programming
>> error. Alex, can this happen?
>>
>> If yes, we may want to handle it by failing device assignment.
>
>
> Generally, I think the entire premise of these sorts of patches is
> flawed. We take a working error path that allows a driver to robustly
> abort on unexpected date and turn it into a time bomb. Often the
> excuse for this is that "error handling is hard". Tough. Now a
> hot-add of a device that triggers this changes from a simple failure to
> a denial of service event. Furthermore, we base that time bomb on our
> interpretation of the spec, which we can only validate against in-tree
> devices.
>
> We have actually had assigned devices that fail the sanity test here,
> there's a quirk in vfio_msix_early_setup() for a Chelsio device with
> this bug. Do we really want user experiencing aborts when a simple
> device initialization failure is sufficient?
>
> Generally abort code paths like this cause me to do my own sanity
> testing, which is really poor practice since we should have that sanity
> testing in the common code. Thanks,
I prefer to assert on programming error, because 1. it does double duty
as documentation, 2. error handling of impossible conditions is commonly
wrong, and 3. assertion failures have a much better chance to get the
program fixed. Even when presence of a working error path kills 2., the
other two make me stick to assertions.
However, input out-of-spec is not a programming error. For most users
of msix_init(), the arguments are hard-coded, thus invalid arguments are
a programming error. For device assignment, they come from a physical
device, thus invalid arguments can either be a programming error (our
idea of "invalid" is invalid) or bad input (the physical device is
out-of-spec). Since we can't know, we better handle it rather than
assert.
Bottom line: you convinced me msix_init() should stay as it is. But now
msi_init() looks like it needs a change: it asserts on invalid
nr_vectors parameter. Does that need fixing, Alex?