qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 00/15] Error API: Flag errors in *errp even if err


From: Markus Armbruster
Subject: Re: [Qemu-devel] [RFC 00/15] Error API: Flag errors in *errp even if errors are being ignored
Date: Fri, 30 Jun 2017 13:40:58 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)

Eduardo Habkost <address@hidden> writes:

> On Thu, Jun 29, 2017 at 08:54:29AM +0200, Markus Armbruster wrote:
>> Eduardo Habkost <address@hidden> writes:
>> 
>> > On Wed, Jun 28, 2017 at 11:05:26AM +0200, Markus Armbruster wrote:
>> >> Eduardo Habkost <address@hidden> writes:
> [...]
>> >> > I understand the reason we need to support errp==NULL, as it
>> >> > makes life simpler for callers that don't want any extra error
>> >> > information.  However, this has the cost of making the functions
>> >> > that report errors more complex and error-prone.
>> >> >
>> >> > (Evidence of that: the 34 ERR_IS_* cases handled by the "use
>> >> > ERR_IS_* macros" patches in the series.  Where existing code will
>> >> > crash or behave differently if errp is NULL.)
>> >> 
>> >> Which of them could *not* use a suitable return value instead of *errp?
>> >
>> > I don't know.  But I'm not trying to improve those 34 ERR_IS_* cases.  I
>> > am trying to improve the 700+ functions that need the
>> > local_err/error_propagate() boilerplate code today.  This series already
>> > handles 346 of them automatically (see patch 14/15).
>> 
>> I agree the goal is reducing error_propagate() boilerplate.  I latched
>> onto the 34 ERR_IS_* cases only because you presented them as examples.
>
> The 34 ERR_IS_* cases were evidence of how easy it is to introduce
> mistakes with the current API.  Probably most of them are instances of
> (1) and (2) below.

The current interface can be abused, but how much abuse actually creeps
in?  I think we've been doing reasonably well there since we got rid of
the bad examples and improved documentation.

Moreover, the revised interface could also be abused.  Nothing stops you
from dereferencing errp before or after, the only thing that changes are
the examples people see in code.  I'm afraid the people who reinvent bad
examples from scratch despite the documentation telling them not to will
also bypass any macros the documentation tells them to use.

*Especially* if we use macros only sometimes.  ERR_IS_SET(&err) makes no
sense, so we'd still test err directly there, wouldn't we?

> [...]
>> >> > The Proposal
>> >> > ------------
>> >> >
>> >> > I'm proposing replacing NULL errp with a special macro:
>> >> > IGNORE_ERRORS.  The macro will trigger special behavior in the
>> >> > error API that will make it not save any error information in the
>> >> > error pointer, but still keep track of boolean error state in
>> >> > *errp.
>> >> >
>> >> > This will allow us to simplify the documented method to propagate errors
>> >> > from:
>> >> >
>> >> >     Error *err = NULL;
>> >> >     foo(arg, &err);
>> >> >     if (err) {
>> >> >         handle the error...
>> >> >         error_propagate(errp, err);
>> >> >     }
>> >> >
>> >> > to:
>> >> >
>> >> >     foo(arg, errp);
>> >> >     if (ERR_IS_SET(errp)) {
>> >> >         handle the error...
>> >> >     }
>> >> >
>> >> > This will allow us to stop using local_err variables and
>> >> > error_propagate() on hundreds of cases.
>> >> 
>> >> Getting rid of unnecessary local_err boilerplate is good.  The question
>> >> is how.  A possible alternative to your proposal is to follow GLib and
>> >> make functions return success/failure.
>> >> 
>> >> How do the two compare?
>> >
>> > This proposal proposal already gets rid of 346 error_propagate() calls
>> > automatically, and we probably can get rid of many others with
>> > additional Coccinelle scripts.
>> >
>> > Making functions return success/failure, on the other hand, would
>> > require rewriting them manually.
>> 
>> Yes, but how would the *result* compare?  I feel we should at least
>> explore this.  I'll try to find some time to play with it.
>
> I think the results of using the return value to indicate errors are
> possibly better.  But even on that case, I think ERR_IS_SET will be
> useful to avoid error_propagate() boilerplate until we convert all of
> them.

See my assessment below.

>> Coccinelle might let us automate some, but determining success
>> vs. failure will commonly require human smarts.
>> 
>> How many such functions we have?  Hmm...
>> 
>>     @r@
>>     identifier fun, errp;
>>     typedef Error;
>>     position p;
>>     @@
>>      void fun(..., Error **errp)@p
>>      {
>>          ...
>>      }
>>     @script:python@
>>         p << r.p;
>>         fun << r.fun;
>>     @@
>>     print("%s:%s:%s: %s()" % (p[0].file, p[0].line, p[0].column, fun))
>> 
>> Finds 1525.  Correcting the void mistake will be a huge pain, but
>> procrastinating can only make it grow.
>> 
>> >                                   This proposal doesn't even prevent
>> > that from happening.  I'd say it helps on that, because we can now find
>> > cases that still need to be converted by grepping for ERR_IS_SET.
>> 
>> I honestly doubt finding the cases is a problem.  We just grep for
>> error_propagate().
>
> True.  But I think finding low-hanging fruits will still be a problem. e.g.:
>
>   void f1(Error *errp);
>
>   void f2(Error *errp)
>   {
>       do_something();
>       f1(errp);
>   }
>
> The Coccinelle script above will find f1() and f2() (grep won't find
> either, but will probably find f2() callers).  We will probably want to
> convert f1() before f2(), as converting f2() before f1() would require
> adding error_propagete() boilerplate to f2().
>
>
>> 
> [...]
>> >> > Desirable side-effects
>> >> > ----------------------
>> >> >
>> >> > Some of additional benefits from parts of this series:
>> >> >
>> >> > * Making code that ignore error information more visible and
>> >> >   greppable (using IGNORE_ERRORS).
>> >> 
>> >> True.
>> >> 
>> >> Drawback: Passing an address takes more code than passing null.  Not
>> >> sure it matters.
>> >
>> > I don't know what you mean by "more code".  It requires just replacing
>> > NULL with IGNORE_ERRORS.  The magic is hidden behind the macro.
>> 
>> Creating the IGNORE_ERRORS value and passing it requires more machine
>> instructions than passing NULL.  When ignored_error_unset is in another
>> DSO, it also requires a relocation.
>
> Yes, it requires a few more machine instructions.  Is this a problem in
> practice?

Quoting myself: "Not sure it matters."

>> >> >                                     I believe many of those cases
>> >> >   are actually bugs and should use &error_abort or &error_fatal
>> >> >   instead.
>> >> 
>> >> I've seen such bugs.
>> >> 
>> >> Of course, making possible offenders more greppable doesn't necessarily
>> >> mean existing ones will get fixed and new ones will be avoided.
>> >> 
>> >> > * Making code that depends on errp more visible and
>> >> >   greppable (using ERR_IS_* macros).  Some of those cases are
>> >> >   also likely to be bugs, and need to be investigated.
>> >> 
>> >> Grepping for (local_)?errp? works well enough, doesn't it?
>> >
>> > I don't see how.
>> 
>> Full list of possible Error ** variables not named errp:
>> 
>>     $ git-grep -E '\bError \*\*' | grep -vE '\bError \*\*(errp\b|\))'
>>     HACKING:the eyes than propagating an Error object through an Error ** 
>> parameter.
>>     HACKING:only the function really knows, use Error **, and set suitable 
>> errors.
>>     block/quorum.c:    /* XXX - would be nice if we could pass in the Error 
>> **
>>     block/snapshot.c:                             Error **err)
>>     blockjob.c:/* A wrapper around block_job_cancel() taking an Error ** 
>> parameter so it may be
>>     docs/devel/writing-qmp-commands.txt:3. It takes an "Error **" argument. 
>> This is required. Later we will see how to
>>     hw/core/qdev.c:static bool check_only_migratable(Object *obj, Error 
>> **err)
>>     hw/core/qdev.c:        Error **local_errp = NULL;
>>     hw/core/qdev.c:static bool device_get_hotplugged(Object *obj, Error 
>> **err)
>>     hw/i386/amd_iommu.c:static void amdvi_realize(DeviceState *dev, Error 
>> **err)
>>     hw/sd/sdhci.c:static void sdhci_sysbus_realize(DeviceState *dev, Error 
>> ** errp)
>>     hw/usb/dev-network.c:static void usb_net_realize(USBDevice *dev, Error 
>> **errrp)
>>     include/block/snapshot.h:                             Error **err);
>>     include/qapi/error.h:void error_propagate(Error **dst_errp, Error 
>> *local_err);
>>     include/qom/object.h:                                    const uint64_t 
>> *v, Error **Errp);
>>     include/qom/object.h:                                          const 
>> uint64_t *v, Error **Errp);
>>     qga/commands-posix.c:GuestUserList *qmp_guest_get_users(Error **err)
>>     qga/commands-win32.c:GuestUserList *qmp_guest_get_users(Error **err)
>>     qga/commands.c:GuestExecStatus *qmp_guest_exec_status(int64_t pid, Error 
>> **err)
>>     qga/commands.c:                       Error **err)
>>     qga/commands.c:GuestHostName *qmp_guest_get_host_name(Error **err)
>>     qmp.c:void qmp_system_powerdown(Error **erp)
>>     util/error.c:void error_propagate(Error **dst_errp, Error *local_err)
>> 
>> I'll post a patch to rename the offenders.
>> 
>> >                   I'm talking about two cases:
>> >
>> > 1) Code like this:
>> >
>> >   int func1(..., Error **errp)
>> >   {
>> >       do_something(errp);
>> >       if (errp && *errp) {
>> >           /* handle error */
>> >           return;
>> >       }
>> >       /* do something else */
>> >   }
>> >
>> > func1() is buggy because it behaves differently if errp is NULL.
>> 
>> Does your series fix any such bugs?  We hunted them all down long ago...
>> Perhaps a few new ones have crept in since.  Hmm... I can see two:
>> 
>>     $ git-grep -F 'errp && *errp'
>>     exec.c:        if (errp && *errp) {
>>     hw/mem/pc-dimm.c:        if (errp && *errp) {
>>     util/error.c:    assert(errp && *errp);
>
> This RFC doesn't fix those bugs, but just makes them more obvious and
> easier to fix.  I plan to fix them in the final version of the series.

Well, these two are plenty obvious already :)

>> > 2) Code like this:
>> >
>> >   int func2(..., Error **errp)
>> >   {
>> >       do_something(errp);
>> >       if (*errp) {
>> >           /* handle error */
>> >       }
>> >   }
>> >
>> > func2() is buggy because if crashes if errp is NULL.
>> 
>> Does your series fix any such bugs?  grep coughs up quite a few
>> candidates...
>> 
>> > I don't see an easy way to grep for them with the current code.  With
>> > this series, (1) can be detected by grepping for ERR_IS_IGNORED,
>> 
>> How would example (1) look like then?
>
>
>   diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
>   index 92fb482..4e5e2c9 100644
>   --- a/hw/mem/pc-dimm.c
>   +++ b/hw/mem/pc-dimm.c
>   @@ -316,7 +316,7 @@ uint64_t pc_dimm_get_free_addr(uint64_t 
> address_space_start,
>            uint64_t dimm_size = object_property_get_int(OBJECT(dimm),
>                                                         PC_DIMM_SIZE_PROP,
>                                                         errp);
>   -        if (errp && *errp) {
>   +        if (!ERR_IS_IGNORED(errp) && ERR_IS_SET(errp)) {
>                goto out;
>            }
>
> Without this series, the fix for (1) requires adding error_propagate()
> boilerplate.  With this series, the fix is to just drop the
> !ERR_IS_IGNORED check and keep the ERR_IS_SET check

I think creating the fix is roughly the same work as before.  The
resulting code is a bit more compact: we avoid error_propagate()
boilerplate, but pay for that with an ERR_IS_SET() macro.

>> > is fixed because ERR_IS_SET(errp) will work even if errp is NULL.
>> >
>> >> > TODO
>> >> > ----
>> >> >
>> >> > * Simplify more cases of local_error/error_propagate() to use
>> >> >   errp directly.
>> >> > * Update API documentation and code examples.
>> >> > * Add a mechanism to ensure errp is never NULL.
>> >> >
>> >> > Git branch
>> >> > ----------
>> >> >
>> >> > This series depend on a few extra cleanups that I didn't submit
>> >> > to qemu-devel yet.  A git branch including this series is
>> >> > available at:
>> >> >
>> >> >   git://github.com/ehabkost/qemu-hacks.git 
>> >> > work/err-api-rework-ignore-ptr-v1

I doubt the macros make the bug fixing materially easier, and I doubt
they can reduce future bugs of this kind.  What they can do is letting
us get rid of error_propagate() boilerplate with relative ease.

If we switch to returning success/failure (which also gets rid of the
boilerplate), then the macros may still let us get rid of boilerplate
more quickly, for some additional churn.  Worthwhile?  Depends on how
long the return value change takes us.

I think the first order of business is to figure out whether we want to
pursue returning success/failure.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]