[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 17/21] s390x: Fix latent query-cpu-model-FOO error handling b

From: Markus Armbruster
Subject: Re: [PATCH 17/21] s390x: Fix latent query-cpu-model-FOO error handling bugs
Date: Mon, 02 Dec 2019 06:01:46 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

David Hildenbrand <address@hidden> writes:

> On 01.12.19 14:46, Aleksandar Markovic wrote:
>> On Saturday, November 30, 2019, David Hildenbrand <address@hidden
>> <mailto:address@hidden>> wrote:
>>     > Am 30.11.2019 um 20:42 schrieb Markus Armbruster
>>     <address@hidden <mailto:address@hidden>>:
>>     >
>>     > cpu_model_from_info() is a helper for
>>     qmp_query_cpu_model_expansion(),
>>     > qmp_query_cpu_model_comparison(), qmp_query_cpu_model_baseline().  It
>>     > crashes when the visitor or the QOM setter fails, and its @errp
>>     > argument is null.  Messed up in commit 137974cea3 's390x/cpumodel:
>>     > implement QMP interface "query-cpu-model-expansion"'.
>>     >
>>     > Its three callers have the same bug.  Messed up in commit 4e82ef0502
>>     > 's390x/cpumodel: implement QMP interface "query-cpu-model-comparison"'
>>     > and commit f1a47d08ef 's390x/cpumodel: implement QMP interface
>>     > "query-cpu-model-baseline"'.
>>     >
>>     > The bugs can't bite as no caller actually passes null.  Fix them
>>     > anyway.
>>     https://en.m.wikipedia.org/wiki/Software_bug
>>     <https://en.m.wikipedia.org/wiki/Software_bug>
>>       „ A software bug is an error, flaw or fault in a computer program
>>     or system that causes it to produce an incorrect or unexpected
>>     result, or to behave in unintended ways. „
>>     Please make it clear in the descriptions that these are cleanups and
>>     not bugfixes. It might be very confusing for people looking out for
>>     real bugs.
>> Disclaimer: I am not entirely familiar with the code in question, so
>> take my opinion with reasonablereservation.
>> It looks that we here deal with latent bugs. As you probably know from
>> experience, a latent bugs, when they are activated with some ostensibly
>> unrelated code change, can be much more difficult to diagnose and fix
>> than regular bugs.
> "https://economictimes.indiatimes.com/definition/latent-bug
> "Definition: An uncovered or unidentified bug which exists in the system
> over a period of time is known as the Latent Bug. The bug may persist in
> the system in one or more versions of the software."
> AFAIK, a latent BUG can be triggered, it simply was never triggered.

First search hit.  Here's my second one:

    Q: What are latent bugs?

    A: These bugs do not cause problems today. However, they are lurking
    just waiting to reveal themselves later.  The Ariane 5 rocket
    failure was caused by a float->int conversion error that lay dormant
    when previous rockets were slower; but the faster Ariane 5 triggered
    the problem.  The Millennium bug is another example of a latent bug
    that came to light when circumstances changed.  Latent bugs are much
    harder to test using conventional testing techniques, and finding
    them requires someone with foresight to ask.


My point is: common usage of the term is not as clear-cut as your quote
makes it seem.

> Do you think the following code is buggy?
> static int get_val(int *ptr)
> {
>       return *ptr;
> }
> int main()
> {
>       int a = 0;
>       return get_val(&a);
> }
> I claim, no, although we could access a NULL pointer if ever reworked.
> There is no invalid system state possible.

get_val() is silent on how it wants to be used.  error.h is anything
but: it spells out how it wantes to be used in quite some detail.  In

 * Receive an error and pass it on to the caller:
 *     Error *err = NULL;
 *     foo(arg, &err);
 *     if (err) {
 *         handle the error...
 *         error_propagate(errp, err);
 *     }
 * where Error **errp is a parameter, by convention the last one.
 * Do *not* "optimize" this to
 *     foo(arg, errp);
 *     if (*errp) { // WRONG!
 *         handle the error...
 *     }
 * because errp may be NULL!

My patch fixes this exact misuse of the interface.

>> In that light, this change is not a clean up. It is a fix of a latent
>> bugs, and Markus' aproach to treat it as a bug fix looks right to me. I
>> would just add a word "latent" or similar, which would even more
>> distance the patch from "cleanup" meaning.
> I agree iff there is some way to trigger it. Otherwise, to me it is a
> cleanup.If it's a BUG, it deserves proper Fixes tags and some
> description how it can be triggered.

Yes, a bug that can bite deserves a reproducer and a formal Fixes: tag.

The thing we're discussing (however we may want to call it) does not
have a reproducer, and I think we're in agreement that it doesn't need a
Fixes: tag.

However, my patch is not cleaning up something that's dirty, it's fixing
something that's unequivocally wrong: a violation of a stated interface

The violation happens to have no ill effects at this time due to the way
the violating code is being used.

I call that a "latent bug".  git-log has quite a few occurences of
"latent bug", by Richard Henderson, Daniel Berrangé, Paolo, ...

Your point that the commit message should not confuse people looking for
real bugs is well taken.  I think "latent bug" is clear enough, and also
concise.  I'm of course open to better phrasings.

   s390x: Fix currently harmless query-cpu-model-FOO error API violations

feels no clearer to me than

   s390x: Fix latent query-cpu-model-FOO error handling bugs

It's also too long.

I tried.  Your turn :)

reply via email to

[Prev in Thread] Current Thread [Next in Thread]