qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 17/21] s390x: Fix latent query-cpu-model-FOO error handling b


From: Markus Armbruster
Subject: Re: [PATCH 17/21] s390x: Fix latent query-cpu-model-FOO error handling bugs
Date: Mon, 02 Dec 2019 06:01:46 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux)

David Hildenbrand <address@hidden> writes:

> On 01.12.19 14:46, Aleksandar Markovic wrote:
>> 
>> 
>> On Saturday, November 30, 2019, David Hildenbrand <address@hidden
>> <mailto:address@hidden>> wrote:
>> 
>> 
>> 
>>     > Am 30.11.2019 um 20:42 schrieb Markus Armbruster
>>     <address@hidden <mailto:address@hidden>>:
>>     >
>>     > cpu_model_from_info() is a helper for
>>     qmp_query_cpu_model_expansion(),
>>     > qmp_query_cpu_model_comparison(), qmp_query_cpu_model_baseline().  It
>>     > crashes when the visitor or the QOM setter fails, and its @errp
>>     > argument is null.  Messed up in commit 137974cea3 's390x/cpumodel:
>>     > implement QMP interface "query-cpu-model-expansion"'.
>>     >
>>     > Its three callers have the same bug.  Messed up in commit 4e82ef0502
>>     > 's390x/cpumodel: implement QMP interface "query-cpu-model-comparison"'
>>     > and commit f1a47d08ef 's390x/cpumodel: implement QMP interface
>>     > "query-cpu-model-baseline"'.
>>     >
>>     > The bugs can't bite as no caller actually passes null.  Fix them
>>     > anyway.
>> 
>>     https://en.m.wikipedia.org/wiki/Software_bug
>>     <https://en.m.wikipedia.org/wiki/Software_bug>
>> 
>>       „ A software bug is an error, flaw or fault in a computer program
>>     or system that causes it to produce an incorrect or unexpected
>>     result, or to behave in unintended ways. „
>> 
>>     Please make it clear in the descriptions that these are cleanups and
>>     not bugfixes. It might be very confusing for people looking out for
>>     real bugs.
>> 
>> 
>> 
>> Disclaimer: I am not entirely familiar with the code in question, so
>> take my opinion with reasonablereservation.
>> 
>> It looks that we here deal with latent bugs. As you probably know from
>> experience, a latent bugs, when they are activated with some ostensibly
>> unrelated code change, can be much more difficult to diagnose and fix
>> than regular bugs.
>
> "https://economictimes.indiatimes.com/definition/latent-bug
>
> "Definition: An uncovered or unidentified bug which exists in the system
> over a period of time is known as the Latent Bug. The bug may persist in
> the system in one or more versions of the software."
>
> AFAIK, a latent BUG can be triggered, it simply was never triggered.

First search hit.  Here's my second one:

    Q: What are latent bugs?

    A: These bugs do not cause problems today. However, they are lurking
    just waiting to reveal themselves later.  The Ariane 5 rocket
    failure was caused by a float->int conversion error that lay dormant
    when previous rockets were slower; but the faster Ariane 5 triggered
    the problem.  The Millennium bug is another example of a latent bug
    that came to light when circumstances changed.  Latent bugs are much
    harder to test using conventional testing techniques, and finding
    them requires someone with foresight to ask.

http://www.geekinterview.com/question_details/36689

My point is: common usage of the term is not as clear-cut as your quote
makes it seem.

> Do you think the following code is buggy?
>
> static int get_val(int *ptr)
> {
>       return *ptr;
> }
>
> int main()
> {
>       int a = 0;
>
>       return get_val(&a);
> }
>
> I claim, no, although we could access a NULL pointer if ever reworked.
> There is no invalid system state possible.

get_val() is silent on how it wants to be used.  error.h is anything
but: it spells out how it wantes to be used in quite some detail.  In
particular:

 * Receive an error and pass it on to the caller:
 *     Error *err = NULL;
 *     foo(arg, &err);
 *     if (err) {
 *         handle the error...
 *         error_propagate(errp, err);
 *     }
 * where Error **errp is a parameter, by convention the last one.
 *
 * Do *not* "optimize" this to
 *     foo(arg, errp);
 *     if (*errp) { // WRONG!
 *         handle the error...
 *     }
 * because errp may be NULL!

My patch fixes this exact misuse of the interface.

>> In that light, this change is not a clean up. It is a fix of a latent
>> bugs, and Markus' aproach to treat it as a bug fix looks right to me. I
>> would just add a word "latent" or similar, which would even more
>> distance the patch from "cleanup" meaning.
>
> I agree iff there is some way to trigger it. Otherwise, to me it is a
> cleanup.If it's a BUG, it deserves proper Fixes tags and some
> description how it can be triggered.

Yes, a bug that can bite deserves a reproducer and a formal Fixes: tag.

The thing we're discussing (however we may want to call it) does not
have a reproducer, and I think we're in agreement that it doesn't need a
Fixes: tag.

However, my patch is not cleaning up something that's dirty, it's fixing
something that's unequivocally wrong: a violation of a stated interface
contract.

The violation happens to have no ill effects at this time due to the way
the violating code is being used.

I call that a "latent bug".  git-log has quite a few occurences of
"latent bug", by Richard Henderson, Daniel Berrangé, Paolo, ...

Your point that the commit message should not confuse people looking for
real bugs is well taken.  I think "latent bug" is clear enough, and also
concise.  I'm of course open to better phrasings.

   s390x: Fix currently harmless query-cpu-model-FOO error API violations

feels no clearer to me than

   s390x: Fix latent query-cpu-model-FOO error handling bugs

It's also too long.

I tried.  Your turn :)




reply via email to

[Prev in Thread] Current Thread [Next in Thread]