[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 17/21] s390x: Fix latent query-cpu-model-FOO error handling b
From: |
Markus Armbruster |
Subject: |
Re: [PATCH 17/21] s390x: Fix latent query-cpu-model-FOO error handling bugs |
Date: |
Mon, 02 Dec 2019 06:01:46 +0100 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.2 (gnu/linux) |
David Hildenbrand <address@hidden> writes:
> On 01.12.19 14:46, Aleksandar Markovic wrote:
>>
>>
>> On Saturday, November 30, 2019, David Hildenbrand <address@hidden
>> <mailto:address@hidden>> wrote:
>>
>>
>>
>> > Am 30.11.2019 um 20:42 schrieb Markus Armbruster
>> <address@hidden <mailto:address@hidden>>:
>> >
>> > cpu_model_from_info() is a helper for
>> qmp_query_cpu_model_expansion(),
>> > qmp_query_cpu_model_comparison(), qmp_query_cpu_model_baseline(). It
>> > crashes when the visitor or the QOM setter fails, and its @errp
>> > argument is null. Messed up in commit 137974cea3 's390x/cpumodel:
>> > implement QMP interface "query-cpu-model-expansion"'.
>> >
>> > Its three callers have the same bug. Messed up in commit 4e82ef0502
>> > 's390x/cpumodel: implement QMP interface "query-cpu-model-comparison"'
>> > and commit f1a47d08ef 's390x/cpumodel: implement QMP interface
>> > "query-cpu-model-baseline"'.
>> >
>> > The bugs can't bite as no caller actually passes null. Fix them
>> > anyway.
>>
>> https://en.m.wikipedia.org/wiki/Software_bug
>> <https://en.m.wikipedia.org/wiki/Software_bug>
>>
>> „ A software bug is an error, flaw or fault in a computer program
>> or system that causes it to produce an incorrect or unexpected
>> result, or to behave in unintended ways. „
>>
>> Please make it clear in the descriptions that these are cleanups and
>> not bugfixes. It might be very confusing for people looking out for
>> real bugs.
>>
>>
>>
>> Disclaimer: I am not entirely familiar with the code in question, so
>> take my opinion with reasonablereservation.
>>
>> It looks that we here deal with latent bugs. As you probably know from
>> experience, a latent bugs, when they are activated with some ostensibly
>> unrelated code change, can be much more difficult to diagnose and fix
>> than regular bugs.
>
> "https://economictimes.indiatimes.com/definition/latent-bug
>
> "Definition: An uncovered or unidentified bug which exists in the system
> over a period of time is known as the Latent Bug. The bug may persist in
> the system in one or more versions of the software."
>
> AFAIK, a latent BUG can be triggered, it simply was never triggered.
First search hit. Here's my second one:
Q: What are latent bugs?
A: These bugs do not cause problems today. However, they are lurking
just waiting to reveal themselves later. The Ariane 5 rocket
failure was caused by a float->int conversion error that lay dormant
when previous rockets were slower; but the faster Ariane 5 triggered
the problem. The Millennium bug is another example of a latent bug
that came to light when circumstances changed. Latent bugs are much
harder to test using conventional testing techniques, and finding
them requires someone with foresight to ask.
http://www.geekinterview.com/question_details/36689
My point is: common usage of the term is not as clear-cut as your quote
makes it seem.
> Do you think the following code is buggy?
>
> static int get_val(int *ptr)
> {
> return *ptr;
> }
>
> int main()
> {
> int a = 0;
>
> return get_val(&a);
> }
>
> I claim, no, although we could access a NULL pointer if ever reworked.
> There is no invalid system state possible.
get_val() is silent on how it wants to be used. error.h is anything
but: it spells out how it wantes to be used in quite some detail. In
particular:
* Receive an error and pass it on to the caller:
* Error *err = NULL;
* foo(arg, &err);
* if (err) {
* handle the error...
* error_propagate(errp, err);
* }
* where Error **errp is a parameter, by convention the last one.
*
* Do *not* "optimize" this to
* foo(arg, errp);
* if (*errp) { // WRONG!
* handle the error...
* }
* because errp may be NULL!
My patch fixes this exact misuse of the interface.
>> In that light, this change is not a clean up. It is a fix of a latent
>> bugs, and Markus' aproach to treat it as a bug fix looks right to me. I
>> would just add a word "latent" or similar, which would even more
>> distance the patch from "cleanup" meaning.
>
> I agree iff there is some way to trigger it. Otherwise, to me it is a
> cleanup.If it's a BUG, it deserves proper Fixes tags and some
> description how it can be triggered.
Yes, a bug that can bite deserves a reproducer and a formal Fixes: tag.
The thing we're discussing (however we may want to call it) does not
have a reproducer, and I think we're in agreement that it doesn't need a
Fixes: tag.
However, my patch is not cleaning up something that's dirty, it's fixing
something that's unequivocally wrong: a violation of a stated interface
contract.
The violation happens to have no ill effects at this time due to the way
the violating code is being used.
I call that a "latent bug". git-log has quite a few occurences of
"latent bug", by Richard Henderson, Daniel Berrangé, Paolo, ...
Your point that the commit message should not confuse people looking for
real bugs is well taken. I think "latent bug" is clear enough, and also
concise. I'm of course open to better phrasings.
s390x: Fix currently harmless query-cpu-model-FOO error API violations
feels no clearer to me than
s390x: Fix latent query-cpu-model-FOO error handling bugs
It's also too long.
I tried. Your turn :)