bug-dejagnu
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in


From: Pedro Alves
Subject: bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in test-case
Date: Wed, 24 Jun 2020 10:40:45 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1

On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
> Pedro Alves wrote:
>> On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
>>  
>>>> I monitor ERROR and WARNING in my test runs, so I didn't think of this,
>>>> but indeed, those are not tracked in the summary, so that could be a
>>>> problem, agreed.
>>>>         
>>> Blowing up the entire test run is a good way to ensure that problems (like 
>>> typos in testcases) are noticed, but admittedly, that does not help when 
>>> people fail to even run the tests before checking in their typos and then 
>>> other developers' test runs blow up.  Murphy's Law says the offenders then 
>>> carry on to check in more typos in other code without testing their own 
>>> work there either...
>>>     
>>
>> There's conditional execution in the gdb testsuite, where tests may
>> run different code paths depending on the target.  Typos can happen in
>> code paths that the developer isn't able to exercise.  You can
>> imagine someone missing something when doing some large refactoring
>> in the testsuite.
>>   
> 
> This is a good point and is far more reasonable than checking in code without 
> testing it at all.  That sounds like an architectural issue with the GDB 
> testsuite, but could also be a legitimate problem if the per-target handling 
> really is different for some targets, especially if simulators for those 
> targets are unavailable or insufficient for the tests.

It's not an architectural issue -- simply different targets support
a different set of features, support the same features but differently,
run different OSs or even no OS at all, are built with different
compilers, support a different set of languages, etc., etc.

> 
>> I'd think that tracking ERROR and WARNING in the summary would sort
>> out this "test run blew up but no one noticed" issue.
> 
> The problem is that the DejaGnu internals often generate ERROR and WARNING 
> "directly" without using the procedures that update counters.  

I understand that that's what DejaGnu does currently, but couldn't it be 
changed?
Would it not be desirable?

The solution I am using in the current PR41824 patch (which has been rolled 
into the PR41918 patch) generates an extra UNRESOLVED result when a Tcl error 
is caught in runtest, which both appears in the summary and clearly indicates 
that the test run did not produce valid results.
> 
> I am currently considering also adding a '--no-keep-going'/'--no_keep_going' 
> option and making '--keep-going' the default for the rest of the 1.6 series.  
> Aborting on uncaught Tcl errors will be the default in 1.7, but I am trying 
> to decide whether changing it in 1.6 counts more as fixing a bug (DejaGnu 
> quietly sweeping Tcl errors under the proverbial rug) or introducing an 
> environment change (needing '--keep-going' to reach the end of some 
> testsuites that "worked" before).  This is further complicated by the fact 
> that any testsuite needing that option to reach the end means that, 
> technically, that testsuite is invalid -- it *did* throw a Tcl error during 
> testing.
It seems your designing this under the assumption that if a testsuite
run "blows up", it's not a big deal to just fix it and rerun the testsuite.

What that misses is that in some environments, particularly with slow embedded
system boards, running the full testsuite can take _days_.   This is all
compounded by the fact that the testsuite must be run in a large set of
different modes and boards to cover everything for a single target.  Even on
a modern GNU/Linux system, running all combinations of modes for full coverage
can take hours.

Having the testrun stop midway because of some specific testcase
blowing up can waste a lot of time.

The fact that the testsuite can take a while to fully run results in automated
testing systems, like GDB's buildbot, struggling to keep up with testing
all the commits that land in git.  On some buildbot setups, it's just not
possible to test all commits.  So if some testcase blows up the whole
test run, the next time you run the testsuite, it'll likely be that a
large number of unrelated commits went into the sources.  Losing a large
chunk of the testsuite results because of some small typo in one
testcase is thus undesirable, because noticing the typo in the first
place can take a while, and also you may be stuck with a "last good run"
for many of the testcases that gets older and older until
the typo is finally fixed.

Then there's the issue about testsuite order.  DejaGnu runs testcases
in alphabetical filename order, in sequence.  If the testrun blows up in
the first testcase run, say, a.exp, the whole testrun stops before b.exp,
etc. are run.  If the testcase that blows up is the last one, say named
zzz.exp, most of the testsuite runs.  This is not "fair".  If I rename
a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
the sense that it can no longer blow up the whole test run.  A
testcase's name should not be significant like that.

On the same vein, consider parallel mode.  This is not native to DejaGnu,
but both GDB and GCC have parallel modes, where their makefiles set up a
make check such that they spawn a number of runtest processes running in
parallel, each exercising a subset the whole testsuite.  Imagine a testsuite
composed of exactly 32 testcases.  And further imagine that you run it
under "make check -j32".  That is, you'll end up with 32 runtest instances,
each running exactly one testcase.  Now, if one of the testcases blows up, all
the other 31 will still run.  Again, this shows that testcase name, and
testrun order should not be important.  There is no real reason that a
sequential run in that scenario ends up with different results (testcase a.exp
blew up, so no result for the other tests) compared to a parallel
run (all tests ran successfully but test a.exp).  Instead, I advocate
that the testresults summary should indicate that a problem happened,
so that "blow up event" isn't lost.  But do let the whole testsuite run.

Thanks,
Pedro Alves






reply via email to

[Prev in Thread] Current Thread [Next in Thread]