bug-dejagnu
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in


From: Jacob Bachmeyer
Subject: bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in test-case
Date: Wed, 24 Jun 2020 21:43:56 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0

Pedro Alves wrote:
On 6/24/20 12:36 AM, Jacob Bachmeyer wrote:
Pedro Alves wrote:
On 6/17/20 4:26 AM, Jacob Bachmeyer wrote:
I'd think that tracking ERROR and WARNING in the summary would sort
out this "test run blew up but no one noticed" issue.
The problem is that the DejaGnu internals often generate ERROR and WARNING "directly" without using the procedures that update counters.

I understand that that's what DejaGnu does currently, but couldn't it be 
changed?
Would it not be desirable?

It could be changed, but those are much larger changes to the code and overall behavior than simply adding an UNRESOLVED result to report the failure, and if I understand correctly, the presence of UNRESOLVED results marks the test results as invalid or partially-valid at best, which is correct if a test script failed with a Tcl error.

Also, the current warning and error counters are not global, but are reset after each test result. Exceeding thresholds for warnings or errors causes the next test result to be changed to UNRESOLVED before the counters are reset.

I am currently considering also adding a '--no-keep-going'/'--no_keep_going' option and 
making '--keep-going' the default for the rest of the 1.6 series.  Aborting on uncaught 
Tcl errors will be the default in 1.7, but I am trying to decide whether changing it in 
1.6 counts more as fixing a bug (DejaGnu quietly sweeping Tcl errors under the proverbial 
rug) or introducing an environment change (needing '--keep-going' to reach the end of 
some testsuites that "worked" before).  This is further complicated by the fact 
that any testsuite needing that option to reach the end means that, technically, that 
testsuite is invalid -- it *did* throw a Tcl error during testing.
It seems your designing this under the assumption that if a testsuite
run "blows up", it's not a big deal to just fix it and rerun the testsuite.

For most packages, this is correct. For GDB, perhaps it is a bigger problem.

On the other hand, "blowing up" at least indicates clearly that something is wrong and prevents "whistling past the graveyard" where tests are not actually running, but everyone thinks the situation is fine.

What that misses is that in some environments, particularly with slow embedded
system boards, running the full testsuite can take _days_.   This is all
compounded by the fact that the testsuite must be run in a large set of
different modes and boards to cover everything for a single target.  Even on
a modern GNU/Linux system, running all combinations of modes for full coverage
can take hours.

Having the testrun stop midway because of some specific testcase
blowing up can waste a lot of time.

The fact that the testsuite can take a while to fully run results in automated
testing systems, like GDB's buildbot, struggling to keep up with testing
all the commits that land in git.  On some buildbot setups, it's just not
possible to test all commits.  So if some testcase blows up the whole
test run, the next time you run the testsuite, it'll likely be that a
large number of unrelated commits went into the sources.  Losing a large
chunk of the testsuite results because of some small typo in one
testcase is thus undesirable, because noticing the typo in the first
place can take a while, and also you may be stuck with a "last good run"
for many of the testcases that gets older and older until
the typo is finally fixed.

I presume buildbot can be easily configured to pass a '--keep_going' option, perhaps as 'RUNTESTFLAGS=--keep_going' if needed?

Still, needing a new option to get old behavior is changing the environment interface, so I will make --keep_going the default, at least for the rest of 1.6. That default is likely to change for 1.7. For automated systems like buildbot, passing '--keep_going' should be perfectly reasonable, which is why the option exists. (While the documented options will be changed to use hyphens to align with the rest of the GNU system in 1.7, the current option names will continue to be accepted, since the aliases are obvious.)

This has been pushed on the temporary PR41918 branch because the fix for PR41824 has been rolled into the fix for PR41918, as the bugs are closely related.

Then there's the issue about testsuite order.  DejaGnu runs testcases
in alphabetical filename order, in sequence.  If the testrun blows up in
the first testcase run, say, a.exp, the whole testrun stops before b.exp,
etc. are run.  If the testcase that blows up is the last one, say named
zzz.exp, most of the testsuite runs.  This is not "fair".  If I rename
a.exp to zzzzz.exp, then suddenly that testcase gains more "importance" in
the sense that it can no longer blow up the whole test run.  A
testcase's name should not be significant like that.

On the same vein, consider parallel mode.  This is not native to DejaGnu,
but both GDB and GCC have parallel modes, where their makefiles set up a
make check such that they spawn a number of runtest processes running in
parallel, each exercising a subset the whole testsuite.  Imagine a testsuite
composed of exactly 32 testcases.  And further imagine that you run it
under "make check -j32".  That is, you'll end up with 32 runtest instances,
each running exactly one testcase.  Now, if one of the testcases blows up, all
the other 31 will still run.  Again, this shows that testcase name, and
testrun order should not be important.  There is no real reason that a
sequential run in that scenario ends up with different results (testcase a.exp
blew up, so no result for the other tests) compared to a parallel
run (all tests ran successfully but test a.exp).  Instead, I advocate
that the testresults summary should indicate that a problem happened,
so that "blow up event" isn't lost.  But do let the whole testsuite run.

There are plans to eventually implement a native parallel testing mode. You are correct that the order in which testcases are run should be insignificant, and for valid testsuites that do not crash, it is insignificant -- all tests are run. The problems start when a testcase *crashes* and aborts with a Tcl error.

The entire reason for stopping the testsuite immediately when a Tcl error occurs is that it is undeniable that something is *very* *wrong* when the testsuite stops early. Software testing is hard enough when you know your testsuite is good, but it is far worse when you have bugs in the testsuite masking bugs in the actual program because a test script is crashing and part of the program that is supposed to be tested is not being exercised.


-- Jacob





reply via email to

[Prev in Thread] Current Thread [Next in Thread]