bug-dejagnu
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in


From: Jacob Bachmeyer
Subject: bug#41824: Dejagnu's unknown proc aborts testsuite run when triggered in test-case
Date: Sat, 27 Jun 2020 12:21:34 -0500
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.22) Gecko/20090807 MultiZilla/1.8.3.4e SeaMonkey/1.1.17 Mnenhy/0.7.6.0

Pedro Alves wrote:
On 6/26/20 11:37 PM, Jacob Bachmeyer wrote:
Pedro Alves wrote:
On 6/25/20 11:39 PM, Jacob Bachmeyer wrote:
Pedro Alves wrote:

[...]

Could this be solved by putting each testcase in its own address space?
Yes.  Do people actually do that?  No -- it's a lot of make work and the
problem is rare.  Is that reuse global as array really revealing such a MAJOR
issue with the GDB testcase that it warrants stopping the whole run?  No...

I would argue that instead it is the DejaGnu framework that should make
it so that each testcase is run under as much isolation as possible.
So I call that a design issue in DejaGnu itself.  Whether that could
be fixed by something like sourcing each testcase under its own
Tcl slave interpreter, I'm not sure.

This is the current plan -- to eventually run test scripts in independent slave interpreters, which enforces the isolation necessary at the Tcl level for future native parallel testing and also prevents test scripts from accidentally clobbering globals that are important to DejaGnu, some of which have very generic names.

Another example of a Tcl bug that sometimes we trip on.  Say we have a
testcase that needs to extract some address from the inferior
in its setup phase, something like this:

 set test "get address"
 gdb_test_multiple "x /i \$pc" $test {
     -re "($hex).*$gdb_prompt $" {
         set addr $expect_out(1,string)
         pass $test
     }
 }
if {$addr == 0} {
   # The above failed, so no use continuing.
   return
 }

Now, gdb_test_multiple is a wrapper around expect, that internally catches
GDB crashes, internal GDB errors, timeouts and other things, so the bug above
is often missed.  The bug is of course that when e.g., a timeout occurs, $addr
is left uninitialized, so you get a Tcl error referencing a nonexisting 
variable.

But the thing is, we already issued a FAIL, and we want to bail out of
the testcase anyway.  Crashing the whole testrun for a bug that typically
happens in already-failing code is too strict.  And these are the
code paths that are not exercised in a normal run, so are easy to miss, and are just impossible to have full coverage for.

Fine; I will loosen that, but DejaGnu has no way to know that it is bailing out of a failing testcase and that there are no more tests from that script, so an UNRESOLVED result complaining about the error will still be inserted.

I think it's safe to say that we all understand the general principle,
but IMO that applies more to continuing the same testcase (if somehow that
were possible, like "unknown" suppressing the error), rather than continuing
with a different testcase.

Continuing the testrun when one testcase errors out does not mask any
bug in the program that is supposed to be tested.  Why would it?  The
testcase aborts, it doesn't continue.  The remaining testcases start
afresh.
Consider a hypothetical case where GDB has exec.exp (which tests starting a debugged process) and attach.exp 
(which tests attaching to a process).  Unknown to the GDB developers, the "attach" command does not 
work and causes GDB to dump core (oops!) but attach.exp fails with a Tcl error before reaching the point 
where "attach" is issued to the inferior GDB.  If DejaGnu stops immediately, it is obvious that the 
testsuite is broken, the bug in the testsuite is fixed, and the bug in GDB is found.  If DejaGnu sweeps the 
Tcl error under the proverbial rug (as previous versions did), the bug in the testsuite effectively hides the 
bug in GDB , because the developers think the "attach" command is being exercised, but those tests 
are being (almost) silently skipped.
First, that scenario doesn't normally cause a Tcl error.  If GDB trips
an internal error, the GDB testsuite catches it with a FAIL.  If GDB just dies
(whether it dumps core or not) the GDB testsuite catches that
with UNRESOLVED.
In the hypothetical scenario, attach.exp crashes with a Tcl error *before* issuing the 
"attach" command to the GDB-under-test.  Previous releases of DejaGnu mention a 
Tcl error in the log and carry on.  With this patch, DejaGnu catches that attach.exp 
crashed and itself inserts an UNRESOLVED result.

I also take this as more support for using the UNRESOLVED status here:  the 
test script crashing is little different from GDB exiting unexpectedly.

Well, I'm actually not sure why doesn't GDB FAIL in the crash case instead of
UNRESOLVED.  To me a GDB internal error or the GDB process crashing doesn't
look very different.  Both require pretty much the same level of attention
from a developer.  But that decision predates me.  Who knows, maybe GDB
internal error support was only added after the testsuite was originally
written, and thus the crash handling predates the internal error handling.

Considering that testing GDB was one of the motivations for writing DejaGnu and how old DejaGnu is, I suspect that the testsuite had to handle GDB crashes before GDB had support for reporting internal errors.

In some sense, UNRESOLVED is more serious than FAIL; the GDB testsuite has about a hundred FAIL results but typically no UNRESOLVED results.

And again, people run the GDB testsuite in parallel mode, so your
assumption that the testsuite would stop immediately anyway doesn't
hold.  One of the parallel runtest invocations bombs
out, while the others keep running, so by the time you get
back the prompt, other testcases already executed, and you don't
see the ERROR in the current terminal anymore.
This is a fundamental problem with slicing up the testsuite external to DejaGnu and 
running multiple instances of runtest -- output that is supposed to be "last at the 
end of the run" can be hidden away by other ongoing subset runs.

That's like arguing against parallelization, and saying that everyone
should run the testsuite in serial mode, and be OK with it taking 2 hours
instead of 30 mins to complete a run...

It is a problem with parallelization, just like aborting a test run is a problem when running parallel tests.

The parallel mode aggregates the summaries from the different runtest
invocations as a single summary at the end,

https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=contrib/dg-extract-results.py;h=30aa68771d491451f72d6dbd18f6d5ac339451b5;hb=HEAD

so I don't see what problem here is unsolvable.  Again, the issue is that Tcl
ERRORs aren't tallied up on the DejaGnu summary, so the aggregated summary
doesn't contain those either.

The ERRORs will be printed a second time, so all the script needs to do is collect text that appears after the summary and starts with "ERROR:". There seems to also be some confusion between Tcl errors and the ERROR messages produce by DejaGnu's perror procedure (or its internal equivalents). Would it be better to report Tcl errors with "CRASH" instead of "ERROR"?

  You'll need to
look at the testsuite .sum .log files anyway.  Only, you will
find out that one testcase bombed out, 50000 tests ran, but 1000 others
didn't run because of that one testcase.  And now you're missing
those 1000 results, which maybe included those which you were
actually interested in, because you were doing a builbot try run
across a number of systems.
This also applies on a smaller scale for which we have no workaround:  
gdb.foo/bar.exp bombs out and the remaining /N/ tests in that file did not run. 
 The only real solution is for the test scripts to avoid causing Tcl errors in 
the first place; anything else is papering over the problem with duct tape.

That's not the same thing.  The remaining /N/ tests in the file depend
on the previous /M/ tests.  E.g.,

 gdb_test "step"
 gdb_test "print foo"

If the "step" fails, the "print foo" test is meaningless.

However, there should be absolutely no dependency between foo.exp and
bar.exp.  It should be possible to run those two different testcases
in any order.  Thus, it should be irrelevant to foo.exp whether bar.exp
bombs out of not.  Consequently, there's no real reason to not let
foo.exp run regardless of what happens to bar.exp.

This will need to be documented... another item for the local TODO list.

  Now
you wish you had remembered to type --keep-going, but it's not
the default, and nobody ever writes Tcl bugs, so you didn't think
of it before.
Note that the snapshot of GDB I have been using for testing this patch does have such a Tcl bug in its testsuite. I did not make any particular effort to choose this snapshot, it was just the current state of GDB master when I started. I have not even looked at *which* commit it is.

Can you tell me what the error looks like?

With commit 7b958a48e1322880f23cdb0a1c35643dd27d3ddb and GDB built without Python support: ERROR: tcl error sourcing /home/jacob/arena/gdb-check/build-queue-test/gdb/testsuite/../../../src/gdb/testsuite/gdb.python/py-format-string.exp.
ERROR: invoked "continue" outside of a loop


-- Jacob





reply via email to

[Prev in Thread] Current Thread [Next in Thread]