
Re: [Groff] Bug in html backend


From: Werner LEMBERG
Subject: Re: [Groff] Bug in html backend
Date: Sat, 17 Aug 2002 01:18:06 +0200 (CEST)

> I think a good regression test framework ought to be able to
> accommodate not only end-to-end tests like you're proposing, but
> also minimal test cases for particular bugs that have been
> fixed. This is of course a lot of work, but I'd cite perl as an
> excellent example of regression tests that really work and are
> widely applied. Lots of minimal test cases are usually more valuable
> than lots of end-to-end tests in the long run, since they generally
> reflect more thought and produce more efficient coverage.

Exactly.  That is basically all I want; this is why I mentioned TeX's
trip test (see below).  I'm not sure I can write something similar for
groff, since I don't yet fully understand all aspects of groff, but I
think you get the idea.

> I'm a little concerned about how to tell whether a test succeeded.
> Checking that groff completed successfully is one level, but you
> also want to test correct output. On the other hand, you don't want
> to make it needlessly painful to make typographical improvements by
> restricting at too fine a level what the test framework will
> accept. I'm not sure how to solve this. Fortunately many of the
> things that it would be nice to test relate to groff's internal
> computational facilities, and so a test file could simply use
> -Tascii and print "ok" or "not ok" based on the result of its
> computation; likewise you might often want to test the exact
> groff_out(5) output without feeding it through a postprocessor,
> which I think will be more stable than something like the output of
> grops.

IMHO we have to add separate tests for preprocessors, troff, and
postprocessors.  Mixing them is probably a bad idea (except perhaps
for grohtml, due to the nature of pre-grohtml).
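Such a self-contained troff test could report its own verdict in the
"ok" / "not ok" style mentioned above.  The sketch below is purely
illustrative: the tests/ layout, the file names, and the run_test
helper are my invention, not an existing groff harness.  It fakes the
formatter invocation with fixed strings so the harness logic stands on
its own, and notes in a comment where `groff -Tascii` would plug in.

```shell
#!/bin/sh
# Hypothetical per-bug regression harness sketch.  Each test would live
# in its own file (e.g. tests/NAME.in) whose troff code does an internal
# computation and prints "ok" or "not ok" itself, for example:
#
#   .nr x 2*3+4
#   .ie \n[x]=10 ok
#   .el not ok
#
# (groff evaluates expressions left to right, so 2*3+4 is 10.)

# Compare one test's expected output against what the formatter produced.
run_test() {
    name=$1
    expected=$2
    actual=$3
    if [ "$expected" = "$actual" ]; then
        echo "ok $name"
    else
        echo "not ok $name"
    fi
}

# In a real harness the third argument would come from the formatter:
#   actual=$(groff -Tascii "tests/$name.in")
# Here the runs are faked so the sketch is runnable without groff.
run_test number-register "ok" "ok"
run_test broken-feature  "ok" "not ok"
```

Keeping each case in its own small input file fits the "minimal test
case per fixed bug" idea: a failing case is immediately attributable.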

> > Can I send the patches in unified diff format ?
> 
> I always send patches in that format.

I prefer -u to any other format.


    Werner


======================================================================

[Introduction from tripman.tex]

People often think that their programs are debugged when large
applications have been run successfully.  But system programmers know
that a typical large application tends to use at most about 50 per
cent of the instructions in a typical compiler.  Although the other
half of the code, which tends to be the harder half, might be riddled
with errors, the system seems to be working quite impressively until
an unusual case shows up on the next day.  And on the following day
another error manifests itself, and so on; months or years go by
before certain parts of the compiler are even activated, much less
tested in combination with other portions of the system, if user
applications provide the only tests.

How then shall we go about testing a compiler?  Ideally we would like
to have a formal proof of correctness, certified by a computer.  This
would give us a lot of confidence, although of course the formal
verification program might itself be incorrect.  A more serious
drawback of automatic verification is that the formal specifications
of the compiler are likely to be wrong, since they aren't much easier
to write than the compiler itself.  Alternatively, we can substitute
an informal proof of correctness: The programmer writes his or her
code in a structured manner and checks that appropriate relations
remain invariant, etc.  This helps greatly to reduce errors, but it
cannot be expected to remove them completely; the task of checking a
large system is sufficiently formidable that human beings cannot do it
without making at least a few slips here and there.

Thus, we have seen that test programs are unsatisfactory if they are
simply large user applications; yet some sort of test program is
needed because proofs of correctness aren't adequate either.  People
have proposed schemes for constructing test data automatically from a
program text, but such approaches run the risk of circularity, since
they cannot assume that a given program has the right structure.

I have been having good luck with a somewhat different approach, first
used in 1960 to debug an ALGOL compiler.  The idea is to construct a
test file that is about as different from a typical user application
as could be imagined.  Instead of testing things that people normally
want to do, the file tests complicated things that people would never
dare to think of, and it embeds these complexities in still more
arcane constructions.  Instead of trying to make the compiler do the
right thing, the goal is to make it fail (until the bugs have all been
found).

To write such a fiendish test routine, one simply gets into a nasty
frame of mind and tries to do everything in the unexpected way.
Parameters that are normally positive are set negative or zero;
borderline cases are pushed to the limit; deliberate errors are made
in hopes that the compiler will not be able to recover properly from
them.

A user's application tends to exercise 50% of a compiler's logic, but
my first fiendish tests tend to improve this to about 90%.  As the
next step I generally make use of frequency-counting software to
identify the instructions that have still not been called upon.  Then
I add ever more fiendishness to the test routine, until more than 99%
of the code has been used at least once.  (The remaining bits are
things that can occur only if the source program is really huge, or if
certain fatal errors are detected; or they are cases so similar to
other well-tested things that there can be little doubt of their
validity.)

Of course, this is not guaranteed to work.  But my experience in 1960
was that only two bugs were ever found in that ALGOL compiler after it
correctly translated that original fiendish test.  And one of those
bugs was actually present in the results of the test; I simply had
failed to notice that the output was incorrect.  Similar experiences
occurred later during the 60s and 70s, with respect to a few
assemblers, compilers, and simulators that I wrote.

This method of debugging, combined with the methodology of structured
programming and informal proofs (otherwise known as careful desk
checking), leads to greater reliability of production software than
any other method I know.  Therefore I have used it in developing
TeX82, [...]
