Re: sensitivity vs. specificity in software testing

From: G. Branden Robinson
Subject: Re: sensitivity vs. specificity in software testing
Date: Fri, 7 Apr 2023 11:56:42 -0500

At 2023-04-07T13:38:38+0100, Ralph Corderoy wrote:
> > On the one hand I like the idea of detecting inadvertent changes to
> > vertical spacing (or anything else) in a document, but on the other,
> > I find narrowly scoped regression tests to be advantageous.
> Agreed.  I assume groff is a long way from a set of tests which give
> high code coverage.

I have no concrete data, but fear you're right.  Attacking the code base
with gcov(1) is on my long mental to-do list.

> I think that swings in favour of detecting inadvertent changes.

My personal test procedures, I think, adequately do this for man(7):
every time I'm about to push, I render all of our man pages (about 60
source documents) to text and compare them to my cache of the ones I
rendered the last time I pushed.  This is better than nothing for
mdoc(7), which gets swept up in this screening procedure but has no
dedicated check of its own.
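The screening procedure above could be sketched roughly as follows;
the directory names and glob patterns are illustrative, not groff's
actual layout:

```shell
#!/bin/sh
# Hypothetical pre-push check: render each man page to text and diff it
# against a cached "golden" rendering saved at the last push.
set -u
cache=./render-cache
mkdir -p "$cache"
for page in man/*.1 man/*.7; do
    [ -e "$page" ] || continue
    out=$cache/$(basename "$page").txt
    # Render to plain text with the man macros.
    groff -man -Tutf8 "$page" > "$out.new" 2>/dev/null
    # Report any drift from the cached rendering, then refresh the cache.
    if [ -e "$out" ]; then
        diff -u "$out" "$out.new" || echo "CHANGED: $page"
    fi
    mv "$out.new" "$out"
done
```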

> > I think maybe the best-of-both-worlds solution is to have a model
> > document-based automated test--perhaps one that exercises as many
> > ms(7) macros as possible.
> A bit of a torture test?  Yes, worthy.
> > would add the highly sensitive Rumsfeldian "unknown unknowns" problem
> > detection that I think your suggestion is tuned to.
> I don't think it would catch the things not thought of, like the .bp
> within a display.  I've probably mentioned it before, but a corpus of
> real-life documents would be good input to a troff test harness.
> Render each at, say, 150 pixels per inch in monochrome by default and
> compare against a golden version made earlier.

You have mentioned it before.  I think there is a risk here of
confounding macro package and formatter problems with output driver
problems.  All should be tested, but not necessarily together except for
inputs designed as integration tests.

With the benefit of a few years' experience, I would claim that our
defect rate in output drivers is pretty low compared to that in the
formatter and (particularly) macro packages.  This could be because the
effects of the latter are more dramatic; problems with rendered output
being off by a pixel will come to the attention of our users only once
earlier stages in the pipeline are of unimpeachable quality.

That is why the tests I've written have demonstrated an increasing bent
toward use of "groff -Z" and "groff -a"; these produce
device-independent output and "an abstract preview of output",
respectively.

> Commands like ‘gm compare’ or gmic(1) can do the pixel comparison.

I wasn't familiar with these tools; thanks for mentioning them.  I
reiterate, though, that the bugs we tend to encounter are detectable
before getting to the output driver.
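For completeness, the pixel-comparison pipeline Ralph describes might
look something like this; it assumes Ghostscript (gs) and
GraphicsMagick (gm) are available, and the file names are illustrative:

```shell
#!/bin/sh
# Sketch of the corpus idea: rasterize a document at 150 dpi in
# monochrome and compare pixel-for-pixel against a golden image.
tmp=$(mktemp -d)
printf '.PP\nHello, world.\n' > "$tmp/doc.ms"

groff -ms -Tps "$tmp/doc.ms" > "$tmp/doc.ps"
gs -q -dBATCH -dNOPAUSE -sDEVICE=pngmono -r150 \
   -sOutputFile="$tmp/doc.png" "$tmp/doc.ps"

# A nonzero mean-squared error (or nonzero exit status) flags a change:
# gm compare -metric mse golden/doc.png "$tmp/doc.png"
```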


> > This is a good suggestion for handling blank line-happy output, of
> > which we have quite a bit in groff.
> I of course produced it by doing the opposite, line feeds into commas,
> and then reverted a comma by hand where I wanted to show a page break
> with a linefeed.

Several ways to skin this cat.  :)



