Re: sensitivity vs. specificity in software testing

groff

From:	G. Branden Robinson
Subject:	Re: sensitivity vs. specificity in software testing
Date:	Sun, 9 Apr 2023 18:00:42 -0500

At 2023-04-08T18:26:13+0100, Ralph Corderoy wrote:
> > My personal test procedures, I think, adequately do this for man(7);
> > every time I'm about to push I render all of our man pages (about 60
> > source documents) to text and compare them to my cache of the ones I
> > rendered the last time I pushed.
> 
> Yes, that's good as a lone developer.  Making it a hurdle for others
> could be nice to have.

I've considered it, but rapidly hit two problems: Where would the golden
master of the latest revision of the rendered man page texts be stored?
Our man page text changes frequently.  How do we ensure that other
developers' golden masters are updated in sync with the text changes?

Nevertheless if anyone wants to see my grody little shell scripts for
doing this, I can pass them along.  There's no cleverness involved.
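
As an illustration only, and not the shell scripts mentioned above, the
compare step of such a golden-master check might look like the Python
sketch below.  It assumes the pages have already been rendered to text
files; the two directories and the name `changed_pages` are hypothetical.

```python
import difflib
from pathlib import Path


def changed_pages(cache_dir, fresh_dir):
    """Return {page name: unified diff text} for pages whose rendering changed.

    Both directories are hypothetical: cache_dir holds the text rendered
    at the last push, fresh_dir the text rendered just now.
    """
    cache, fresh = Path(cache_dir), Path(fresh_dir)
    names = {p.name for p in cache.iterdir() if p.is_file()}
    names |= {p.name for p in fresh.iterdir() if p.is_file()}
    diffs = {}
    for name in sorted(names):
        old = _read_lines(cache / name)
        new = _read_lines(fresh / name)
        if old != new:
            diffs[name] = "".join(
                difflib.unified_diff(old, new, f"cache/{name}", f"fresh/{name}")
            )
    return diffs


def _read_lines(path):
    # A page missing on one side is treated as empty, so additions and
    # removals show up as whole-file diffs.
    if not path.is_file():
        return []
    return path.read_text(encoding="utf-8").splitlines(keepends=True)
```

An empty result means nothing changed since the last push.  The sketch
does nothing about the two problems raised above: where the cache lives
and how it stays in sync among developers.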

> > I think there is a risk here of confounding macro package and
> > formatter problems with output driver problems.  All should be
> > tested, but not necessarily together except for inputs designed as
> > integration tests.
> 
> I think a distinction between us is I'm not talking about designed
> inputs.  ‘.DS .bp .DE’ isn't a typical designed input.

No, but the under-specification of ms(7) means that it can be difficult
to decide which inputs are within the model.  Does a single `ad` request
send you into uncharted waters?  A ridiculously strict reading of Lesk
1978 says it does.  A less ridiculously strict reading of Lesk 1978 is
not codified anywhere.

I think we're in the same boat with ms(7) that we are with man(7); we
are reduced to relying on historical practice, which is suggestive but
not dispositive, and on exercising judgment.  These amount to engraving
invitations for arguments, sometimes bitter ones.

Maybe mm(7) is nailed down a bit better; Damian recently mentioned that
as being one of its virtues as far back as the mid-1970s, and you've
been pretty emphatic that groff mm went wrong back around 1.18.[1]  With
DWB 3.3 mm available to me, maybe I'll even get the chance to fix that.
(Or concretely justify the current behavior.)

But that doesn't help us with other macro packages.  Even mdoc(7) has
proven a little slippery.  The other day Ingo asked me explicitly _not_
to make groff mdoc(7)'s indentation amounts align with our man(7)
settings, I suppose to avoid messing up his test suite for mandoc(1).

https://savannah.gnu.org/bugs/?64018

(Update: Also, indenting a paragraph by 7n is "objectively wrong".)

> > That is why the tests I've written have demonstrated an increasing
> > bent toward use of "groff -Z" and groff "-a"; these produce
> > device-independent output and "an abstract preview of output",
> > respectively.
> 
> troff's output is device dependent, as I just mentioned in another
> thread, but I know what you mean.

This is why I say (in our documentation) "device-independent but not
device-agnostic".

> The aim of formatting a corpus to pixels would be to quickly test a
> growing set of real-world documents.

I asked on this list multiple times over the years for contributions of
these and never got any response.  I think the main reason is not that
people don't want to help, but a familiar old problem: many such *roff
documents were works-for-hire or otherwise came into existence owned by
some corporate entity that, if it even still exists, has neither
sufficient records establishing present title, nor any interest in
researching such issues for the sake of making a few Internet hobbyists
happy or potentially giving up even the most marginal precious market
advantage.  This is part of the rentier mindset endemic to monopolized,
as opposed to commodity, markets: you may subjectively value something
at near zero, but as soon as someone asks about it, you tell yourself
you're not serving the firm unless you demand a billion dollars for it.

So as a pilot project I constructed my own, and now I'm getting
pushback on the changes I made to _more_ faithfully render Kernighan &
Cherry because ms(7) authors apparently have a habit of throwing `sp`
requests around instead of using the package's documented facilities for
precisely producing vertical space.  Are they wrong to do so?  Well,
maybe not.  Casting pointers to ints and back always worked before.

Fidelity is a two-edged sword; I can easily see being expected to modify
groff ms(7) again to contrive the removal of the equation that went
missing in Version 7 troff output--bug-for-bug compatibility.

Faithful rendering may thus be an unachievable goal in some subdomains
of *roff typesetting.

> It would be cheap to add another document.

Too steep a price for those who didn't reply to my requests for a
corpus, noted above.

> The output of a preprocessor, troff, or a device driver may change
> intentionally.  Eyeballing those changes for the corpus would be
> tedious and error prone.  The pixels intentionally change less often.
> And eyeballing pixels to see the nature of the change tends to be
> quick compared to comprehending what a diff at a stage of the pipeline
> represents.

Keep in mind that with pixel-for-pixel comparisons, we're also subject
to changes in the font files used to render the document, over which we
have no control.  A few years ago I had an idea for a renderer that
didn't bother with rendering glyphs at all; it would just draw little
rectangles corresponding to the glyph metrics.  _That_ might hold out
some hope for your pixel comparator.
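
A toy sketch of that rectangle idea, in Python, under loud assumptions:
no such renderer exists in groff, the `(x, y, w, h)` box tuples merely
stand in for glyph positions and metrics, and both function names are
hypothetical.

```python
def rasterize_boxes(boxes, width, height):
    """Paint filled rectangles onto a width x height grid of 0/1 cells.

    Each box is (x, y, w, h) in grid units, standing in for a glyph's
    position and metrics; glyph outlines are never consulted.
    """
    grid = [[0] * width for _ in range(height)]
    for x, y, w, h in boxes:
        # Clip each rectangle to the grid so stray boxes cannot index
        # outside it.
        for row in range(max(y, 0), min(y + h, height)):
            for col in range(max(x, 0), min(x + w, width)):
                grid[row][col] = 1
    return grid


def differing_cells(a, b):
    """Count cells that differ between two same-sized grids."""
    return sum(ca != cb for ra, rb in zip(a, b) for ca, cb in zip(ra, rb))
```

Under this scheme a change to a font's outlines alone leaves the grid
untouched; only a change in placement or metrics moves a rectangle and
registers as a difference.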

> So a corpus diffed as pixels serves a different purpose to hand-written
> coverage or regression tests.  Just as fuzzing attacks from yet another
> angle.

I think you overestimate the utility of tests that attempt to enforce
pixel invariance.  I understand what Knuth was going for with eternal
reproducibility of archival materials (including the Computer Modern
typefaces).  It's a lot easier to engineer that in from the outset than
to bolt it on later.

> Yes.  I'd be tempted to have a standard encoding which gives a
> readable rendering but compresses two or more blank lines.
> 
>     awk '
>         !length   { b++; next }
>         b == 1    { print "" }
>         b > 1     { print "-" b }
>                   { b = 0 }
>         /^-[0-9]/ { printf "-" }
>         1
>     ' 
> 
> One could also highlight or encode tabs or end-of-line white-space to
> make it obvious to the reader and protect it from incorrect change.

Sounds like a good project for a volunteer to take up!  I've written
about 160 of the 164 automated tests in groff 1.23.0.rc3.  Maybe someone
else can show up my amateurish efforts and I can shift my focus back to
things I'm better at.[2]
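
For any such volunteer, a rough Python port of the quoted awk encoding
might serve as a starting point.  It is a sketch, not a proposed
standard.  It departs from the awk in one labeled respect: it escapes
any line that begins with one or more dashes followed by a digit (the
awk escapes only a single dash), so that the added `decode` function,
which is not in the original, can invert it unambiguously.

```python
import re


def encode(lines):
    """Compress runs of two or more blank lines into a '-N' marker line."""
    out, blanks = [], 0
    for line in lines:
        if line == "":
            blanks += 1
            continue
        if blanks == 1:
            out.append("")
        elif blanks > 1:
            out.append(f"-{blanks}")
        blanks = 0
        # Escape a real line that could be mistaken for a marker (or for
        # an escaped line) by adding one more leading dash.
        if re.match(r"-+[0-9]", line):
            line = "-" + line
        out.append(line)
    return out  # like the awk version, trailing blank lines are dropped


def decode(lines):
    """Invert encode(); this decoder is an addition, not part of the awk."""
    out = []
    for line in lines:
        if re.fullmatch(r"-[0-9]+", line):
            out.extend([""] * int(line[1:]))  # marker: restore N blanks
        elif re.match(r"--+[0-9]", line):
            out.append(line[1:])  # escaped line: drop the extra dash
        else:
            out.append(line)
    return out
```

Both functions take and return lists of lines without their newlines.
Input that ends in blank lines does not round-trip, because the encoder
drops them just as the awk does.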

Regards,
Branden

[1] https://savannah.gnu.org/bugs/?24047
[2] What those might be is an open research problem...
