groff

Re: Issue in man page ascii.7


From: Alejandro Colomar
Subject: Re: Issue in man page ascii.7
Date: Mon, 5 Dec 2022 15:31:23 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1

Hi Branden,

On 12/5/22 15:06, G. Branden Robinson wrote:
[...]

You're welcome, but I think we might have talked past each other below.

Sure, I try to do it consistently.  If I Cc you, it means "just read it if
you want; you're not forced; maybe you're busy and someone else on groff@
picks it up".  :)

Works for me.  :)

what's going on here
[the problem that Helge reported]
is actually a GNU tbl(1) bug.

https://savannah.gnu.org/bugs/?61909
I think I'll keep this as a WONTFIX.

The man-pages don't have stable releases (i.e., what you get at the
time your distro releases is what you'll get forever), so stable users
will have this bug unfixed forever until they dist-upgrade, even if I
fixed it.

Soon (we hope), groff 1.23.0 will be released, so next OS releases
(e.g., Bookworm) won't have this bug (and many others that you fixed).

So, the only problem is for those who use stable distros, but somehow
install the fresh man-pages.

No, that is not the case.  Because there _aren't_ dummy characters \&
after the sentence ending punctuators [!?.] that are followed by
multiple space characters in the ascii(7) page today, _and_ every known
released version of GNU tbl incorrectly applies the configured
inter-sentence space to the second space character after such
punctuators, people are getting incorrect output _now_ from this table,
and any others that regex-match "[.!?]  " in ordinary text blocks if
their configured inter-sentence space amount is not the default.
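For anyone wanting to check their own pages, the "[.!?]  " match described above can be sketched in Python. This is a hypothetical helper, not part of groff or the man-pages tooling. It assumes a simple table layout (the format section ends at the first line ending in ".") and it ignores .T& sections and transparent characters such as ")" after the punctuator.

```python
import re

# A sentence-ending punctuator followed by two spaces: the "[.!?]  " pattern.
# If "\&" sits right after the punctuator, the pattern does not match.
SENTENCE_GAP = re.compile(r"[.!?]  ")


def affected_table_lines(source: str) -> list[tuple[int, str]]:
    """Return (line number, text) for tbl data lines containing "[.!?]  ".

    Simplifications: the format section is assumed to end at the first
    line inside .TS/.TE that ends with "."; .T& sections and transparent
    characters such as ")" after the punctuator are not handled.
    """
    hits = []
    in_table = in_format = False
    for number, line in enumerate(source.splitlines(), start=1):
        if line.startswith(".TS"):
            in_table = in_format = True
        elif line.startswith(".TE"):
            in_table = False
        elif not in_table:
            continue
        elif in_format:
            # Options and format lines; the last format line ends with ".".
            if line.rstrip().endswith("."):
                in_format = False
        elif line.startswith((".", "'")):
            continue  # a control line, not table data
        elif SENTENCE_GAP.search(line):
            hits.append((number, line))
    return hits
```

Run on the iss.man exhibit below, it would flag the "Foo." and "Baz." lines but not the "Hep.\&" one.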

My point was that:

-  groff 1.23.0 will (hopefully) be available starting with Bookworm, a few
   months from now.

-  man-pages 6.02 will be available to end users at exactly the same time
   (okay, in other distros that might differ by a few months, but I expect
   both to be available at more or less the same time).

Since there will only be a small window between when I release and when you release, working around it in the man-pages would effectively be useful to 0 Debian users, and to very few users of other distros.  I can live with it.

In fact, I could just wait to release 6.02 until after groff, since it would still be in time for Bookworm, and that would allow more changes into that release.  However:

-  I'd like to give translators some extra time to work before the freeze,
   and

-  The solstice is a nice day for a release.  I prefer it over some random
   day.  :)


That last condition is in fact common for non-Anglophone users of groff.

Let me show you a simple exhibit and then I'll drown you with more
background.

---snip---
$ cat EXPERIMENTS/iss.man
.TH foo 1 2022-12-05 "groff test suite"
.SH Name
foo \- frobnicate a bar
.SH Description
.TS
L.
Foo.  Bar.
.TE
.ss 12 0
.TS
L.
Baz.  Qux.
.TE
.TS
L.
Hep.\&  Sid.
.TE
$ nroff -t -man EXPERIMENTS/iss.man # groff 1.22.4 (Debian)

foo(1)                      General Commands Manual                     foo(1)



Name
        foo - frobnicate a bar

Description
        Foo.  Bar.
        Baz. Qux.
        Hep.  Sid.

Yep, I think I can live with that bug for half a year.




groff test suite                  2022‐12‐05                            foo(1)
$ ./build/test-groff -t -man -Tascii EXPERIMENTS/iss.man # groff Git
foo(1)                      General Commands Manual                     foo(1)

Name
        foo - frobnicate a bar

Description
        Foo.  Bar.
        Baz.  Qux.
        Hep.  Sid.

groff test suite                  2022-12-05                            foo(1)
---snip---

So, a table entry _lacking_ these dummy character escape sequences \& is
exposed to the old groff bug, which still exists in the wild on every
system until last week, I suppose.  (This bug is not man(7)-specific.
It will affect any groff document regardless of macro package.)

Lengthy background
==================

It can be seen that the difference in output was prompted by this line.

.ss 12 0

The formatter's default is equivalent to this.

.ss 12 12

The function of the number "12" is not obvious here; it arises from
traditions of mechanical typography.  But what it _means_ is, "put one
word space between each word and put one (additional) word space between
sentences on the same output line".
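A sketch of that arithmetic, assuming the two-argument form of .ss with no trailing comment; the function name is made up for illustration. It follows the semantics just described: the first argument sets the word space, and the second adds to it between sentences, both in twelfths of the font's space width.

```python
def sentence_spacing(request: str) -> tuple[float, float]:
    """Return (inter-word, inter-sentence) space for an ".ss" request.

    Both arguments of ".ss" are in twelfths of the current font's space
    width; the second is the *additional* space between sentences on the
    same output line.  Results are in multiples of that space width.
    Assumes exactly two integer arguments and no trailing comment.
    """
    name, word, extra = request.split()
    if name != ".ss":
        raise ValueError(f"not an .ss request: {request!r}")
    word_gap = int(word) / 12
    sentence_gap = (int(word) + int(extra)) / 12
    return word_gap, sentence_gap


print(sentence_spacing(".ss 12 12"))  # formatter default: (1.0, 2.0)
print(sentence_spacing(".ss 12 0"))   # as in de.tmac or fr.tmac: (1.0, 1.0)
```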

Yeah, but nobody should be manipulating the inter-sentence spacing in a
man page, right?  Right.  But, localization files...

$ git grep 'ss 12 0' tmac
tmac/cs.tmac:.ss 12 0
tmac/de.tmac:.ss 12 0
tmac/fr.tmac:.ss 12 0
tmac/groff_man.7.man.in:\&.ss 12 0 \e" See groff(@MAN7EXT@).
tmac/it.tmac:.ss 12 0
tmac/sv.tmac:.ss 12 0

Not to mention the fact that this request could appear in a troffrc or
man.local file.  In short, this is a user-configurable parameter and a
portable man page should not assume the inter-sentence spacing amount.

\& works to hide the bug even on old (well, current :-/ ) GNU tbl
because it suppresses the detection of sentence endings altogether.

\& does have other semantics in tbl(1) tables; it is used to align
the units place in columns using a numeric format (classifier "N" rather
than "L" or "C", for instance), but I've never in my life seen that
format used in a man page.  (It is also hard to grep for without gagging
on false positives.)  But, in principle, telling people just to work
around the bug by adding \& in _all_ circumstances is a bad idea for
this reason.[1]
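A conservative version of that workaround could add \& only in tables that have no numeric columns. The Python below is a hypothetical sketch, not a groff tool. It treats any "n" or "N" left in a format line (after dropping parenthesized arguments) as a numeric classifier and skips that whole table, and it assumes the format section ends at the first line ending in ".".

```python
import re

# Sentence-ending punctuator followed (lookahead) by two spaces.
GAP = re.compile(r"([.!?])(?=  )")


def _has_numeric_column(format_lines: list[str]) -> bool:
    """Crude, conservative check for an "n"/"N" classifier in tbl format lines."""
    for fmt in format_lines:
        if fmt.rstrip().endswith(";"):
            continue  # global options line, e.g. "tab(@);"
        bare = re.sub(r"\([^)]*\)", "", fmt)  # drop arguments like w(2.5i)
        if "n" in bare or "N" in bare:
            return True
    return False


def _fix_table(block: list[str]) -> list[str]:
    """Add \\& after sentence-ending punctuators in one table's data lines."""
    end = next((k for k, ln in enumerate(block) if ln.rstrip().endswith(".")), None)
    if end is None or _has_numeric_column(block[: end + 1]):
        return block  # malformed, or has a numeric column: leave untouched
    fixed = block[: end + 1]
    for line in block[end + 1:]:
        if line.startswith((".", "'")):
            fixed.append(line)  # control line, not table data
        else:
            fixed.append(GAP.sub(r"\1\\&", line))
    return fixed


def add_dummy_chars(source: str) -> str:
    """Apply _fix_table to every .TS/.TE block; copy other lines as-is."""
    out, block = [], None
    for line in source.splitlines():
        if block is None:
            out.append(line)
            if line.startswith(".TS"):
                block = []
        elif line.startswith(".TE"):
            out.extend(_fix_table(block))
            out.append(line)
            block = None
        else:
            block.append(line)
    if block is not None:
        out.extend(block)  # unterminated table: copy through unchanged
    return "\n".join(out) + ("\n" if source.endswith("\n") else "")
```

Because the regex needs two spaces right after the punctuator, running it twice is harmless: "Hep.\&  Sid." is left alone.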

There's a lot of bloody history around inter-sentence spacing, enough
that we have to cover the subject in the groff Texinfo manual,[2] and it

Hmm, I think I'll refer to that link more than once.

is compounded by luminaries like the general editor of the Chicago
Manual of Style lying to the public about that history.  groff maintains
compatibility with AT&T troff in this area.

In Europe, supplemental inter-sentence space is _not_ common, and I
gather there is some kind of official European Union style guide that
militates against it.  It is binding only upon official EU publications,
but many organizations have adopted it nonetheless--it saves the expense
of maintaining a style guide of one's own, and plenty of people
in the U.K. who voted for and celebrate Brexit nevertheless slavishly
follow EU prescriptions in this area.

That can be random people that install random packages from source, or
contributors to the pages.  For both of them, I specify the
dependencies in the INSTALL file, so I hope they don't blame me too
much; they should ask their distributor about backporting groff 1.23.0
for installing the pages from source, or install groff from source, or
be happy with small glitches like this :)

I understand if you don't want to mess with a belt-and-suspenders
approach, but I want to make sure you're making an informed decision. :)

However, things like .MR concern me more.

Me too.  I'm trying to contain my expectations because history is
replete with nice new features that suffered deaths of neglect.

You already have a future user here.  I don't think it will die ;)


(warning: inside baseball^W^Wgroff internals)

Right now even email and web URLs in man pages aren't hyperlinked in
PDF, and that's silly.  So I'm trying to orthogonalize man(7) hyperlink
support so I can couple it to gropdf(1)'s "pdfmark" support.

Or I would be working on it, if the under-documented "pdfhref" macro
weren't structured to make it a pain in this ass.  I guess whoever
designed that didn't expect someone to format link text in a diversion.
Also I discovered an exciting new (old) bug when formatting HTML.  :(

Anyway, once that is done, I can integrate Deri James's cool trick for
converting "local" man page cross references into PDF bookmarks, so you
could, hypothetically,[3] produce a 380-page compilation of
60 man(7) and mdoc(7) documents that have hyperlinked cross-references
to each other, and present "man:blah(1)" hyperlinks for pages outside
that collection.
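The bookmark-versus-URI decision described above might look like this in outline. It is not Deri James's implementation or anything gropdf(1) provides: the function, the regex, and the "#name.section" bookmark naming are all hypothetical; only the "man:blah(1)" URI form comes from the message.

```python
import re

# "name(section)" as in "groff(1)" or "groff_man(7)"; section may carry a suffix.
XREF = re.compile(r"([A-Za-z0-9_.+-]+)\((\d[A-Za-z]*)\)")


def link_target(xref: str, collection: set[str]) -> str:
    """Point at an internal bookmark if the page is in the collection,
    otherwise at a man: URI (hypothetical naming scheme)."""
    m = XREF.fullmatch(xref)
    if m is None:
        raise ValueError(f"not a man page reference: {xref!r}")
    name, section = m.groups()
    if xref in collection:
        return f"#{name}.{section}"  # hypothetical internal bookmark name
    return f"man:{name}({section})"


pages = {"groff(1)", "groff_man(7)"}
print(link_target("groff_man(7)", pages))  # #groff_man.7
print(link_target("ls(1)", pages))         # man:ls(1)
```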

Hmm, this reminds me I also want to do that single PDF for the Linux man-pages. I'll ask you again about it when I have less stuff in queue. Right now I'm busy rewriting documentation about strings, killing some of them, and documenting when/where others should be used. I'm writing a new string(7), which should be a nice guide on which string function to use for your case.


I might fail at orthogonalizing, but I'll do my damnedest to at least
get this _working_.  ("groff 1.24: the same but with elegance"... :-| )

:)


I'd be happy making some radical changes, requiring 1.23.0 as a bare
minimum, and using .MR right after the Bookworm release.

[insert Kang and Kodos clip]

Cheers,

Alex


Hopefully that triggers backporting of groff; maybe you can do that as
a future maintainer of the Debian package?  :P

Maybe, if groff 1.23 proves not to have many surprising regressions,
that would be feasible, but I would prefer to delegate that sort of
task.  Build a team wherever you can.  A backport is more likely to
happen if groff 1.23 proves not to have many regressions from 1.22.4.
I've gone to considerable lengths to avoid that: I have automated test
#152 in my working copy now.  (groff 1.22.4 had three.)

[1] (groff insider stuff)

The parentheses in here help a lot with long messages :)

I fear "tl;dr" was coined around 1999 by people exposed to my emails.

Regards,
Branden

[1] tbl uses the _leftmost_ `\&` in a numerically formatted entry as the
     alignment position.  For instance, imagine a business that produced
     formatted reports by accepting text input from a terminal^Wweb
     form.  Also assume that the report generator wasn't too fastidious
     about tidying up that input.

.\" nroff -t | cat -s
.TS
tab(@);
C S
C S
L N.
Amy's Kennels
Boarded Animals, Week of 2022-12-05
Size@Name and check-in weight (kg)
Large@Max      25.6
\^@Sassy.      44.8
Small@Henrietta    6.24
\^@T. J. Peepers.\&  (chinchilla) 3.03
.TE

This is not a _well_-designed table, but it is a _plausible_ one.  Well,
almost.[4]  But adding another \& later at the "real" position where the
decimal point should be aligned will not help, because the leftmost one
controls.
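The "leftmost one controls" rule can be modeled in a few lines. This only mimics the behavior described in this footnote, not tbl's full alignment logic, and the entry with a second \& before ".03" is a hypothetical variant of the T. J. Peepers row.

```python
from typing import Optional, Tuple

DUMMY = "\\&"  # backslash followed by ampersand


def numeric_alignment_split(entry: str) -> Optional[Tuple[str, str]]:
    """Split an N-column tbl entry at its \\& alignment point.

    Models only the rule described above: the *leftmost* \\& wins.
    Entries without \\& (where tbl falls back to other rules) return None.
    """
    pos = entry.find(DUMMY)  # str.find returns the leftmost match
    if pos < 0:
        return None
    return entry[:pos], entry[pos + len(DUMMY):]


entry = "T. J. Peepers.\\&  (chinchilla) 3\\&.03"
left, right = numeric_alignment_split(entry)
print(repr(left))   # 'T. J. Peepers.'
print(repr(right))  # '  (chinchilla) 3\\&.03'
```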

[2] https://git.savannah.gnu.org/cgit/groff.git/tree/doc/groff.texi?id=aa20f5961cb0788e888180c57add5a452ce9d8d6#n4976
[3] https://git.savannah.gnu.org/cgit/groff.git/tree/doc/doc.am?id=aa20f5961cb0788e888180c57add5a452ce9d8d6#n257
[4] I'd like to meet the web-form-using kennel service staffer who
     knew to sneak *roff escape sequences into the input.  But we all
     know that failure to validate input is as common as street litter.

--
<http://www.alejandro-colomar.es/>
