Re: [Groff] It is time to modernise "groff"

From: G. Branden Robinson
Subject: Re: [Groff] It is time to modernise "groff"
Date: Mon, 4 Sep 2017 15:38:08 -0400
User-agent: NeoMutt/20170113 (1.7.2)

At 2017-08-31T20:54:10+0000, Bjarni Ingi Gislason wrote:
> Introduction:
> A) There are about 1400 lines in the code in the "groff" repository that
> contains the backquote, grave (`) as a directional quote, but it is output
> exactly as itself, as it is not processed as an input to "groff" to be
> formatted and typeset.

No argument here.  A grave accent is not a directional quote; it's not
1968, and nobody is using a 7-bit display system nor a paper teletype
where the semantics of backspacing in the output data stream to compose
characters are respected.

Stuff like `this' should be burned with fire and extreme prejudice.
It's been ugly for over 20 years and is indefensible.

If LC_MESSAGES matches .*\.UTF-8 then people who love directional quotes
can write po files for them to their heart's content.
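
A rough sketch of that test in POSIX shell.  The function name is made
up for illustration, and the LC_ALL > LC_MESSAGES > LANG lookup order is
the usual POSIX precedence, not something groff itself does today:

```shell
# Hypothetical helper: succeed if a locale name carries a UTF-8 codeset.
# With no argument, fall back to the effective message locale
# (LC_ALL, then LC_MESSAGES, then LANG).
messages_locale_is_utf8() {
    loc=${1:-${LC_ALL:-${LC_MESSAGES:-${LANG:-}}}}
    case $loc in
        *.UTF-8|*.UTF-8@*) return 0 ;;   # e.g. en_US.UTF-8, with or without a modifier
        *) return 1 ;;
    esac
}
```

A caller could then pick directional quotes only when
"messages_locale_is_utf8" succeeds, and fall back to plain ' otherwise.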

> B) There are too many manuals (man pages) that contain a syntax error that is
> not seen because
> a) The user does not use '-ww'
> b) Programs, that use "groff", like "man" (from "man-db"), suppress most
> warnings by default, so the user must himself arrange for the permission to
> see the warnings.

I agree with others in the thread that making groff's default behavior
into an extremely rigorous validator will exasperate our users.

However, there should exist an easy-to-use knob to turn this on, and
people like us who are trying to develop groff and/or improve man pages
and other *roff documents should be using it.

Bjarni, your efforts to make groff's own documentation reflect best
current practice are extremely laudable and we should make it easier for
others to help you out in this area.  You have stomped on many of the
problems that annoyed me, and noticed many others I had no idea about.
So before I get back to your proposal, THANK YOU!

> Changes:
> case A:
>   Instead of changing these '`' to "'", it is time to modernize "groff" and
> get rid of Americanism, old, obsolete, deprecated, bad, and worse decisions.

This is too broadly and vaguely phrased to make for a mission statement.
(I would say something about too susceptible to subjective
interpretation, but what mission statement _isn't_?) Further,
occasionally, Americans do set good precedents.  I'd be happy to cite an
example if I could think of one.  ;-)

>   Suggestions:
> a) Change command substitution `...` to a modern form, $(...) or equivalent

Yes.  As Tom Duff pointed out, the escaping required on ` within ` is
exponential in the nesting depth.  Almost no one will get this right.
Moreover, almost every Bourne-family shell user these days is using a
shell that is mostly POSIX-compliant, and $() is not one of the
less-well-respected parts of the standard.  I hear even Solaris came
around several years ago.

foo=`bar` is a fine shortcut for lazy typists but horrible pedagogy.
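
To make the escaping cost concrete, here is a small illustration (the
variable names are arbitrary).  Both assignments nest three command
substitutions; the backquote form needs one backslash at the second
level and three at the third, while $() needs none:

```shell
# Backquotes: each extra nesting level roughly doubles the backslashes.
legacy=`echo a \`echo b \\\`echo c\\\`\``

# $(...): nests without any escaping.
modern=$(echo a $(echo b $(echo c)))

echo "$legacy"
echo "$modern"
```

Both lines print "a b c"; only one of them is something a person can
read and maintain.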

> b) Change backquote (`), that is used as a quote mark, to ' (single quote)

Iff proper directional quotes are not available, yes.

[skipping a few points that I think might cause legacy documents to
become ugly]

>   I have seen too many man pages, where a warning from "groff" is seen, if
> the user allows it.
>   Programs like "man" (from "man-db") suppress most warnings if the user
> does not turn on an environmental variable (which too many users and
> maintainers do not know of, or ignore).

A few things can be done here.

1) man-db's own man(1) page should document this environment variable.
All I see is the -w option which points me to the dang groff info page
for a list of its recognized arguments.

2) groff(7) and groff_man(7) should tell document writers and
maintainers how to turn warnings on.  Right now groff_man(7) says
nothing and groff(7) gives terse tabular descriptions of some
warning-related requests that I frankly mostly don't understand.
Tutorial material is needed.

3) man-pages's man(7) should also tell man page writers and maintainers
how to perform validity checking of their documents.

Best practices exist; let's get them written down and spread the news.
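
As a starting point for such documentation, a minimal sketch of a
checker.  The function name is hypothetical; the options are groff's
existing -ww (enable all warnings), -man, -Tutf8, and -z (suppress
formatted output).  It assumes any diagnostic at all counts as a
failure, which may be stricter than some maintainers want:

```shell
# Hypothetical wrapper: format a man page with every warning enabled,
# throw away the formatted output, and fail if anything reached stderr.
check_man_page() {
    # Redirection order matters: stderr goes to the capture,
    # stdout (already emptied by -z) goes to /dev/null.
    diagnostics=$(groff -ww -man -Tutf8 -z "$1" 2>&1 >/dev/null)
    if [ -n "$diagnostics" ]; then
        printf '%s\n' "$diagnostics" >&2
        return 1
    fi
}
```

Usage would be something like "check_man_page foo.1 || echo needs work",
for instance from a Makefile check target.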

>   Writers, maintainers (up- and downstreams) should not be allowed to misuse
> "*roff" to produce, maintain or distribute faulty products!

That runs afoul of the FSF's Freedom Zero[1].  There are limits to how
much we can solve social problems with technology.

What we can and should do is make Doing Right easy, and Doing Wrong
tedious or discouraging.  Let Groff be a nice quiet Unix tool that emits
no spew to stderr when all is well.  But only then.

> h) Make the default page size be A4 (a4).
> i) Adjust default sizes to the metric system.
> j) Use a metric point as default.  1 such point is then 375 micrometres
> (15 x 250 um) or about 1 didot-point (0.376 mm).

The above items should be determined by consulting the locale, surely?

> k) Add a warning (error) to some macros, when they are misused.
>   Example: macros for two fonts (like .BR, .IR) but have only one argument.

Yes, these should warn because man page writers often screw this up
thanks to careless use of cut and paste.
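
Until the macros themselves warn, an external lint can catch the common
case.  This is only a sketch: the function name is invented, the
tokenizer is naive (a double-quoted string counts as one argument, and
trailing comments are not stripped), and it only knows the six
alternating-font macros from man(7):

```shell
# Hypothetical lint: report alternating-font man macros that were given
# fewer than two arguments.  Prints "file:line: text" for each hit.
find_short_font_macros() {
    awk '
    /^\.[ \t]*(BR|RB|BI|IB|IR|RI)([ \t]|$)/ {
        line = $0
        sub(/^\.[ \t]*[A-Z][A-Z][ \t]*/, "", line)   # drop the macro name
        n = 0
        while (line != "") {
            # Consume one argument: a quoted string or a run of non-blanks.
            if (match(line, /^"[^"]*"?/) || match(line, /^[^ \t]+/)) {
                n++
                line = substr(line, RLENGTH + 1)
            }
            sub(/^[ \t]+/, "", line)                 # skip separating blanks
        }
        if (n < 2) printf "%s:%d: %s\n", FILENAME, FNR, $0
    }' "$@"
}
```

Run as "find_short_font_macros *.1"; a line such as '.BR lonely' or
'.IR "two words"' is reported, while '.BR foo (1)' is not.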

Relatedly, I've seen grief caused by attempts at cross-referencing the
titles of section headings:

.SH Description
The foobar frobnicates.
Some diagnostics are fatal; see
.BR Exit status ,
.\" ...
.SH Exit status
.\" ...

But that sort of problem is a lot harder to catch with logic.

> l) Remove some compatibility of "groff" with Unix troff, example:
>   preproc/tbl/table.cpp:// The only point of this is to make `\a' ``work'' as 
> in Unix tbl.  Grrr.
> m) Change ``...'' to "...".  Directional quotation mark are not useful in
> comments, output to the standard error, or output that is not processed by
> "groff" itself.  Applies also to "groff".
>   Such writing of quotes is a good example of how people get brainwashed.
> n) Let \[en] output '--' (en-dash) when glyph is missing
> Let \[em] output '---' (em-dash) when glyph is missing

I don't agree with this.  This is an aping of TeX's input conventions
and not standard English orthography.

In typewriter-like environments, there simply is no distinction between
the hyphen, the minus, and the en-dash in English, and an em-dash is
written as "--".

I think a far better solution is to identify and fix fonts that are
missing these extremely important glyphs.

At the risk of blaspheming, coverage of the Windows-1252 character
set[2] not only pays off immensely in terms of actually-encountered
character repertoire in English-language text (which is what most[?]
*roff documents and especially man pages are), but most fonts that are
either commercial or designed to replace commercial fonts have to have
glyphs for its codepoints.

(For that matter, any FLOSS font intended for text rendering has little
business not matching or exceeding Windows-1252's coverage.  It's only
an 8-bit encoding, and no glyphs are defined for the C0 controls, in
distinction to the Mulligan stew that the older IBM code pages crammed
in there.)

> o) Use the .ig request for longer (4-5 lines) comments, like
> .de comment
> ..
> .ig comment
> <Comments>
> .comment

1) Why is ".de comment" necessary?  .ig with no arguments is already
terminated by "..".

Some quick experimentation shows that .ig does actually call its
argument, _and_ ignores everything until .argument is seen, kind of
like a shell here document.

In any case,

.ig
multi-line comment
multi-line comment
multi-line comment
multi-line comment
..

is all that is required.

2) This is a good tip; Vim's autoindenter is too dumb to handle *roff
comments.  It happily flows .\" right into your paragraphs as if it were
an English word.  Sigh.  On the bright side, its syntax highlighter
recognizes .ig and handles it correctly, but only in the no-argument
case.

>   Issue an error if there are more the (4-5) consecutive lines that begin
> with the comment request.

Disagree here.  Instead we should promote knowledge of the .ig request.

Though I suppose there is the perennially damnable question of its
portability in man pages.  As a rule I hate to see font escapes and
non-macro requests in man pages.  I mean REALLY hate.

> p) Remove the '-a' option (the ASCII approximation output).

I didn't even know this existed.  Looking at what it spits out, I find
myself wondering what good it is.  Is this for Unix troff compatibility?
For people who didn't even have glass TTYs and needed to imagine what
the typeset output would look like?

I see that SUS has not standardized the troff command at all, so I
suppose there is no specification requiring us to keep it.  Maybe warn
of its deprecation in the groff 1.22.4 release notes, and remove it for
groff 2.0[3]?

[3] I'm a believer in semantic versioning.

