Re: Warn on semantic newlines

From: G. Branden Robinson
Subject: Re: Warn on semantic newlines
Date: Sat, 11 Jun 2022 07:50:32 -0500

Hi Alex,

At 2022-06-10T17:47:40+0200, Alejandro Colomar wrote:
> On 6/10/22 14:16, G. Branden Robinson wrote:
> > For groff, at least, the fundamental change is straightforward.  I
> > can make the troff(1) command do it with a 1-line patch.
> > 
> > diff --git a/src/roff/troff/env.cpp b/src/roff/troff/env.cpp
> > index d6a9e982d..d3f80a205 100644
> > --- a/src/roff/troff/env.cpp
> > +++ b/src/roff/troff/env.cpp
> > @@ -472,6 +472,7 @@ void environment::space(hunits space_width, hunits sentence_space_width)
> >         && node_list_ends_sentence(p->next) == 1) {
> >       hunits xx = translate_space_to_dummy ? H0 : sentence_space_width;
> >       if (p->merge_space(xx, space_width, sentence_space_width)) {
> > +      debug("end of sentence detected in input line");
> >         *tp += xx;
> >         return;
> >       }
> I hope we see it in 1.23 :)

I'm hoping we'll hear from Bertrand soon...

> When you add it, I'll test it in the Linux man-pages.

I'll be measuring its impact on groff's own documents carefully first.

So far I've seen nothing to alarm me, and no false positives, strictly
speaking.

I've seen stuff like this trip it.

\&.de foo
\&.  ie 1 bar
\&.  el   baz

The indentation after the control characters prompts the complaint.  But
that is, strictly, correct, because inter-sentence space _is_ applied
even when filling is disabled, as is the case in man(7) EX/EE regions.
So if people change the supplementary inter-sentence space amount, the
indentation will become incorrect, or at least not what was intended.
More on this below.

> > I _will_ commit now to not turning this warning on by default.
> > Sentence endings internal to a line are not incorrect roff practice.
> > They're merely easy to screw up (and they make diff-based
> > maintenance work uglier than it needs to be).
> Is the following correct?  "foo. Bar"

It's not idiomatic roff input.  If you want to end a sentence, put at
least two spaces or a new line after the end-of-sentence character
(sentence-ending-transparent characters notwithstanding).

However, the regex '[A-Za-z]\. [A-Z]' matches roff text lines that are
not wrong.

For instance, I record my name as "G. Branden Robinson".  This is safe
enough as long as I don't accidentally break the line after "G." as part
of editing operations.  That is why our documentation recommends "G.\&
Branden Robinson" instead.  But it's not _wrong_ to leave out `\&` here.

5.1.10 Input Conventions


   * Use '\&' after '!', '?', and '.' if they are followed by space,
     tab, or newline characters and don't end a sentence.

   * In filled text lines, use '\&' before '.' and ''' if they are
     preceded by space, so that reflowing the input doesn't turn them
     into control lines.

Perhaps amid all the argument over what to formally name the `\&` escape
sequence, we risk losing sight of its tremendous utility in workaday
roff documents.
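
To make the regex point above concrete, here is a minimal sketch in
Python.  The sample lines are made up for illustration, and the pattern
is the one quoted earlier; nothing here is part of groff.  It shows that
the pattern cannot tell an abbreviation dot from a sentence ending, and
that `\&` happens to hide the dot from it.

```python
import re

# The pattern quoted above: letter, period, ONE space, capital letter.
NAIVE = re.compile(r"[A-Za-z]\. [A-Z]")


def flagged(line: str) -> bool:
    """Return True if the naive pattern fires anywhere in the line."""
    return NAIVE.search(line) is not None


# Illustrative input lines (assumptions, not taken from any real document).
samples = {
    "abbreviation": "My name is G. Branden Robinson",
    "protected": r"My name is G.\& Branden Robinson",
    "one_space": "It rained. We stayed home.",
    "two_spaces": "It rained.  We stayed home.",
}

for label, line in samples.items():
    print(f"{label:12} flagged={flagged(line)}  {line}")
```

The "abbreviation" and "one_space" lines are both flagged, although only
the second contains a sentence ending; the regex has no way to know
that.  The formatter does, because it applies the end-of-sentence rules
itself.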

(Kind of a shame that Texinfo renders a quoted apostrophe as "'''", but
that's not a windmill I want to tilt at.)

> As far as I know, that breaks the ability of groff(1) to produce the
> double space;

It's bad style.  I wouldn't say anything gets "broken"; not all periods
_should_ have inter-sentence space after them.  Further, by making
sentence-ending detection reliable, we better accommodate the people who
want the amount of supplementary inter-sentence space to be zero (and
then eventually to change their minds and come over to the other side of
the Force. ;-)).

> so I'd say something's wrong there?  It's not only a matter of
> maintainability, is it?

I'm not quite sure what you're asking here.  Knowing the difference
between an abbreviation dot and a sentence-terminating period is
important for prose composition in many languages.  For example, it's
considered bad style to end a sentence with an abbreviation dot because
it creates ambiguity and increases the mental burden on the reader.

> > No, that's not correct.  GNU troff has supported a '-W' flag to
> > disable warnings of the type given in the argument since 1991 or
> > earlier.  It goes all the way back to day one of our Git history.
> > 
> > If I implement the thing above, then the Make rules for the
> > (non-man(7)) documents that don't use semantic newlines will need
> > their command recipes updated to say "-ww -Wsentence" or whatever
> > the new category gets called.
> Hmm, no?
>        -wname Enable warning name.  Available warnings are
>               described  in  section "Warnings" below.  To
>               enable most useful warnings use -w all.   To
>               enable  absolutely all warnings use -w w in-
>               stead.  Multiple -w options are allowed.
> -ww enables absolutely all warnings.  This is part of "absolutely all"
> warnings, so if someone wanted absolutely all warnings, they should
> get this one too.  If not, they better read absolutely twice before
> using it.
> -wall enables most useful warnings.  This is a "most useful" warning,
> since most of the manual pages (if not all) should follow this.

The -W and -w options are processed serially and applied to the set of
enabled warning categories, which is represented as a bit mask.  No
other attempt to reconcile these values is made.


$ printf '\\n[bogus]\ \\n[nl]u\n' | nroff
$ printf '\\n[bogus]\ \\n[nl]u\n' | nroff -ww
troff: <standard input>:1: warning: number register 'bogus' not defined
$ printf '\\n[bogus]\ \\n[nl]u\n' | nroff -ww -Ww
$ printf '\\n[bogus]\ \\n[nl]u\n' | nroff -ww -Ww -wreg
troff: <standard input>:1: warning: number register 'bogus' not defined

I think this behavior is intuitive, useful, and above all ergonomic,
since we have over 20 warning categories.
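
The serial processing described above can be sketched in a few lines of
Python.  This is only a model under stated assumptions: the category
names 'char', 'break', and 'reg' exist in groff, but the bit values,
the three-category universe, and the default mask used here are invented
for illustration, and apply_options() is a hypothetical helper, not
groff code.

```python
# Hypothetical bit assignments; groff's real values differ.
CHAR, BREAK, REG = 1, 2, 4
ALL = CHAR | BREAK | REG  # stands in for the 'w' pseudo-category

NAMES = {"char": CHAR, "break": BREAK, "reg": REG, "w": ALL}

# Assumed default: some categories on, 'reg' off (as the transcript
# above suggests, since plain nroff printed no 'reg' warning).
DEFAULT = CHAR | BREAK


def apply_options(options, mask=DEFAULT):
    """Apply -wNAME / -WNAME left to right to a warning bit mask."""
    for opt in options:
        flag, name = opt[:2], opt[2:]
        bits = NAMES[name]
        if flag == "-w":
            mask |= bits          # enable the named categories
        elif flag == "-W":
            mask &= ~bits         # disable the named categories
        else:
            raise ValueError(f"unrecognized option: {opt}")
    return mask


for opts in ([], ["-ww"], ["-ww", "-Ww"], ["-ww", "-Ww", "-wreg"]):
    enabled = bool(apply_options(opts) & REG)
    print(f"{' '.join(opts) or '(none)':18} reg enabled: {enabled}")
```

Each option only sets or clears bits in the running mask, so later
options win over earlier ones, which is what the four commands in the
transcript demonstrate.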

> If someone just wants a specific set of warnings, and that the build
> doesn't break because a warning was added, I'd say they should use a
> specific set of warnings, instead of a wildcard.

The `warn` request enables this.

> The same happens with gcc(1)'s warnings.  Every release, builds break,
> for a good reason. Solution: fix the code.

I don't think I'm precisely following your analogy.  As I said before,
eschewing "semantic newlines" does not make roff input invalid or wrong
in any way.  It is a question of taste and style (and I would even argue
robustness), and projects can impose their own rules in this area.

Kernighan may live by the semantic newline practice but Eric Allman
didn't and Peter Schaffter doesn't.  Their works are not less worthy.

I'm supportive of adding detection of within-line sentence breaks
because it is _really_ hard to reliably detect them with anything less
powerful than a roff interpreter.  Regexes are not up to the task.
People can change the control and escape characters.  Strings can be
interpolated at sentence endings.  groff further allows you to define
your own special characters, redefine existing ones, and for any
character, change the attributes that determine whether it terminates a
sentence, cancels sentence termination, or is transparent to same.

If you or anyone else is tempted to interpret the construction of a
warning category for this purpose as a derogation of bad style, then I
would much prefer to realize the feature in a different way, which will
probably be heavier to implement (and take longer).  We're already
nearly out of option letters for groff(1), so maybe I would need to add
a writable register to enable the diagnostic.  (This could still be
accessed via the command line thanks to the '-r' option.)

Maybe that's not a bad idea, since people might want to turn this
diagnostic on and off at a finer-grained level than an entire formatter
run.  In that respect it resembles the notional 'backtrace' register
that I discussed with Ingo a while back.[1]

> I'd enable it [the warning? --GBR] for everyone.  Why not?

Because as I said above, it's not presumptively invalid to write roff
input as I have written most of this email.  That is one of roff's
virtues.  You can start out by writing everyday English prose as it was
taught in schoolroom typewriting or computer class[2] and progressively
supplement it with higher-precision information and formatting detail.

> BTW, I guess the "no-op escape" \& can be used to silence this
> warning, right?

Not correctly.  If I write

The quick brown fox jumps over the lazy dog.\&  Hello, world!

then end-of-sentence detection is defeated just as our documentation
says it will be.  That means that people who change the supplementary
inter-sentence space amount won't get what they want.

For example, let's take the popular case of reducing that supplementary
space amount to zero.

.ss 12 0
The quick brown fox jumps over the lazy dog.\&  Hello, world!
.pl \n(nlu

What do we get?

The quick brown fox jumps over the lazy dog.  Hello, world!

Note the two spaces after the period.  Russell Harper will send his
space-eating zombie hordes after us for sure.

Some might think that this point doesn't apply to man pages, but it
does.  As groff_man_style(7) notes,[3] one of the things you can use the
man.local (and mdoc.local) file for is to configure this parameter, so
that inter-sentence space appears in man pages as _you_, the reader,
want to see it.  Not as imposed by the man page writer, who may try to
insist that you accept his dubious typographical esthetic alongside his
technical content.[4]

I want very much to aid the construction of excellent man pages, but I
am not willing to damage GNU roff in other application domains to do so.
For that matter, not even all worthwhile man pages adhere to semantic
newline practice.  It's a project-level decision.  We should respect it.


[2] Or as it used to be, at any rate.  [beard grayness intensifies]

[3] From groff_man_style(7):

              Put site‐local changes and customizations into this file.

                     .\" Use narrower indentation on terminals and similar.
                     .if n .nr IN 4n
                     .\" Put only one space after the end of a sentence.
                     .ss 12 0 \" See groff(7).
                     .\" Keep pages narrow even on wide terminals.
                     .if n .if \n[LL]>78n .nr LL 78n
                     .\" Ensure hyperlinks are enabled for terminals.
                     .nr U 1

              On multi‐user systems, it is more considerate to users
              whose preferences may differ from the administrator’s to
              be less aggressive with such settings, or to permit their
              override with a user‐specific man.local file.  This can be
              achieved by placing one or both of the following requests at
              the end of the site‐local file.
                     .soquiet \V[XDG_CONFIG_HOME]/man.local
                     .soquiet \V[HOME]/.man.local
              However, a security‐sandboxed man(1) program may lack
              permission to open such files.

