Re: Warn on semantic newlines

From: Ingo Schwarze
Subject: Re: Warn on semantic newlines
Date: Thu, 16 Jun 2022 21:08:48 +0200

Hi Alejandro,

Alejandro Colomar wrote on Fri, Jun 10, 2022 at 11:52:30AM +0200:

> As far as I know, there's currently no tool that warns on "foo. bar"
> in filled text.  Not `mandoc -Tlint`,

That's not entirely accurate.

Instead of the strange example "foo. bar", let's try a
more realistic example:

   $ echo "He tried hard.  He really did." | mandoc -mdoc -Tlint
   mandoc: <stdin>:1:17: WARNING: new sentence, new line
   $ echo "He tried hard. He really did." | mandoc -mdoc -Tlint
   mandoc: <stdin>:1:16: WARNING: new sentence, new line

The mandoc(1) program warns if all of the following conditions hold:

 1. The input file uses the mdoc(7) language.
    The rationale for doing this for mdoc(7) only and not for man(7)
    is that the average markup quality for real-world man(7) pages
    is so much below the markup quality of average real-world mdoc(7)
    pages that warning about this detail would not really be helpful
    in man(7) but rather amount to even more noise.
    When you run mandoc -T lint on some run-of-the-mill man(7) page
    found in the wild, you typically already get buried under a deluge
    of warnings and errors, usually including several rather serious
    ones.  Adding even more that are of little importance seems
    unwise to me while that situation persists.

 2. The input line in question is a text line, not a request
    or macro line.

 3. The line contains an unescaped ASCII period.

 4. The period is immediately preceded by at least two characters
    that satisfy isalnum(3), but not by the combinations "nc"
    and "vs" because the abbreviations "Inc." and "vs." are used
    so often that they would cause too many false positives.

 5. The period is immediately followed by one, two, or three
    unescaped ASCII space characters (U+0020).

 6. The next character after the one, two, or three spaces
    satisfies isupper(3).

These rules were selected after confirming that the number of false
positives they cause on real-world mdoc(7) manual pages is tolerable.
Remember that mandoc -Tlint is trying very hard to keep false
positives down.
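
As a rough illustration of conditions 2 to 6 above, here is a small
Python sketch.  It is not mandoc's actual C code; the function name
new_sentence_columns and the reading of "text line" as "does not start
with a roff control character" are assumptions made for this example.
Condition 1 (the page is mdoc(7)) is left to the caller.  The column
it returns is that of the first character of the new sentence, which
agrees with the two sample outputs quoted near the top of this message.

```python
import re

# Conditions 3 to 6 folded into one pattern:
#   (?<=[A-Za-z0-9]{2})  two ASCII alphanumerics right before the period
#   (?<!nc)(?<!vs)       ... but not the tails of "Inc." and "vs."
#   \.                   the period itself
#    {1,3}               one to three plain spaces
#   (?=[A-Z])            followed by an ASCII uppercase letter
# A backslash is neither alphanumeric nor a space, so an escaped
# period or an escaped space can never be part of a match.
_NEW_SENTENCE = re.compile(
    r"(?<=[A-Za-z0-9]{2})(?<!nc)(?<!vs)\. {1,3}(?=[A-Z])"
)


def new_sentence_columns(line):
    """Return the 1-based columns where a new sentence starts mid-line."""
    # Condition 2: skip request and macro lines (assumed here to be
    # lines that start with one of the roff control characters).
    if line.startswith((".", "'")):
        return []
    # match.end() is the 0-based index of the uppercase letter.
    return [m.end() + 1 for m in _NEW_SENTENCE.finditer(line)]


for text in ("He tried hard.  He really did.",
             "He tried hard. He really did.",
             "Acme Inc. Ships widgets."):
    print(new_sentence_columns(text), text)
```

The first two lines are flagged at columns 17 and 16; the "Inc." line
is not flagged at all because of the "nc" exception in condition 4.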

> The tool could have a secondary warning, not so important,
> for "foo, bar".

Absolutely not.  I consider the recommendation "break input lines
after commas" completely bogus.  It provides no benefit whatsoever.
Nobody wants additional spacing after commas,
which is the main reason for the rule "new sentence,
new line".  Then,
following this rule strictly,
which people would have to do when taking such a warning seriously,
would make the source code hard to read and ugly.

Of course, there is nothing wrong with *occasionally* breaking
your input line after a comma even if the line is not yet full,
for example if the comma separates two relatively independent
clauses that are both somewhat lengthy.  But making a hard rule
about this would be totally ridiculous.

> Also, as far as I know, neither of -ww nor -Tlint have something 
> equivalent to -Wno-switch (or -Wno-error=switch), which could be
> nice to silence (or make non-fatal) some warnings on purpose.

For mandoc(1), you are the first person asking for that, and i don't
remember having seen such a request for groff(1) either.

> Do you think that could be implemented in groff(1) or mandoc(1)?

I dislike the suggested feature for several reasons.

There are few areas of program design and implementation that are
as prone to overengineering and excessive complexity as the area of
diagnostic messages.  I know that both from my own past projects,
in several of which i fell into that trap, and from other
projects - just look at OpenSSL if you want a particularly bad example,
but the problem is extremely widespread and similar examples are easy
to find.  Keeping the number of diagnostic levels low and avoiding
unimportant configurability are crucial tools to keep complexity at
bay in this area.

Mandoc has six diagnostic levels, in ascending severity:
 1. style recommendations
 2. warnings
 3. parsing errors
 4. unsupported features used by the document
 5. invalid command line arguments
 6. operating system errors (e.g. "Cannot allocate memory")

Mandoc has one related command line option, -W.
I'm not counting "-T lint" separately because the main use of -T
is unrelated and because "-T lint" simply implies "-W all".

Six levels and one option are already a lot, and adding even more
would be evil.
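
To make the "six levels, one option" design concrete, here is a
minimal Python sketch of a single ordered severity scale combined with
a single threshold.  It only illustrates the idea and does not
describe mandoc's internals; the names Level and filter_messages, and
the choice to return 0 when nothing is reported, are made up for this
example.

```python
from enum import IntEnum


class Level(IntEnum):
    """The six levels listed above, in ascending severity."""
    STYLE = 1      # style recommendations
    WARNING = 2    # warnings
    ERROR = 3      # parsing errors
    UNSUPP = 4     # unsupported features used by the document
    BADARG = 5     # invalid command line arguments
    SYSERR = 6     # operating system errors


def filter_messages(messages, minimum):
    """Keep messages at or above `minimum`; also return the worst level kept.

    `messages` is a list of (Level, text) pairs.  One threshold is the
    only knob: there are no per-message switches to maintain.
    """
    shown = [(lvl, text) for lvl, text in messages if lvl >= minimum]
    worst = max((int(lvl) for lvl, _ in shown), default=0)
    return shown, worst


msgs = [(Level.STYLE, "style note"),
        (Level.WARNING, "new sentence, new line"),
        (Level.ERROR, "parse problem")]
shown, worst = filter_messages(msgs, Level.WARNING)
print(len(shown), worst)
```

With the threshold at WARNING, the style note is dropped and the two
remaining messages are kept, the worst of them being a parsing error.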

Even with tools where excessive configurability is common,
like with C compilers, the consequences are invariably unpleasant.
Rather than striving to keep false positives down, compiler
developers usually take the existing, excessive configurability
as an excuse for adding warnings that cause high rates of false
positives, offering the flimsy pretext "if the noise bothers
you, just switch it off."  The end result of this kind of laziness
at the design stage is that instead of having sane defaults, nearly
everybody has to configure and maintain their own set of exceptions,
including those people who never asked for that, which is the
overwhelming majority.

For mandoc -T lint, there is an additional aspect.  It is well-known
how having a globally uniform style of manual page formatting that
readers automatically get familiar with helps readability.
Similarly, a globally uniform style in the source code helps
maintainability (and in addition, also reduces the risk that a
given page deviates from the common output style).

Gently guiding people towards such a uniform style is desirable;
encouraging them to pick their own ruleset according to personal
taste is not desirable, *in particular* not for those few who might
actually consider doing that.

All that said, in general, i'm open to suggestions regarding how
to improve mandoc warnings, in particular regarding the addition of
missing warnings and the suppression of false positives.
But i'm not sure yet what to change following your present

