Re: zero-width space

From: Ingo Schwarze
Subject: Re: zero-width space
Date: Mon, 6 Jun 2022 07:11:06 +0200

Hi Branden,

Deri wrote on Mon, Jun 06, 2022 at 12:23:19AM +0100:
> On Sunday, 5 June 2022 02:58:14 BST G. Branden Robinson wrote:

>> A big problem with "zero-width space" is that it falsifies the statement
>> that adding a newline or multiple (regular) space characters after a
>> candidate end-of-sentence character results in inter-sentence spacing
>> being added.  (Unless there's a break for some other reason, of course.)
>> A novice could quite easily reason that something we go to the trouble
>> of _calling a space_ behaves like one--but it doesn't.  Zero-width?
>> Sure.  But _this_ space _cancels_ end-of-sentence detection.

> I thought that the end of sentence detection relied on seeing regular
> spaces after the period, as you stated above. Only actual typed
> spaces will trigger the behaviour. If you follow the period with any
> of the other types of space, end of sentence is not triggered.
> So \& is behaving just the same as its other unbreakable space
> cousins. It is not special, it cancels end of sentence detection
> simply because it is not a regular space. Our friend the \Z''
> cancels the behaviour as well.
> If your novice reasons that anything we call a space can be
> used instead of multiple regular spaces to trigger end of sentence
> detection, they will be disappointed because they all prevent
> detection just the same as \&.

This is all true and invalidates the argument that calling \&
a "zero-width [non-breaking] space" is prone to causing confusion
about end-of-sentence detection.

That led me to notice that the documentation is indeed slightly fuzzy
regarding this point; "info groff" tells me:

  5.1.2 Sentences
  GNU 'troff' does this by flagging certain characters (normally '!',
  '?', and '.') as potentially ending a sentence.  When GNU 'troff'
  encounters one of these "end-of-sentence characters" at the end of a
  line, or one of them is followed by two spaces on the same input line,
  it appends an inter-word space followed by an inter-sentence space in
  the formatted output.

Branden, please consider improving the words "followed by two spaces".

Anything like

  followed by two ordinary space characters ("  ")
  followed by two unescaped space characters ("  ")

might do.

You might also consider saying "at the end of an input line"
rather than just "at the end of a line".
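
To make the proposed wording concrete, here is a toy Python model of
the rule as the manual states it, with "spaces" read as ordinary,
unescaped space characters.  It is only a sketch, not how groff or
mandoc implement the check: the function name sentence_ends is made
up for illustration, and it assumes the default end-of-sentence
characters and ignores everything else the formatters do, for example
characters such as closing quotes that may sit between the period and
the spaces.

```python
# Toy model of the documented rule; NOT groff's or mandoc's actual code.
# Assumption: the default end-of-sentence characters '.', '!' and '?'.
END_OF_SENTENCE_CHARS = {".", "!", "?"}


def sentence_ends(line: str) -> list[int]:
    """Return the indices in one input line where this simplified rule
    would recognize the end of a sentence."""
    hits = []
    for i, ch in enumerate(line):
        if ch not in END_OF_SENTENCE_CHARS:
            continue
        rest = line[i + 1:]
        # Either the character ends the input line, or it is followed
        # by two ordinary space characters on the same input line.
        # An escape such as \& is not an ordinary space, so it makes
        # both conditions fail.
        if rest == "" or rest.startswith("  "):
            hits.append(i)
    return hits


if __name__ == "__main__":
    for sample in ["Hello.  World.", r"Hello.\&  World.", "Hello. World."]:
        print(repr(sample), sentence_ends(sample))
```

With this model, "Hello.  World." flags both periods, whereas in
"Hello.\&  World." only the final one is flagged, because \& rather
than an ordinary space follows the first period.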

I'm not sending a specific patch because you are both prolific and
efficient at polishing documentation style and content, so I believe
you will easily find a solution that is both precise and reads well.

Note that even though mandoc does follow the same rules as groff
in this respect, mandoc documentation solves the problem by only
documenting that end of sentence detection happens at the end of
input lines and strongly discouraging putting the start of a new
sentence in the middle of an input line.  That has the benefit of
being significantly easier to understand and remember.  The price for
that benefit is small: the exact behaviour of a discouraged style
moves into the domain of "undocumented quirk that you should not
rely on".

Then again, if groff wants to be more thorough and document the
possibility of starting sentences mid-line, the above wording
suggestions might help.

