bug-groff

[bug #65710] [preconv] require disambiguation of U+00A0 on input


From: Dave
Subject: [bug #65710] [preconv] require disambiguation of U+00A0 on input
Date: Tue, 11 Jun 2024 22:35:04 -0400 (EDT)

Follow-up Comment #4, bug #65710 (group groff):

[comment #3:]
> Bjarni's prescription was much too strong.

Bjarni's "prescription" was an editorial recommendation to users that had no
bearing on his proposed code change.  Users would be free to ignore it.

The current proposal--to upgrade his proposed warning to a fatal error--only
intensifies the problems.  As I said over there, preconv should not be in the
business of policing which parts of valid Unicode users use.  The new proposal
kicks that policing from a written warning up to jail time.  It's wildly out
of proportion with the offense.

> _Maybe_ they mean `\ ` (an unadjustable space).  It's
> impossible to know, which is why they should disambiguate it.

It's technically ambiguous, in the same way that "We're going to spend the
week in a cabin" is technically ambiguous: that person could be talking about
either a house in the woods or the interior of an airplane.  But anyone
hearing it will know exactly what they mean--crucially, because if they did
mean the latter, they'd specifically note it.  When you say something mildly
ambiguous, but with one meaning far more likely than another, it's only the
exceptional meaning that tends to need to be disambiguated.

Even the roff language--hardly a paragon of DWIM design--understands this. 
You need only say ".sp 4" to space down four lines; you don't have to specify
"4v" because roff gives the request a sensible default unit.
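To illustrate that default (a minimal sketch of ordinary roff input, not
anything taken from the ticket), these two requests should behave the same,
because .sp assumes the "v" scaling unit when none is given:

```roff
.\" Both requests space down four lines (four vees).
.sp 4
.sp 4v
```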

Likewise, \~ is the sensible default meaning for U+00A0: in almost all normal
situations, the user will want \~.  Just as the documentation spells out how
to give .sp a different unit, it spells out how to get the other types of
nonbreaking space, so users who want the rarer "\ " (backslash-space) in
certain places can explicitly say so.
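For anyone who wants to see the two escapes side by side, here is a small
sketch; the sample sentence is invented for illustration and comes from
neither the ticket nor the groff documentation:

```roff
.\" \~ is an unbreakable space that is still adjustable: it can
.\" stretch like an ordinary word space when the line is adjusted.
Figure\~3 shows a run of 10\~km.
.\" "\ " (backslash-space) is unbreakable and unadjustable: it keeps
.\" a fixed width however the line is adjusted.
Figure\ 3 shows a run of 10\ km.
```

In an adjusted paragraph, the first line's gaps can widen along with the
other word spaces, while the second line's gaps stay fixed.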

Making the user edit a bunch of valid Unicode characters (or valid ISO 8859-1,
or 8859-2, or any other encoding in the ISO 8859 family) only impedes
preconv's ability to import text from another source and use it directly.  We
should be making this easier for users, not putting up needless roadblocks in
the name of semantic purity, certainly not without wider discussion.

> And if there are multiple U+00A0 characters in sequence, the
> author might be better off supplying a `\h` sequence to
> express what it is they want, precisely.

Sure.  But the formatter allows \~\~\~\~\~ without complaint, and adding a
complaint here is beyond what this ticket is proposing, so this is tangential.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65710>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



