bug-groff

[bug #63985] [troff] diagnose when attempting to remove an ordinary char


From: G. Branden Robinson
Subject: [bug #63985] [troff] diagnose when attempting to remove an ordinary character
Date: Wed, 5 Apr 2023 17:44:17 -0400 (EDT)

Update of bug #63985 (project groff):

                  Status:                    None => Postponed              
             Assigned to:                    None => gbranden               

    _______________________________________________________

Follow-up Comment #3:

[comment #1:]
> There's no differentiation between input and output in that snippet, so in
case anyone is confused by it, Branden must have typed a ^D after the .pl
line.

Yep--I pasted my shell session and moved on without thinking much about
readability.  Whoops!

[comment #2:]
> This problem is not limited to characters in the ASCII range; it seems to
apply to any Latin-1 (groff's native input encoding) character.  (The
following uses a Latin-1-encoded input file and a Latin-1 output
environment.)

> $ cat rchar_test
> .nf
> äbc
> .rchar ä
> äbc
> .pl \n(nlu
> $ nroff -ww rchar_test
> äbc
> äbc


I think this is because the printable characters in the Unicode Latin-1
supplement (U+00A0..U+00FF) are first-class citizens to groff.  (Because CCSID
["code page"] 1047 is a rearrangement of ISO 8859 Latin-1, and because GNU
troff is compiled expecting one or the other as its input encoding, the same
characters are first-class citizens in it despite their different code
points.)
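To make the "first-class citizen" point concrete: ISO 8859-1 encodes each code point from U+00A0 to U+00FF as a single byte with the same numeric value. A byte-oriented reader therefore receives "ä" as one input character, exactly as it receives "a". The Python sketch below is only an illustration of that encoding property, not groff code, and it does not model CCSID 1047, whose byte values differ.

```python
def latin1_supplement_bytes():
    """Map each Latin-1 supplement code point (U+00A0..U+00FF) to its ISO 8859-1 bytes."""
    return {cp: chr(cp).encode("latin-1") for cp in range(0xA0, 0x100)}

table = latin1_supplement_bytes()

# Every character in the range is one byte, and that byte equals the code point.
assert all(len(b) == 1 and b[0] == cp for cp, b in table.items())

print(len(table), table[ord("ä")])  # 96 b'\xe4'
```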

The planned (but unscheduled) migration to accept UTF-8 input will abandon
that support in favor of being able to interpret UTF-8 multibyte sequences.
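The trade-off is easy to see at the byte level. In UTF-8, every character from U+00A0 to U+00FF takes two bytes (lead byte 0xC2 or 0xC3), so a reader that treats each byte as one Latin-1 character would see two unrelated characters where the author typed one. The following Python sketch illustrates that with the standard codecs. It says nothing about how groff will actually implement UTF-8 input, and `utf8_length` is a helper introduced only for this example.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs to encode a single character."""
    return len(ch.encode("utf-8"))

for ch in ("a", "ä", "ÿ"):
    latin1 = ch.encode("latin-1")
    utf8 = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}  latin-1={latin1.hex()}  utf-8={utf8.hex(' ')}")

# The UTF-8 bytes for "ä", read one byte at a time as Latin-1, become two characters.
print("ä".encode("utf-8").decode("latin-1"))
```

Expected output:

```
U+0061  latin-1=61  utf-8=61
U+00E4  latin-1=e4  utf-8=c3 a4
U+00FF  latin-1=ff  utf-8=c3 bf
Ã¤
```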

Anyway, by way of a status update: I hit an impediment to implementing this.
Almost everything in the tree is fine with it; all but one automated test passes.
The exception is something internal to the mom(7) package which attempts to
remove a _whole bunch_ of ordinary ASCII/Basic Latin characters.  So this is
on hold pending my exploration of mom internals and a discussion with Peter
Schaffter over alternative solutions or whether, in fact, what mom is doing
today should block this change.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63985>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



