[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should
From: G. Branden Robinson
Subject: [bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should
Date: Tue, 12 Apr 2022 06:53:36 -0400 (EDT)
Follow-up Comment #2, bug #62300 (project groff):
Hi Bjarni,
[comment #1:]
> commit f47b7dd139525bf3b8b4fbe767c3a45816c8445a
> Author: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
> Date: Sat Nov 17 15:59:09 2018 +0000
>
> The character \[u00A0] is not recognized
>
> The input character "no-break space" (' ', 0xA0) is mapped by "groff"
> to '\~' (groff_char(7)), but only the character name '\[char160]' is
> translated in the file "tmac/troffrc".
Yes.
> The "preconv" translates the no-break space to the name '\[u00A0]'.
That was an error and is the subject of this ticket.
> diff --git a/tmac/troffrc b/tmac/troffrc
> index 1bd4aa8c9..8895a9a01 100644
> --- a/tmac/troffrc
> +++ b/tmac/troffrc
> @@ -33,10 +33,14 @@ troffrc!X100 troffrc!X100-12 troffrc!lj4 troff!lbp troffrc!html troffrc!pdf
> .
> .\" Test whether we work under EBCDIC and map the no-breakable space
> .\" character accordingly.
> -.do ie '\[char97]'a' \
> +.do ie '\[char97]'a' \{\
> . do tr \[char160]\~
> -.el \
> +. do tr \[u00A0]\~
> +.\}
> +.el \{\
> . do tr \[char65]\~
> +. do tr \[u0041]\~
> +.\}
> .
> .\" Set the hyphenation language to 'us'.
> .do hla us
>
I'm not sure I agree with this patch. It's preconv's job to produce valid
(GNU) troff _input_. It was not doing so.
The input sequence '\[u00A0]' is _syntactically_ valid...but like '\[uFFFF]'
and '\[u0000]', it's not _meaningful_, and should be warned about.
Here is the patch I have pending.
diff --git a/src/preproc/preconv/preconv.cpp b/src/preproc/preconv/preconv.cpp
index 83feef8f7..b1027af17 100644
--- a/src/preproc/preconv/preconv.cpp
+++ b/src/preproc/preconv/preconv.cpp
@@ -404,9 +404,13 @@ unicode_entity(int u)
   if (u < 0x80)
     putchar(u);
   else {
-    // Handle soft hyphen specially -- it is an input character only,
-    // not a glyph.
-    if (u == 0xAD) {
+    // Handle no-break space and soft hyphen specially--they are input
+    // characters only, not glyphs. See groff_char(7).
+    if (u == 0xA0) {
+      putchar('\\');
+      putchar('~');
+    }
+    else if (u == 0xAD) {
       putchar('\\');
       putchar('%');
     }
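
For anyone who wants to check the intended mapping without building groff, here is a minimal standalone sketch of what the patched logic is meant to emit. The name entity_for() is hypothetical; it returns a string instead of calling putchar(). The final fallback to a '\[uXXXX]' escape is an assumption based on the '\[u00A0]' output described above, not a copy of the preconv source.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for preconv's unicode_entity(): returns the
// troff input that would be written for code point u.
std::string entity_for(int u)
{
  // ASCII passes through unchanged.
  if (u < 0x80)
    return std::string(1, static_cast<char>(u));
  // No-break space and soft hyphen are input characters only, not
  // glyphs, so they become the escapes \~ and \% respectively.
  if (u == 0xA0)
    return "\\~";
  if (u == 0xAD)
    return "\\%";
  // Assumed fallback: everything else becomes a \[uXXXX] escape with
  // at least four uppercase hexadecimal digits.
  char buf[16];
  std::snprintf(buf, sizeof buf, "\\[u%04X]", static_cast<unsigned>(u));
  return buf;
}
```

With this sketch, entity_for(0xA0) yields the two characters '\~' rather than '\[u00A0]', which is the behavior the pending patch aims for.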
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?62300>