[bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should
From: G. Branden Robinson
Subject: [bug #62300] [preconv] does not handle U+00A0 (NBSP) as it should
Date: Mon, 11 Apr 2022 23:14:54 -0400 (EDT)
URL:
<https://savannah.gnu.org/bugs/?62300>
Summary: [preconv] does not handle U+00A0 (NBSP) as it should
Project: GNU troff
Submitted by: gbranden
Submitted on: Tue 12 Apr 2022 03:14:52 AM UTC
Category: Preprocessor preconv
Severity: 3 - Normal
Item Group: Incorrect behaviour
Status: In Progress
Privacy: Public
Assigned to: gbranden
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
_______________________________________________________
Details:
preconv handles the soft hyphen by translating it into an appropriate escape
sequence (\%), but does not do the same for the no-break space. groff_char(7)
has long defined the semantics of both as input code points (for ISO
character encodings).
$ cat whaaa.man
.TH ISO_8859-2 7 2014-10-02 "Linux" "Linux Programmer's Manual"
.TS
l l l c lp-1.
240 160 A0 NO-BREAK SPACE
255 173 AD SOFT HYPHEN
.TE
$ xxd whaaa.man
00000000: 2e54 4820 4953 4f5f 3838 3539 2d32 2037 .TH ISO_8859-2 7
00000010: 2032 3031 342d 3130 2d30 3220 224c 696e 2014-10-02 "Lin
00000020: 7578 2220 224c 696e 7578 2050 726f 6772 ux" "Linux Progr
00000030: 616d 6d65 7227 7320 4d61 6e75 616c 220a ammer's Manual".
00000040: 2e54 530a 6c20 6c20 6c20 6320 6c70 2d31 .TS.l l l c lp-1
00000050: 2e0a 3234 3009 3136 3009 4130 09c2 a009 ..240.160.A0....
00000060: 4e4f 2d42 5245 414b 2053 5041 4345 0a32 NO-BREAK SPACE.2
00000070: 3535 0931 3733 0941 4409 c2ad 0953 4f46 55.173.AD....SOF
00000080: 5420 4859 5048 454e 0a2e 5445 0a T HYPHEN..TE.
$ groff -t -kz -man whaaa.man # groff 1.22.4
troff: whaaa.man:4: warning: can't find special character 'u00A0'
$ ./build/test-groff -ww -t -kz -man whaaa.man # groff Git HEAD
troff:whaaa.man:4: warning: can't find special character 'u00A0'
$ preconv whaaa.man # groff 1.22.4 and Git HEAD
.lf 1 whaaa.man
.TH ISO_8859-2 7 2014-10-02 "Linux" "Linux Programmer's Manual"
.TS
l l l c lp-1.
240 160 A0 \[u00A0] NO-BREAK SPACE
255 173 AD \% SOFT HYPHEN
.TE
preconv should put \~ on the output as documented in groff_char(7) even in
groff 1.22.4.
160 the ISO latin1 no‐break space is mapped to ‘\~’, the
stretchable space character.
173 the soft hyphen control character. groff never uses
this character for output (thus it is omitted in the
table below); the input character 173 is mapped onto
‘\%’.
This remapping should occur regardless of the diagnostic, which is not itself
the problem: many Unicode code points are not valid groff input, and
expressing them as special character escape sequences does not change that
fact.
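The requested behavior can be modeled with a minimal sketch. This is not
preconv's actual code (preconv is part of groff and is not written in
Python); the function name preconv_sketch and the table name SPECIAL_CASES are
hypothetical. It assumes that code points below U+0080 pass through unchanged,
that U+00A0 and U+00AD map to the escapes named in groff_char(7), and that
every other code point becomes a \[uXXXX] special character escape, as in the
preconv output shown above.

```python
# Hypothetical model of the mapping requested in this report.
# Not preconv's real implementation; names here are illustrative only.
SPECIAL_CASES = {
    0x00A0: r"\~",  # no-break space -> unbreakable, stretchable space
    0x00AD: r"\%",  # soft hyphen -> hyphenation control escape
}


def preconv_sketch(text: str) -> str:
    """Convert decoded input text to groff-safe ASCII (assumed rules)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                  # plain ASCII passes through
        elif cp in SPECIAL_CASES:
            out.append(SPECIAL_CASES[cp])   # input-only code points
        else:
            out.append(f"\\[u{cp:04X}]")    # generic Unicode escape
    return "".join(out)


if __name__ == "__main__":
    # The bytes c2 a0 from the hex dump are the UTF-8 encoding of U+00A0.
    nbsp = bytes.fromhex("c2a0").decode("utf-8")
    print(preconv_sketch("A0\t" + nbsp + "\tNO-BREAK SPACE"))
```

Under these assumptions the table row from whaaa.man comes out with \~ in
place of \[u00A0], so troff no longer has to look up a special character
that no font provides.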
Working on this.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?62300>