[bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
From: G. Branden Robinson
Subject: [bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
Date: Wed, 13 Apr 2022 21:42:31 -0400 (EDT)
Update of bug #58962 (project groff):
Status: None => In Progress
Assigned to: None => gbranden
_______________________________________________________
Follow-up Comment #5:
Hi Dave,
I believe I've cracked this.
$ xxd EXPERIMENTS/dave-58962.roff
00000000: 2e69 6620 27a0 275c 7e27 202e 746d 2069 .if '.'\~' .tm i
00000010: 6e70 7574 2030 7841 3020 6d61 7463 6865 nput 0xA0 matche
00000020: 7320 5c5c 7e0a 2e69 6620 27ad 275c 2527 s \\~..if '.'\%'
00000030: 202e 746d 2069 6e70 7574 2030 7841 4420 .tm input 0xAD
00000040: 6d61 7463 6865 7320 5c5c 250a matches \\%.
$ ./build/troff -F ./build/font -F ./font -M ./build/tmac -M ./tmac ./EXPERIMENTS/dave-58962.roff
input 0xA0 matches \~
input 0xAD matches \%
$ ./build/troff -F ./build/font -F ./font -T utf8 -M ./build/tmac -M ./tmac ./EXPERIMENTS/dave-58962.roff
input 0xA0 matches \~
input 0xAD matches \%
(Not that the output device should really matter.)
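Because the test document mixes raw Latin-1 bytes with ASCII, it can be awkward to create in a UTF-8 editor. Here is a minimal C++ sketch (not part of the patch; make_test_document and write_test_document are hypothetical names) that rebuilds the same 76 bytes shown in the xxd dump above.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Build the two-line test document from the xxd dump: each line compares a
// raw Latin-1 byte with a groff escape sequence and reports a match via .tm.
// The raw bytes are spliced in with static_cast so this source stays ASCII.
std::string make_test_document() {
  const char nbsp = static_cast<char>(0xA0);  // Latin-1 NO-BREAK SPACE
  const char shy = static_cast<char>(0xAD);   // Latin-1 SOFT HYPHEN
  std::string doc;
  doc += ".if '" + std::string(1, nbsp) + "'\\~' .tm input 0xA0 matches \\\\~\n";
  doc += ".if '" + std::string(1, shy) + "'\\%' .tm input 0xAD matches \\\\%\n";
  assert(doc.size() == 76);  // 2 lines x 38 bytes, matching the dump
  return doc;
}

// Write the document byte-for-byte (no newline translation).
// Returns true if the stream is still good after writing.
bool write_test_document(const std::string& path) {
  std::ofstream out(path, std::ios::binary);
  const std::string doc = make_test_document();
  out.write(doc.data(), static_cast<std::streamsize>(doc.size()));
  return static_cast<bool>(out);
}
```

Calling write_test_document("EXPERIMENTS/dave-58962.roff") from a small main() should then reproduce the file, assuming the EXPERIMENTS directory already exists.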
It seems these cases were simply never handled in the formatter's
input parser. Maybe there was some dithering because the input encoding could
be either ISO 8859 or EBCDIC.
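The patch that follows tells the two families apart by checking how the compiler encodes 'A' (0x41 in ASCII/ISO 8859, 0xC1 in code page 1047). Strictly, that reveals the compiler's execution character set, which the patch treats as a proxy for the input encoding. As one possible alternative (not what the patch does), the same test can run at compile time; exec_charset and detect_exec_charset are hypothetical names.

```cpp
// Hypothetical names; not part of groff.
enum class exec_charset { iso_8859_family, ebcdic_cp1047, unknown };

// Same test as the patch: the value the compiler assigns to 'A' reveals
// which character set character and string literals are encoded in.
constexpr exec_charset detect_exec_charset() {
  return ('A' == 0x41) ? exec_charset::iso_8859_family
       : ('A' == 0xC1) ? exec_charset::ebcdic_cp1047
       : exec_charset::unknown;
}

// Compile-time counterpart of the patch's fatal() branch: an unsupported
// character set stops the build instead of every troff run.
static_assert(detect_exec_charset() != exec_charset::unknown,
              "unrecognized execution character set");
```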
Here's the patch.
$ git diff
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 36822033a..015c17a87 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -1,4 +1,4 @@
-/* Copyright (C) 1989-2020 Free Software Foundation, Inc.
+/* Copyright (C) 1989-2022 Free Software Foundation, Inc.
Written by James Clark (jjc@jclark.com)
This file is part of groff.
@@ -1743,6 +1743,29 @@ void token::next()
int cc = input_stack::get(&n);
if (cc != escape_char || escape_char == 0) {
handle_normal_char:
+ // Handle no-break space and soft hyphen.
+ if (0x41 == 'A') { // ASCII/ISO 8859/Unicode
+ if (0xA0 == cc) {
+ type = TOKEN_STRETCHABLE_SPACE;
+ return;
+ }
+ else if (0xAD == cc) {
+ type = TOKEN_HYPHEN_INDICATOR;
+ return;
+ }
+ }
+ else if (0xC1 == 'A') { // code page 1047 (EBCDIC)
+ if (0x41 == cc) {
+ type = TOKEN_STRETCHABLE_SPACE;
+ return;
+ }
+ else if (0xCA == cc) {
+ type = TOKEN_HYPHEN_INDICATOR;
+ return;
+ }
+ }
+ else
+ fatal("unrecognized input character encoding");
switch(cc) {
case PUSH_GROFF_MODE:
input_stack::save_compatible_flag(compatible_flag);
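To make the new branch easier to reason about in isolation, here is a standalone C++14 sketch of the same byte-to-token mapping. Unlike the patch, which picks the encoding from the compiler's character set, it takes the encoding as a parameter (an assumption made so both tables can be exercised on one host). token_kind, input_charset and classify_input_byte are hypothetical stand-ins, not groff's own types.

```cpp
// Hypothetical stand-ins for troff's token types: only the two kinds the
// patch introduces are distinguished; everything else is "ordinary".
enum class token_kind { ordinary, stretchable_space, hyphen_indicator };
enum class input_charset { iso_8859_1, cp1047 };

// Code points taken from the patch: NO-BREAK SPACE and SOFT HYPHEN are
// 0xA0/0xAD in Latin-1 and 0x41/0xCA in code page 1047.
constexpr token_kind classify_input_byte(int cc, input_charset cs) {
  const int nbsp = (cs == input_charset::iso_8859_1) ? 0xA0 : 0x41;
  const int shy = (cs == input_charset::iso_8859_1) ? 0xAD : 0xCA;
  if (cc == nbsp)
    return token_kind::stretchable_space;  // treated like \~
  if (cc == shy)
    return token_kind::hyphen_indicator;   // treated like \%
  return token_kind::ordinary;
}
```

Note that the same byte can mean different things: 0x41 is an ordinary 'A' in Latin-1 but a no-break space in code page 1047, which is why the patch keys the whole lookup on the encoding.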
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?58962>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/