[bug #59397] Assign default .hcode values to alphabetic characters in groff's default character set
From: Dave
Subject: [bug #59397] Assign default .hcode values to alphabetic characters in groff's default character set
Date: Mon, 2 Nov 2020 04:00:00 -0500 (EST)
User-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0
URL:
<https://savannah.gnu.org/bugs/?59397>
Summary: Assign default .hcode values to alphabetic
characters in groff's default character set
Project: GNU troff
Submitted by: barx
Submitted on: Mon 02 Nov 2020 02:59:58 AM CST
Category: Core
Severity: 1 - Wish
Item Group: New feature
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
_______________________________________________________
Details:
This is copied wholesale from two comments in bug #42870, both of which I
think I wrote, that are only tangentially related to the topic of that bug.
This is really a separate issue deserving of its own report.
== The problem ==
Groff's default input character set, Latin-1, does not align with its default
hyphenation codes, which are assigned only to ASCII alphabetic characters. By
default groff should assign hyphenation codes to all alphabetic characters in
the Latin-1 character set, to reflect the default input character set.
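To make the request concrete, here is a minimal C++ sketch of the mapping being asked for. It is not groff code: the helper names latin1_is_alpha(), latin1_to_lower(), and latin1_default_hcodes() are hypothetical, and the letter ranges assume that the ISO-8859-1 letters are A-Z, a-z, and 0xC0-0xFF minus the multiplication sign (0xD7) and division sign (0xF7). Whether 0xAA, 0xB5, and 0xBA should also count is a judgment call; this sketch leaves them out.

```cpp
#include <array>

// Hypothetical helper: true if code c is a letter in ISO-8859-1.
// Assumption: letters are A-Z, a-z, and 0xC0-0xFF minus 0xD7 and 0xF7.
bool latin1_is_alpha(int c)
{
  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
    return true;
  return c >= 0xC0 && c <= 0xFF && c != 0xD7 && c != 0xF7;
}

// Hypothetical helper: lowercase counterpart of c within ISO-8859-1.
// 0xDF (sharp s) and 0xFF (y with diaeresis) are already lowercase.
int latin1_to_lower(int c)
{
  if (c >= 'A' && c <= 'Z')
    return c + ('a' - 'A');
  if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
    return c + 0x20;
  return c;
}

// Default hyphenation codes: the lowercase form for letters, 0 otherwise.
std::array<unsigned char, 256> latin1_default_hcodes()
{
  std::array<unsigned char, 256> hcode{};
  for (int i = 0; i < 256; i++)
    if (latin1_is_alpha(i))
      hcode[i] = static_cast<unsigned char>(latin1_to_lower(i));
  return hcode;
}
```

Under these assumptions, 0xC9 (capital E with acute) and 0xE9 (small e with acute) would share the hyphenation code 0xE9, in the same way that 'A' and 'a' share the code 'a' today.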
== Analysis ==
init_charset_table() in src/roff/troff/input.cpp appears to be what defines
the default hcode values, in particular the lines:
  for (int i = 0; i < 256; i++) {
    ...
    if (csalpha(i))
      charset_table[i]->set_hyphenation_code(cmlower(i));
  }
So the csalpha() call must be returning false for any characters that are
ISO-8859-1 (a.k.a. Latin-1) alphabetic characters but outside the ASCII
range.
Indeed, a peek into cset_init::cset_init() in src/libs/libgroff/cset.cpp
supports this:
  for (int i = 0; i <= UCHAR_MAX; i++) {
    csalpha.v[i] = ISASCII(i) && isalpha(i);
    ...
  }
The isalpha() call is part of the C standard library's <ctype.h>. Its return
value depends on the current locale. Because groff's default input encoding is
ISO-8859-1, it's undesirable for this function's behavior to change based on
the user's environment; it's for this reason, I presume, that the additional
test ISASCII() is imposed, forcing the result to 0 for non-ASCII characters
regardless of what isalpha() returns. And in the ASCII range, isalpha()
should function the same no matter the current locale.
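The guard's effect can be checked in isolation. The following is a minimal sketch rather than groff's actual code: is_ascii_code() is a hypothetical stand-in for the ISASCII() macro (assumed here to mean "code is in the range 0 to 127"), and build_ascii_only_alpha_table() is a hypothetical name for a function that fills a table the same way the loop quoted above does.

```cpp
#include <array>
#include <cctype>
#include <climits>

// Hypothetical stand-in for groff's ISASCII() macro.
// Assumption: it is true exactly for codes 0 through 127.
static bool is_ascii_code(int c)
{
  return c >= 0 && c <= 127;
}

// Fills a table the way the quoted cset_init loop does: an entry is true
// only if the code is ASCII *and* isalpha() accepts it.
std::array<bool, UCHAR_MAX + 1> build_ascii_only_alpha_table()
{
  std::array<bool, UCHAR_MAX + 1> table{};
  for (int i = 0; i <= UCHAR_MAX; i++)
    table[i] = is_ascii_code(i) && std::isalpha(i) != 0;
  return table;
}
```

With this table, the entries for 'a' and 'Z' are true, while the entry for 0xE9 (small e with acute in ISO-8859-1) is false whatever locale is active, because the ASCII test short-circuits before isalpha() is consulted.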
But a more robust solution may be to call isalpha_l() instead (a POSIX
extension to <ctype.h> that takes an explicit locale argument), so that an
ISO-8859-1 locale can be enforced. By doing this and removing the ISASCII()
test (from the csalpha.v[i] line and all the following lines setting other
attributes), the character attributes set in cset_init::cset_init() would be
accurate for all ISO-8859-1 characters, not just ASCII ones.
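A rough sketch of that idea follows, assuming a POSIX.1-2008 system (tested mentally against glibc; other platforms may declare these functions in a different header). AlphaTable and build_alpha_table() are hypothetical names, not groff code, and the locale name "en_US.ISO-8859-1" is an assumption: installed locale names vary by system, and the locale may not be present at all, which is one practical drawback of this approach. The sketch falls back to the ASCII-only rule when the locale cannot be created.

```cpp
#include <ctype.h>   // isalpha_l() (POSIX.1-2008, not ISO C)
#include <locale.h>  // newlocale(), freelocale(), LC_CTYPE_MASK
#include <array>

struct AlphaTable {
  bool locale_loaded;           // false if the named locale is not installed
  std::array<bool, 256> alpha;  // alpha[i] is true if code i is a letter
};

// Hypothetical helper, not part of groff.  The default locale name is an
// assumption; check `locale -a` for the names available on a given system.
AlphaTable build_alpha_table(const char *name = "en_US.ISO-8859-1")
{
  AlphaTable t{false, {}};
  locale_t loc = newlocale(LC_CTYPE_MASK, name, (locale_t)0);
  t.locale_loaded = (loc != (locale_t)0);
  for (int i = 0; i < 256; i++) {
    if (t.locale_loaded)
      t.alpha[i] = isalpha_l(i, loc) != 0;  // classify in the fixed locale
    else  // locale unavailable: fall back to the ASCII-only rule
      t.alpha[i] = (i >= 'A' && i <= 'Z') || (i >= 'a' && i <= 'z');
  }
  if (t.locale_loaded)
    freelocale(loc);
  return t;
}
```

Because the locale object is passed explicitly, the result does not depend on the user's LANG or LC_* settings, only on whether the named locale is installed.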
This could have implications beyond the hcode values, of course, and I confess
I'm not familiar enough with groff's internals to determine what they might
be.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?59397>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/