bug-groff

From: Dave
Subject: [bug #42870] `.hcode' and `.hw' are limited to raw 8bit characters but should accept any characters entities.
Date: Wed, 20 Mar 2019 12:28:47 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0

Follow-up Comment #3, bug #42870 (project groff):

To address the point raised in comment #2, init_charset_table() in
src/roff/troff/input.cpp appears to be what defines the default hcode values,
in particular the lines:

  for (int i = 0; i < 256; i++) {
...
    if (csalpha(i))
      charset_table[i]->set_hyphenation_code(cmlower(i));
  }


So the csalpha() call must be returning false for any characters that are
ISO-8859-1 (a.k.a. Latin-1) alphabetic characters but outside the ASCII
range.

Indeed, a peek into cset_init::cset_init() in src/libs/libgroff/cset.cpp
supports this:

  for (int i = 0; i <= UCHAR_MAX; i++) {
    csalpha.v[i] = ISASCII(i) && isalpha(i);
...
  }

The isalpha() call is part of the C standard library's <ctype.h>.  Its return
value depends on the current locale.  In groff, which lives in the ISO-8859-1
locale, it's undesirable for this function's behavior to change based on the
user's environment; it's for this reason, I presume, that the additional test
ISASCII() is imposed, to force non-ASCII characters to return 0 regardless of
what isalpha() returns.  And in the ASCII range, isalpha() should function the
same no matter the current locale.
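To illustrate the guard described above, here is a minimal sketch, not groff's actual code.  The names is_ascii_code(), guarded_alpha_table, build_guarded_alpha_table() and guarded_is_alpha() are hypothetical stand-ins for ISASCII() and the csalpha.v[] array.  Because of short-circuit evaluation, isalpha() is never consulted for codes 128..255, so the user's locale cannot influence the result there.

```cpp
#include <cassert>
#include <cctype>
#include <climits>

// Hypothetical stand-in for groff's ISASCII() macro: true only for 0..127.
bool is_ascii_code(int c) { return c >= 0 && c <= 127; }

// Hypothetical table type, loosely modelled on the csalpha.v[] array.
struct guarded_alpha_table {
  bool v[UCHAR_MAX + 1];
};

// Fill the table the way the quoted loop does: a code counts as alphabetic
// only if it is ASCII *and* isalpha() says so.  For codes 128..255 the
// first operand is false, so isalpha() is not called at all.
guarded_alpha_table build_guarded_alpha_table()
{
  guarded_alpha_table t;
  for (int i = 0; i <= UCHAR_MAX; i++)
    t.v[i] = is_ascii_code(i) && std::isalpha(i) != 0;
  return t;
}

// Convenience lookup with a range check.
bool guarded_is_alpha(const guarded_alpha_table &t, int c)
{
  assert(c >= 0 && c <= UCHAR_MAX);
  return t.v[c];
}
```

With such a table, 0xE9 (e-acute in Latin-1) is never marked alphabetic, which matches the behavior described in this comment.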

But a more robust solution may be to call isalpha_l() instead (a POSIX
function, also declared in <ctype.h>), passing it an explicit ISO-8859-1
locale object so that the ISO-8859-1 locale is enforced regardless of the
user's environment.  By doing this and removing the ISASCII() test (from the
csalpha.v[i] line and all the following lines setting other attributes), the
character attributes set in cset_init::cset_init() would be accurate for all
ISO-8859-1 characters, not just ASCII ones.
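A rough sketch of the idea, under stated assumptions: it uses the POSIX.1-2008 calls newlocale(), isalpha_l() and freelocale(), and the names locale_alpha_table and build_locale_alpha_table() are hypothetical, not part of groff.  The locale name is left as a parameter because the name of an ISO-8859-1 locale, and whether one is installed at all, varies between systems.  On some platforms (macOS, for instance) the *_l() functions may also require <xlocale.h>.

```cpp
#include <climits>
#include <ctype.h>   // isalpha_l() (POSIX.1-2008)
#include <locale.h>  // locale_t, newlocale(), freelocale()

// Hypothetical table type, loosely modelled on the csalpha.v[] array.
struct locale_alpha_table {
  bool v[UCHAR_MAX + 1];
};

// Fill the table using an explicitly named locale rather than the
// process-wide one, with no ASCII guard.  Returns false, leaving the
// table untouched, if the requested locale is not available.
bool build_locale_alpha_table(const char *locale_name, locale_alpha_table &t)
{
  locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
  if (loc == (locale_t)0)
    return false;
  for (int i = 0; i <= UCHAR_MAX; i++)
    t.v[i] = isalpha_l(i, loc) != 0;
  freelocale(loc);
  return true;
}
```

Called with an installed Latin-1 locale (the name is system-dependent; "en_US.ISO-8859-1" is only a guess, and `locale -a` lists what is available), one would expect the entries for accented Latin-1 letters to be set as well.  The failure path matters: a real change would need a fallback for systems that lack such a locale.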

This could have implications beyond the hcode values, of course, and I confess
I'm not familiar enough with groff's internals to determine what they might
be.

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?42870>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



