[bug #62830] [PATCH] [grops] support CJK fonts encoded in UTF16

From: G. Branden Robinson
Subject: [bug #62830] [PATCH] [grops] support CJK fonts encoded in UTF16
Date: Sat, 3 Dec 2022 19:23:26 -0500 (EST)

Follow-up Comment #5, bug #62830 (project groff):

Thank you for the update.

I don't think I will have time to consider this patch in the depth it requires
before the groff 1.23.0 release, for which I hope to get at least a release
candidate out before the end of the calendar year.  Only 5 open Savannah
tickets remain for that.

I would however like to take a fresh look at this issue (and Russian
localization, bug #63076) early in the groff 1.24 development cycle.  How soon
that begins will depend on how many urgent bug reports we get against 1.23.0.

On the bright side, you can expect relatively few changes to occur between now
and then that might make your patches difficult to apply or maintain out of
tree.

Here are some thoughts I have for when I can return to this work (or for
another groff developer who steps up to discuss or address these points).

1. I was uncertain about the wisdom of shipping more font description files,
but there is precedent for it: except for the FreeEuro font, we don't ship
_any_ fonts proper, just descriptions of fonts that the user must obtain
elsewhere.  So shipping CSH, CSS, CTH, CTS, JPG, JPM, KOG, and KOM font
descriptions for the "dvi", "html", "ps", and "utf8" output devices would be
consistent with existing practice.

2. src/devices/grohtml/post-html.cpp:

2a. I wonder whether defaulting to ASCII for the html output device is
necessary.  Apparently UTF-8 is used by the overwhelming majority of web pages
in the world.  [https://w3techs.com/technologies/details/en-utf8]

2b. The new `to_utf8_string` function might be better housed in libgroff or
libdriver, if in fact there is not already a suitable function present in one
of those libraries.  Another possibility is that there is some gnulib module
we could use here, and not have to carry our own implementation at all.
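To make the libgroff/libdriver question concrete, here is a minimal sketch of
the kind of code-point-to-UTF-8 helper a shared library could offer.  The name
`append_utf8` and its interface are hypothetical; this is neither the patch's
`to_utf8_string` (whose exact signature I am not assuming) nor an existing
groff or gnulib function.  It assumes the input is a single Unicode scalar
value and rejects surrogates and values above U+10FFFF.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of Unicode scalar value
// `cp` to `out`.  Returns false (appending nothing) if `cp` is a UTF-16
// surrogate or lies beyond U+10FFFF.
bool append_utf8(unsigned long cp, std::string &out)
{
  if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
    return false;
  if (cp < 0x80UL) {                 // 1 byte: 0xxxxxxx
    out += static_cast<char>(cp);
  } else if (cp < 0x800UL) {         // 2 bytes: 110xxxxx 10xxxxxx
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000UL) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {                           // 4 bytes: 11110xxx then three 10xxxxxx
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return true;
}
```

For example, U+4E2D yields the three bytes E4 B8 AD.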

2c. I am uneasy with switching text styling properties (bold, italic) off
based on the groff font _name_ in use.  I think it might be better to have a
new font description file directive (see groff_font(5)) that tags a font as
unstyled.  Selecting any font with this property would disable the bold and
italic flags.
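A sketch of how such a directive might be consumed, assuming a hypothetical
directive named `unstyled` (groff_font(5) defines no such directive today) and
hypothetical `font_properties` and `text_style` types standing in for whatever
grohtml actually uses.  The point is only that the decision to drop bold and
italic keys off a declared property rather than the font's name.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical per-font properties read from a font description file.
struct font_properties {
  bool unstyled;  // set by the hypothetical "unstyled" directive
  font_properties() : unstyled(false) {}
};

// Scan the directive lines that precede the "charset" or "kernpairs"
// section.  Only the hypothetical "unstyled" directive is recognized;
// all other directives are ignored in this sketch.
font_properties read_font_properties(std::istream &in)
{
  font_properties props;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream words(line);
    std::string directive;
    if (!(words >> directive) || directive[0] == '#')
      continue;                        // blank line or comment line
    if (directive == "charset" || directive == "kernpairs")
      break;                           // end of the directive section
    if (directive == "unstyled")
      props.unstyled = true;
  }
  return props;
}

// Hypothetical styling flags requested for a run of text.
struct text_style {
  bool bold;
  bool italic;
};

// Suppress bold and italic for fonts tagged as unstyled, instead of
// deciding from the font's name.
text_style effective_style(const font_properties &props, text_style requested)
{
  if (props.unstyled) {
    requested.bold = false;
    requested.italic = false;
  }
  return requested;
}
```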

2d. Maybe the existing `to_unicode` function should be renamed; from its name
alone, it's not obvious how it differs from `to_utf8_string`.

2e. The `-U` option seems like a good idea, and perhaps is a flag letter we
can re-use elsewhere in groff as we improve its Unicode support.

3. src/devices/grops/ps.cpp:

3a. `is_utf16` should be renamed to reflect whether it uses UTF-16BE or
UTF-16LE.
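To illustrate why the byte order belongs in the name, here is a sketch of a
UTF-16BE encoder, including surrogate pairs for code points above U+FFFF.  The
name `append_utf16be` is hypothetical; a UTF-16LE counterpart would differ only
in emitting the low byte of each 16-bit unit first.

```cpp
#include <cassert>
#include <string>

// Emit one 16-bit code unit, most significant byte first (big-endian).
static void put_unit_be(unsigned int unit, std::string &out)
{
  out += static_cast<char>((unit >> 8) & 0xFF);
  out += static_cast<char>(unit & 0xFF);
}

// Hypothetical helper: append the UTF-16BE encoding of `cp` to `out`.
// Returns false (appending nothing) for surrogates and values beyond
// U+10FFFF.
bool append_utf16be(unsigned long cp, std::string &out)
{
  if (cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
    return false;
  if (cp < 0x10000UL) {
    put_unit_be(static_cast<unsigned int>(cp), out);  // BMP: one unit
  } else {
    unsigned long v = cp - 0x10000UL;  // 20 significant bits remain
    assert(v <= 0xFFFFFUL);
    put_unit_be(static_cast<unsigned int>(0xD800UL + (v >> 10)), out);     // high surrogate
    put_unit_be(static_cast<unsigned int>(0xDC00UL + (v & 0x3FFUL)), out); // low surrogate
  }
  return true;
}
```

For example, U+1F600 becomes the surrogate pair D83D DE00, serialized as the
bytes D8 3D DE 00.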
3b. I'm uneasy with the use of wchar_t.  I think maybe we want to use int32_t,
or if that can't be assumed to be available in C++98 (check this), then we
should have a type alias ("typedef" [sic]) and use an int, which must be at
least 32 bits on any GNU system.
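A sketch of the type-alias approach, assuming a hypothetical alias name
`code_point`.  As far as I know, `int32_t` comes from C99's `<stdint.h>` and
became part of standard C++ only with C++11's `<cstdint>`, which is why a plain
`int` plus a compile-time check may be the portable choice for C++98.  By
contrast, `wchar_t` is only 16 bits wide on some platforms (notably Windows),
so it cannot hold every code point.  C++98 lacks `static_assert`, so the check
below uses the negative-array-size idiom.

```cpp
#include <cassert>
#include <climits>

// Hypothetical alias for a Unicode code point.
typedef int code_point;

// Compile-time check usable in C++98: if int cannot represent values up
// to 2^31 - 1, the array size is -1 and this typedef fails to compile.
typedef char code_point_is_at_least_32_bits[(INT_MAX >= 2147483647) ? 1 : -1];

const code_point max_code_point = 0x10FFFF;

// True if `c` is a Unicode scalar value (in range and not a surrogate).
bool is_valid_code_point(code_point c)
{
  return c >= 0 && c <= max_code_point && !(c >= 0xD800 && c <= 0xDFFF);
}
```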

3c. Again, it looks like we're inferring properties from font names:

+  const char *psname = f->get_internal_name();
+  if (psname && strstr(psname,"-UTF16-")) {

And again I think I'd prefer a font description file property to communicate
this information.
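A minimal sketch contrasting the two approaches, using a hypothetical
pared-down `font_info` struct and a hypothetical encoding property (no such
font description directive exists yet; treating the property as UTF-16BE is my
assumption).  The name-based test mirrors the excerpt quoted above; the
property-based test does not depend on how a font vendor spells the PostScript
name.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical property that a new font description directive could set.
enum glyph_encoding { GLYPH_ENCODING_DEFAULT, GLYPH_ENCODING_UTF16BE };

// Hypothetical, pared-down stand-in for groff's font class.
struct font_info {
  const char *internal_name;  // as given by the "internalname" directive
  glyph_encoding encoding;    // as given by the hypothetical directive
};

// Name-based inference, mirroring the patch excerpt.
bool uses_utf16_by_name(const font_info &f)
{
  return f.internal_name != 0
         && std::strstr(f.internal_name, "-UTF16-") != 0;
}

// Property-based test: the font description states the encoding
// explicitly, so the PostScript name may follow any convention.
bool uses_utf16_by_property(const font_info &f)
{
  return f.encoding == GLYPH_ENCODING_UTF16BE;
}
```

The two tests can disagree: a font declared as UTF-16BE whose name lacks
"-UTF16-" is missed by the name check, while an unrelated font that happens to
contain that substring is caught by it.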

4. src/include/font.h, src/libs/libgroff/font.cpp:

I wouldn't gate this feature behind a preprocessor macro like
"ENABLE_UCSRANGE"; I would enable it for all builds.  This will give it
exercise and help uncover bugs.

5. Thank you for the 'dvi' and 'ps' device smoke tests!  Sadly, for
portability it might be necessary to rewrite the UTF-8-encoded literals for
CJK glyphs as octal escape sequences passed to the printf(1) command.
Surprising things go wrong on *BSD and macOS systems.
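A sketch of a helper that could generate such portable literals when writing
the tests, assuming its output is placed inside a single-quoted printf(1)
format operand.  The name `to_printf_octal` is hypothetical.  Non-ASCII and
control bytes (and the single quote) become three-digit octal escapes, which
POSIX printf(1) interprets in its format operand; backslash and percent are
doubled so that printf treats them literally.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrite a UTF-8 string so it can be pasted into a
// single-quoted printf(1) format operand using only ASCII characters.
std::string to_printf_octal(const std::string &utf8)
{
  std::string out;
  for (std::string::size_type i = 0; i < utf8.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(utf8[i]);
    if (c == '\\') {
      out += "\\\\";                 // printf turns \\ into one backslash
    } else if (c == '%') {
      out += "%%";                   // printf turns %% into one percent sign
    } else if (c < 0x20 || c >= 0x7F || c == '\'') {
      // Three-digit octal escape, e.g. byte 0xE4 -> \344.
      out += '\\';
      out += static_cast<char>('0' + ((c >> 6) & 07));
      out += static_cast<char>('0' + ((c >> 3) & 07));
      out += static_cast<char>('0' + (c & 07));
    } else {
      out += static_cast<char>(c);   // printable ASCII passes through
    }
  }
  return out;
}
```

For instance, the UTF-8 encoding of U+4E2D becomes `\344\270\255`, usable as
`printf '\344\270\255\n'` in a test script.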

I emphasize that I don't require any changes to be made at this time to
address the above points; they are for consideration and discussion by
developers (including the patch author!) before any revision occurs.  I simply
wanted to get these points down while they were fresh in my mind.


Message sent via Savannah