[bug #62830] [PATCH] [grops] support CJK fonts encoded in UTF16
From: TANAKA Takuji
Subject: [bug #62830] [PATCH] [grops] support CJK fonts encoded in UTF16
Date: Sat, 15 Apr 2023 03:18:21 -0400 (EDT)
Follow-up Comment #7, bug #62830 (project groff):
I updated my patch.
1. Font description files
There is a precedent for Japanese font description files in groff, created by
Japanese developers (Fumitoshi UKAI et al.):
https://answers.launchpad.net/ubuntu/+source/groff/1.18.1.1-12
They defined font descriptions named "M" and "G" for Japanese.
M : Japanese Mincho style
G : Japanese Gothic style
"M" and "G" are possible candidates.
But I wonder whether Chinese and Korean users might feel uncomfortable with them.
That is why I proposed the font descriptions "JPM" and "JPG", along with
separate fonts for Chinese and Korean.
2. src/devices/grohtml/post-html.cpp:
2a & 2e. Encoding US-ASCII or UTF-8, -U option
I tried a three-level option setting:
-U0 : US-ASCII : use named character references or numerical character
references
-U1 : UTF-8 (partial) : use named character references for known characters,
UTF-8 literals for unknown characters (default)
-U2 : UTF-8 (full) : use UTF-8 literals
2b. `to_utf8_string`.
I have moved it to libgroff/font.cpp for trial.
2c. switching text styling properties.
I have removed the function from my patch.
2d. `to_unicode`.
I have renamed to_unicode() to to_numerical_char_ref().
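
To illustrate how the three -U levels in 2a could interact with the two
helpers in 2b and 2d, here is a minimal sketch. It is not code from the patch:
the function names, signatures, and the hexadecimal form of the numerical
reference are assumptions, and handling of markup-significant ASCII characters
is left out.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Encode one Unicode scalar value as UTF-8.
// Hypothetical stand-in for the patch's to_utf8_string.
std::string utf8_from_code_point(uint32_t cp)
{
  std::string s;
  if (cp < 0x80) {
    s += static_cast<char>(cp);
  } else if (cp < 0x800) {
    s += static_cast<char>(0xC0 | (cp >> 6));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    s += static_cast<char>(0xE0 | (cp >> 12));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    s += static_cast<char>(0xF0 | (cp >> 18));
    s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return s;
}

// Format a numerical character reference such as "&#x3042;".
// Hypothetical stand-in for to_numerical_char_ref; hex is an assumption.
std::string numeric_char_ref(uint32_t cp)
{
  char buf[16];
  std::snprintf(buf, sizeof buf, "&#x%X;", static_cast<unsigned>(cp));
  return buf;
}

// Choose the output form for one character.
// `named` is a named reference such as "&eacute;", or empty if none is known.
// `level` is the value given to -U (0, 1, or 2).
std::string emit_char(uint32_t cp, const std::string &named, int level)
{
  if (level >= 2)
    return utf8_from_code_point(cp);   // -U2: always a UTF-8 literal
  if (!named.empty())
    return named;                      // -U0 and -U1: prefer the named form
  if (level == 0)
    return numeric_char_ref(cp);       // -U0: stay within US-ASCII
  return utf8_from_code_point(cp);     // -U1: UTF-8 literal for unknown chars
}
```

With these definitions, emit_char(0x3042, "", 0) yields "&#x3042;", while the
same call with level 1 yields the three UTF-8 bytes E3 81 82.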
3. src/devices/grops/ps.cpp
3a. I have renamed is_utf16 to is_utf16be.
3b. I have replaced wchar_t with uint16_t.
3c. postscript name and encoding.
For CJK fonts, the encoding is always stated explicitly in the PostScript font
name, which has the structure (specific font name)-(style)(-(character
set))-(encoding)-(direction).
For example:
/Ryumin-Light-Identity-H
/Ryumin-Light-UniJIS-UTF16-H
/Ryumin-Light-UniJIS-UTF8-H
/Ryumin-Light-EUC-H
/Ryumin-Light-RKSJ-H
/GothicBBB-Medium-Identity-H
/GothicBBB-Medium-UniJIS-UTF16-H
/GothicBBB-Medium-UniJIS-UTF8-H
/GothicBBB-Medium-EUC-H
/GothicBBB-Medium-RKSJ-H
This is a sample PostScript file:
https://github.com/t-tk/PostScript-CJK-samples/blob/master/box-multi.eps
Therefore, I think it is reasonable to get encoding information from
PostScript font names.
I guess most PostScript interpreters do so.
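
As a rough sketch of the idea in 3a to 3c, the following shows how an
encoding could be read from a font name and how text could then be written as
UTF-16BE using uint16_t code units. It is not taken from the patch: all
function names here are hypothetical, and matching on a hyphen-delimited
"UTF16" component is an assumption based on the example names above.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Return true if `want` appears as a hyphen-delimited component of a
// PostScript font name, e.g. "UTF16" in "Ryumin-Light-UniJIS-UTF16-H".
bool has_name_component(const std::string &ps_name, const std::string &want)
{
  std::string::size_type start = 0;
  for (;;) {
    std::string::size_type end = ps_name.find('-', start);
    std::string part = (end == std::string::npos)
                         ? ps_name.substr(start)
                         : ps_name.substr(start, end - start);
    if (part == want)
      return true;
    if (end == std::string::npos)
      return false;
    start = end + 1;
  }
}

// Convert one Unicode scalar value to UTF-16 code units,
// using a surrogate pair above U+FFFF.
std::vector<uint16_t> utf16_units(uint32_t cp)
{
  std::vector<uint16_t> out;
  if (cp < 0x10000) {
    out.push_back(static_cast<uint16_t>(cp));
  } else {
    cp -= 0x10000;
    out.push_back(static_cast<uint16_t>(0xD800 | (cp >> 10)));
    out.push_back(static_cast<uint16_t>(0xDC00 | (cp & 0x3FF)));
  }
  return out;
}

// Build a PostScript hex string such as "<3042>". Printing each 16-bit
// unit as four hex digits puts the high byte first, i.e. big-endian.
std::string ps_hex_string(const std::vector<uint32_t> &code_points)
{
  std::string out = "<";
  for (uint32_t cp : code_points) {
    for (uint16_t unit : utf16_units(cp)) {
      char buf[8];
      std::snprintf(buf, sizeof buf, "%04X", static_cast<unsigned>(unit));
      out += buf;
    }
  }
  out += ">";
  return out;
}
```

For instance, has_name_component("/Ryumin-Light-UniJIS-UTF16-H", "UTF16") is
true, while the same check on "/Ryumin-Light-EUC-H" is false, so a driver
following this approach would only emit 16-bit strings for the former.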
4. src/include/font.h, src/libs/libgroff/font.cpp
I removed "ENABLE_UCSRANGE" macro from my patch.
5. smoke tests.
I replaced UTF-8 literals with octal escape sequences.
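
For reference, one way to derive such an octal expression from a UTF-8
string is sketched below. The helper name is hypothetical and the
three-digit backslash form is an assumption; the test scripts themselves may
write the escapes differently.

```cpp
#include <cstdio>
#include <string>

// Replace every non-ASCII byte with a three-digit octal escape,
// leaving ASCII bytes unchanged.
std::string octal_escape(const std::string &bytes)
{
  std::string out;
  for (unsigned char c : bytes) {
    if (c < 0x80) {
      out += static_cast<char>(c);
    } else {
      char buf[8];
      std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(c));
      out += buf;
    }
  }
  return out;
}
```

Applied to the UTF-8 bytes of U+3042 (E3 81 82), this gives the twelve
characters \343\201\202.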
(file #54631)
_______________________________________________________
Additional Item Attachment:
File name: cjk-ps-html_20230415.patch Size:86 KB
<https://file.savannah.gnu.org/file/cjk-ps-html_20230415.patch?file_id=54631>
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?62830>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/