From: Oliver Corff
Subject: Re: Questions concerning hyphenation patterns for non-Latin languages, e.g. Russian
Date: Wed, 26 Apr 2023 10:30:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1

Hi Branden,

I'll take the route you suggest, i.e. install a 1.23.0 version and
place the macros there. I'll have to postpone this until Saturday,
though, so no earlier feedback is possible.

Best regards,

Oliver.


On 26/04/2023 10:18, G. Branden Robinson wrote:
Hi Oliver,

At 2023-04-26T09:19:41+0200, Oliver Corff wrote:
Thank you very much for sharing your insights regarding groff
internals.

I wish they were deeper!  There is still plenty I have to learn.

I tried your demonstration, replacing the text file with my own file
(UTF-8-encoded Cyrillic), and I did not succeed in reproducing your
results.

I copied all Russian-related macros (ru.tmac, hyphen.ru and
koi8-ru.tmac) into my ../current/tmac directory (production system is
still 1.22.4), and running groff results in unusable output.

No, I wouldn't expect this to work.

The headline "Abstract" gets translated into Russian, but is displayed
in a non-UTF-8 form. All UTF-8 text is OK. If I omit the -k option then
the UTF-8-encoded text is unusable as well, but this is no surprise.

As noted in my previous mail, if you want hyphenation to work with
Russian, neither UTF-8 input (processed by preconv(1)) nor Unicode code
points from the Cyrillic code block in their groff special character
escape form, like \[u0400], can be used.
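
For readers following along, the transcoding step this implies can be
sketched in Python. This is only a sketch under an assumption that this
message does not state outright: that the Russian localization files
(the koi8-ru.tmac name suggests as much) expect KOI8-R-encoded input
rather than UTF-8. The helper name utf8_to_koi8r is hypothetical and is
not part of groff.

```python
# Hypothetical helper, not part of groff: transcode UTF-8 bytes to KOI8-R.
# Assumption: the Russian localization expects KOI8-R-encoded input.
def utf8_to_koi8r(data: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as KOI8-R.

    Raises UnicodeEncodeError if a character has no KOI8-R equivalent.
    """
    return data.decode("utf-8").encode("koi8_r")


# Small in-memory demonstration; a real document would be read from a file.
sample = "Привет, мир\n".encode("utf-8")
converted = utf8_to_koi8r(sample)
```

KOI8-R is a single-byte encoding, so each Cyrillic letter becomes one
byte; characters outside its repertoire raise an error rather than being
silently dropped.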

Do I miss something from post-1.23.0 that enables the described magic?

Yes.  I refactored localization handling extensively to enable the
current approach.  As noted earlier in my compliment on your demo
document, I wanted to make it easy to change localizations an arbitrary
number of times within a document.

I worked on this stuff a while back.  In about January 2021 I made an
attempt, some of which I had to revert, and re-landed the work in its
current form around July of that year.  More work specifically on
hyphenation followed in early 2022.

Some relevant commit IDs, not including the much more recent Spanish and
Russian localization work (which slotted right in as I had hoped), are:

a86d9251ed05cec18f6279a9e613449ae7aa7315
a60784b82a5c53caff5443fc036b8d13f4084a32
7eb25c45b5ec67f1037abcc670793b734584987c
7c31d53f83888d88262075875b6ba5463dcfa5c5
2a36cf12b865be4c1df1c27139b1c58798cafb60
920fff1cf59d38bacd9b1b99b3d1ce3ce4e1e13f

I don't recall having to change anything in the formatter to enable
this work, so in principle you could replace an entire tmac directory
from a groff 1.22.4 installation with one from 1.23.0 (RC), but I can't
claim that as a supported configuration.  It's probably better just to
build and install groff 1.23.0.rc4, and _then_ add in the Russian
localization files.  If you're comfortable setting up chroots or virtual
machines, you might prefer to evaluate things that way.

Regards,
Branden

--
Dr. Oliver Corff
Wittelsbacherstr. 5A
10707 Berlin
GERMANY
Tel.: +49-30-85727260
mailto:oliver.corff@email.de



