Re: Questions concerning hyphenation patterns for non-Latin languages, e

From: Oliver Corff
Subject: Re: Questions concerning hyphenation patterns for non-Latin languages, e.g. Russian
Date: Wed, 26 Apr 2023 10:30:53 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1

Hi Branden,

I'll take the route you suggest, i.e. install a 1.23.0 version where
I'll place the macros; but I'll have to postpone this until Saturday ---
so no earlier feedback possible.

Best regards,


On 26/04/2023 10:18, G. Branden Robinson wrote:
Hi Oliver,

At 2023-04-26T09:19:41+0200, Oliver Corff wrote:
thank you very much for sharing your insight regarding groff

I wish they were deeper!  There is still plenty I have to learn.

I tried your demonstration, replacing the text file with my own file
(utf8-encoded Cyrillic), and I did not succeed in reproducing your
I copied all Russian-related macros (ru.tmac, and
koi8-ru.tmac) into my ../current/tmac directory (production system is
still 1.22.4), and running groff results in unusable output.

No, I wouldn't expect this to work.

The headline "Abstract" gets translated into Russian, but is displayed
in non-utf8 format. All utf8-text is ok. If I omit the -k option then
utf8-encoded text is unusable as well, but this is no surprise.

As noted in my previous mail, if you want hyphenation to work with
Russian, neither UTF-8 input (processed by preconv(1)) nor Unicode code
points from the Cyrillic code block in their groff special character
escape form, like \[u0400], can be used.
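
To make the two input forms concrete, here is a minimal Python sketch.
It is not preconv(1) itself and not part of groff; the function names
are hypothetical.  The first function rewrites non-ASCII characters
into the \[uXXXX] special character escape form mentioned above.  The
second produces 8-bit KOI8-R bytes, on the assumption (not stated in
this message) that KOI8-R is the 8-bit encoding the Russian macro files
are meant to be used with.

```python
# Illustration only: a rough approximation of the two input forms
# discussed above.  Function names are hypothetical, not groff APIs.

def to_groff_escapes(text: str) -> str:
    # Keep ASCII as is; rewrite everything else as a groff special
    # character escape of the form \[uXXXX] (uppercase hex, at least
    # four digits).
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        else:
            out.append("\\[u%04X]" % cp)
    return "".join(out)


def to_koi8r(text: str) -> bytes:
    # Assumption: 8-bit KOI8-R is the encoding wanted for Russian
    # input.  Raises UnicodeEncodeError for characters that KOI8-R
    # cannot represent.
    return text.encode("koi8_r")


if __name__ == "__main__":
    sample = "\u043f\u0440\u0438\u0432\u0435\u0442"  # a Russian word
    print(to_groff_escapes(sample))
    print(to_koi8r(sample))
```

The first form is the one described above as unusable for Russian
hyphenation; the second only shows what an 8-bit re-encoding of the
same text looks like.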

Am I missing something from post-1.23.0 that enables the described magic?

Yes.  I refactored localization handling extensively to enable the
current approach.  As noted earlier in my compliment on your demo
document, I wanted to make it easy to change localizations an arbitrary
number of times within a document.

I worked on this stuff a while back.  In about January 2021 I made an
attempt, some of which I had to revert, and re-landed the work in its
current form around July of that year.  More work specifically on
hyphenation followed in early 2022.

Some relevant commit IDs, not including the much more recent Spanish and
Russian localization work (which slotted right in as I had hoped), are:


I don't recall having to change anything in the formatter to enable
this work, so in principle you could replace an entire tmac directory
from a groff 1.22.4 installation with one from 1.23.0 (RC), but I can't
claim that as a supported configuration.  It's probably better just to
build and install groff 1.23.0.rc4, and _then_ add in the Russian
localization files.  If you're comfortable setting up chroots or virtual
machines, you might prefer to evaluate things that way.


Dr. Oliver Corff
Wittelsbacherstr. 5A
10707 Berlin
Tel.: +49-30-85727260
