
From: G. Branden Robinson
Subject: Re: Questions concerning hyphenation patterns for non-Latin languages, e.g. Russian
Date: Wed, 26 Apr 2023 03:18:20 -0500

Hi Oliver,

At 2023-04-26T09:19:41+0200, Oliver Corff wrote:
> thank you very much for sharing your insight regarding groff
> internals.

I wish they were deeper!  There is still plenty I have to learn.

> I tried your demonstration, replacing the text file with my own file
> (UTF-8-encoded Cyrillic), and I did not succeed in reproducing your
> results.
> I copied all the Russian-related macro files (ru.tmac and
> koi8-ru.tmac) into my ../current/tmac directory (the production system
> is still 1.22.4), and running groff produces unusable output.

No, I wouldn't expect this to work.

> The headline "Abstract" gets translated into Russian, but is displayed
> in a non-UTF-8 encoding.  All UTF-8 text is OK.  If I omit the -k
> option, then UTF-8-encoded text is unusable as well, but this is no
> surprise.

As noted in my previous mail, if you want hyphenation to work with
Russian, you can use neither UTF-8 input (processed by preconv(1)) nor
Unicode code points from the Cyrillic code block in their groff special
character escape form, like \[u0400].
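A rough illustration of the difference between the two input forms, as a small Python sketch that is not part of groff.  It rests on two simplifying assumptions: that preconv(1) rewrites every non-ASCII character as a \[uXXXX] special-character escape, and that the Russian hyphenation setup instead expects plain 8-bit KOI8-R input, as the koi8-ru.tmac file name suggests.  The function names preconv_like and to_koi8r are hypothetical.

```python
"""Contrast two ways of presenting Cyrillic text to groff.

Assumption (simplified model): preconv(1) turns each non-ASCII character
into a groff special-character escape of the form \\[uXXXX].  The other
path re-encodes the text as 8-bit KOI8-R, assuming that is what the
Russian hyphenation setup expects.
"""


def preconv_like(text: str) -> str:
    """Hypothetical approximation of preconv: non-ASCII -> \\[uXXXX]."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # ASCII passes through unchanged.
            out.append(ch)
        else:
            # groff spells Unicode special characters as \[uXXXX] using
            # at least four uppercase hexadecimal digits.
            out.append(f"\\[u{cp:04X}]")
    return "".join(out)


def to_koi8r(text: str) -> bytes:
    """Hypothetical helper: re-encode text as KOI8-R, one byte per letter."""
    return text.encode("koi8_r")


if __name__ == "__main__":
    word = "Привет"
    # Six separate escapes: the formatter sees special characters.
    print(preconv_like(word))
    # Six single bytes in the upper half of the 8-bit range.
    print(to_koi8r(word).hex(" "))
```

In the first form each letter reaches the formatter as a named special character; in the second each letter is an ordinary 8-bit input character, which is the form 8-bit hyphenation patterns can match against.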

> Am I missing something from post-1.23.0 that enables the described magic?

Yes.  I refactored localization handling extensively to enable the
current approach.  As noted earlier in my compliment on your demo
document, I wanted to make it easy to change localizations an arbitrary
number of times within a document.

I worked on this stuff a while back.  Around January 2021 I made an
attempt, some of which I had to revert, and I re-landed the work in its
current form around July of that year.  More work specifically on
hyphenation followed in early 2022.

Some relevant commit IDs, not including the much more recent Spanish and
Russian localization work (which slotted right in as I had hoped), are:


I don't recall having to change anything in the formatter to enable
this work, so in principle you could replace an entire tmac directory
from a groff 1.22.4 installation with one from 1.23.0 (RC), but I can't
claim that as a supported configuration.  It's probably better just to
build and install groff 1.23.0.rc4, and _then_ add in the Russian
localization files.  If you're comfortable setting up chroots or virtual
machines, you might prefer to evaluate things that way.

