
From: G. Branden Robinson
Subject: Re: Questions concerning hyphenation patterns for non-Latin languages, e.g. Russian
Date: Wed, 26 Apr 2023 03:18:20 -0500

Hi Oliver,

At 2023-04-26T09:19:41+0200, Oliver Corff wrote:
> thank you very much for sharing your insight regarding groff
> internals.

I wish they were deeper!  There is still plenty I have to learn.

> I tried your demonstration, replacing the text file with my own file
> (utf8-encoded Cyrillic), and I did not succeed in reproducing your
> results.
> 
> I copied all Russian-related macros (ru.tmac, hyphen.ru and
> koi8-ru.tmac) into my ../current/tmac directory (production system is
> still 1.22.4), and running groff results in unusable output.

No, I wouldn't expect this to work.

> The headline "Abstract" gets translated into Russian, but is displayed
> in non-utf8 format. All utf8-text is ok. If I omit the -k option then
> utf8-encoded text is unusable as well, but this is no surprise.

As noted in my previous mail, if you want hyphenation to work with
Russian, neither UTF-8 input (processed by preconv(1)) nor Unicode code
points from the Cyrillic code block in their groff special character
escape form, like \[u0400], can be used.
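
To make the two input forms concrete, here is a small Python sketch,
for illustration only; neither function is part of groff.  The first
spells non-ASCII characters as \[uXXXX] escapes, the form that does
not hyphenate.  The second produces 8-bit KOI8-R bytes; that KOI8-R
is the wanted input encoding is an assumption drawn from the koi8
macro file named in the quoted text, and both function names are
hypothetical.

```python
# Illustration only: these helpers are hypothetical, not groff tooling.


def to_groff_escapes(text: str) -> str:
    """Spell each non-ASCII character as a groff \\[uXXXX] escape."""
    return "".join(
        ch if ord(ch) < 128 else "\\[u{:04X}]".format(ord(ch))
        for ch in text
    )


def to_koi8r(text: str) -> bytes:
    """Encode text as KOI8-R (assumed here to be the 8-bit input form)."""
    # Raises UnicodeEncodeError for characters KOI8-R cannot represent.
    return text.encode("koi8_r")


if __name__ == "__main__":
    word = "Привет"
    print(to_groff_escapes(word))   # escape form: not usable for hyphenation
    print(to_koi8r(word).hex())     # one byte per Cyrillic letter
```

The point of the contrast is that the escape form names code points,
while the KOI8-R form hands the formatter one 8-bit character per
letter.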

> Am I missing something from post-1.23.0 that enables the described magic?

Yes.  I refactored localization handling extensively to enable the
current approach.  As noted earlier in my compliment on your demo
document, I wanted to make it easy to change localizations an arbitrary
number of times within a document.

I worked on this stuff a while back.  In about January 2021 I made an
attempt, some of which I had to revert, and re-landed the work in its
current form around July of that year.  More work specifically on
hyphenation followed in early 2022.

Some relevant commit IDs, not including the much more recent Spanish and
Russian localization work (which slotted right in as I had hoped), are:

a86d9251ed05cec18f6279a9e613449ae7aa7315
a60784b82a5c53caff5443fc036b8d13f4084a32
7eb25c45b5ec67f1037abcc670793b734584987c
7c31d53f83888d88262075875b6ba5463dcfa5c5
2a36cf12b865be4c1df1c27139b1c58798cafb60
920fff1cf59d38bacd9b1b99b3d1ce3ce4e1e13f

I don't recall having to change anything in the formatter to enable
this work, so in principle you could replace an entire tmac directory
from a groff 1.22.4 installation with one from 1.23.0 (RC), but I can't
claim that as a supported configuration.  It's probably better just to
build and install groff 1.23.0.rc4, and _then_ add in the Russian
localization files.  If you're comfortable setting up chroots or virtual
machines, you might prefer to evaluate things that way.
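
Before adding the Russian files, it may help to confirm which groff a
given environment actually runs.  The following Python sketch parses
the output of "groff --version" and compares it against 1.23.0.  The
banner format "GNU groff version X.Y.Z" is an assumption, and the
function names are hypothetical.

```python
import re
import subprocess


def parse_groff_version(banner: str):
    """Return (major, minor, patch) from a version banner, or None."""
    # Assumed banner shape: "GNU groff version 1.22.4"; a suffix such
    # as ".rc4" is ignored.
    m = re.search(r"version (\d+)\.(\d+)(?:\.(\d+))?", banner)
    if m is None:
        return None
    return tuple(int(part) if part else 0 for part in m.groups())


def has_new_localization(banner: str) -> bool:
    """True if the banner reports groff 1.23.0 or later."""
    version = parse_groff_version(banner)
    return version is not None and version >= (1, 23, 0)


if __name__ == "__main__":
    result = subprocess.run(
        ["groff", "--version"], capture_output=True, text=True
    )
    print(has_new_localization(result.stdout))
```

Run inside the chroot or virtual machine, this distinguishes a 1.22.4
production system from a 1.23.0 release candidate.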

Regards,
Branden


