Re: [bug-gettext] Emacs i18n


From: Juri Linkov
Subject: Re: [bug-gettext] Emacs i18n
Date: Wed, 20 Mar 2019 23:32:52 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (x86_64-pc-linux-gnu)

> Richard Stallman wrote in
> <https://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00328.html>:
>
>> I can envision something like this:
>>
>>       "russian-nom:%d байт%| скопирован%|, %s, %s"
>>
>> where the 'russian-nom' operator would replace the two %| sequences
>> with the appropriate declensional suffixes for the nominative case.
>
> It is, of course, tempting to try to do morphological analysis in an
> algorithmic way, based on our background as algorithm hackers. François
> Pinard and others considered this, back in 1995 when they started i18n in GNU.
>
> The reason this approach was not chosen is still valid today:
>
> When you design a translation system, you have two personas:
>   - the programmer,
>   - the translator.
>
> The translation system defines
>   1) which information flows from the programmer to the translator,
>      and in which format,
>   2) which information flows back from the translator to the programmer,
>      and in which format.
>
> And it has to cope with the assumed skills of these personas:
>
>   - The programmer, you can assume, can write and understand algorithms,
>     but does not master the grammar of more than one language (usually).
>
>   - The translator, you can assume, can translate sentences and knows
>     about the different meanings of words in different contexts. But they
>     can neither write nor understand algorithms. Many translators, in fact,
>     don't see grammar as a set of rules.
>
> You may find some people in the intersection, such as a Russian hacker,
> but it is hard to find people with both skills for languages such as
> Vietnamese, Slovenian, or Basque. So you had better design the system in
> such a way that no person is assumed to have both skills.
>
> The challenge is to define these formats 1) and 2) in a way that
>
>   * Programmers can do their job with their skills (i.e. don't need to
>     understand Russian).
>
>   * Translators can do their job with their skills (i.e. don't need to
>     understand algorithms).
>
> In the gettext approach (where 1) are POT files and 2) are PO files) we
> added plural form handling, which is just a small morphological variation,
> and it required a significant amount of documentation and education for
> translators. I would say it is at the limit of what we can make
> translators grok.
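For reference, the Russian plural rule that translators had to learn is
normally written as a C expression in the PO-file header
(Plural-Forms: nplurals=3; plural=...).  Here is a minimal Python sketch of
that selection logic; the message strings are mine, just for illustration:

```python
def russian_plural(n: int) -> int:
    """Return the plural-form index (0..2) for Russian, equivalent to the
    usual PO-header expression:
    plural=(n%10==1 && n%100!=11 ? 0 :
            n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)
    """
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, ...    -> "байт"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # 2-4, 22-24, ...   -> "байта"
    return 2      # 0, 5-20, 25-30, ... -> "байтов"

forms = ["%d байт скопирован", "%d байта скопировано", "%d байтов скопировано"]
for n in (1, 3, 5, 11, 21):
    print(forms[russian_plural(n)] % n)
```

Three branches, but note the 11/21 and 12-14/22-24 exceptions: this is
exactly the kind of thing translators had to be taught.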
>
> Now, when you give a translator a string
>
>    "russian-nom:%d байт%| скопирован%|, %s, %s"
>
> you need to think about the appropriate tooling that will make the
> translator understand
>   - what 'russian-nom' means,
>   - what the '|' characters mean,
>   - what the '%' characters mean.
> Either the translator tool should somehow highlight these characters
> and present on-line help, or it should present it as a sequence of
> strings to translate:
>
>   Rule: russian-nom
>   "%d байт"
>   " скопирован"
>   ", %s, %s"
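A tool could derive that presentation mechanically. A minimal sketch,
assuming RMS's hypothetical "rule:segment%|segment%|..." syntax (this is not
an existing gettext feature):

```python
def split_for_translator(template: str):
    """Split a hypothetical 'rule:...%|...' template into the rule name
    and the segments a translation tool would present one by one.
    The 'russian-nom' operator is RMS's sketch, not a gettext feature."""
    rule, _, body = template.partition(":")
    return rule, body.split("%|")

rule, segments = split_for_translator("russian-nom:%d байт%| скопирован%|, %s, %s")
print(rule)      # russian-nom
print(segments)  # ['%d байт', ' скопирован', ', %s, %s']
```

Splitting is the easy part; explaining to the translator what each segment
means, and how the rule recombines them, is where the tooling cost lies.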
>
> It is important to realize that each such case of morphological variation
> requires translator tooling support. And unfortunately different such tools
> exist, and every translator has their preferred one. For the plural form
> handling alone, it took several years until the main tools had support for
> it in their UI.

Indeed, a complete implementation of all Russian morphological rules
takes ~1600 lines of dense Perl code:

http://www.linkov.net/files/nlp/Lingua-RU-Inflect.pm

I can't imagine how to include all these rules into gettext.

But there is no need to: gettext already strikes a decent balance between
the complexity of natural languages and the practical needs of program
internationalization, by letting translators themselves decide how the
words in a message should be inflected for each plural form.
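This is how a translator handles exactly the "байт/байта/байтов" variation
today, entirely inside the PO file; the msgid is made up for illustration:

```po
#, c-format
msgid "%d byte copied, %s, %s"
msgid_plural "%d bytes copied, %s, %s"
msgstr[0] "%d байт скопирован, %s, %s"
msgstr[1] "%d байта скопировано, %s, %s"
msgstr[2] "%d байтов скопировано, %s, %s"
```

The three msgstr indices correspond to the three Russian plural forms
declared in the header's Plural-Forms line; no morphological rules ever
reach the programmer.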

Currently we have more urgent tasks.  After the first step of adding
‘ngettext’ as in CLISP, the development stalled on the problem of
splitting messages into domains.

But maybe CLISP already provides a good way to map packages to gettext
domains?  Does it require every package to have a separate domain, or
does it collect translations from all packages into one domain?


