Re: [bug-gettext] Plural rule definitions

2015-05-19 11:12 GMT+02:00 Daiki Ueno <address@hidden>:

Hi Michele,

Sorry for the late response. I hadn't forgotten about it; I was thinking
about the best way to adopt CLDR in gettext. Currently we are doing:

0. Mention it in the documentation and guide users to the generated
plural rules. I'll do that really soon, before the next release.

1. Update plural-table.c, so a new PO file created with msginit will
have a usable "Plural-Forms" header.

I think it would be nice if step 1 were semi-automated somewhere in
gettext, at least in the release procedure. In order to do that, the diff
against the previous plural-table.c should be minimal, so that people
can review the changes easily. Also, gettext could ship with a helper
program of msginit (like "urlget"), that retrieves the latest CLDR data
if plural-table.c doesn't have a definition.

That would be great. But we have a problem here: CLDR defines plural rules for both integers and decimal (fractional) numbers.

gettext only works with unsigned integers.

So translating the CLDR plural rules into gettext ones is not that simple.

For example, for Czech CLDR defines the plural categories "one", "few", "many" and "other": that would lead to nplurals = 4.

(see http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#cs)

But the "many" category is never used in gettext, because it applies only to numbers with visible fraction digits (v != 0), so we end up with nplurals = 3.

(see http://unicode.org/reports/tr35/tr35-numbers.html#Operands )
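To make the Czech case concrete, here is a minimal Python sketch (the function names are mine, purely for illustration; they are not taken from gettext or from the converter). It encodes the CLDR Czech rules using the TR35 operands i (integer part) and v (number of visible fraction digits), and shows that integer inputs never reach "many", which is why three forms are enough for gettext:

```python
def cldr_category_cs(i: int, v: int) -> str:
    """CLDR plural category for Czech.

    i = integer part of the number, v = number of visible fraction digits
    (operands as defined in TR35). Hypothetical helper for illustration.
    """
    if v != 0:
        return "many"      # only numbers written with fraction digits
    if i == 1:
        return "one"
    if 2 <= i <= 4:
        return "few"
    return "other"


def gettext_plural_cs(n: int) -> int:
    """Integer-only equivalent of:
    nplurals=3; plural=(n == 1) ? 0 : ((n >= 2 && n <= 4) ? 1 : 2);
    """
    return 0 if n == 1 else (1 if 2 <= n <= 4 else 2)


# Categories that integers (v == 0) can actually reach.
used = {cldr_category_cs(n, 0) for n in range(1000)}

# Order in which the surviving categories map to gettext indexes.
index_to_name = ["one", "few", "other"]
consistent = all(
    index_to_name[gettext_plural_cs(n)] == cldr_category_cs(n, 0)
    for n in range(1000)
)
```

With this sketch, `used` contains only "one", "few" and "other", while something like 1.5 (i = 1, v = 1) falls into "many".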

That's why I created https://github.com/mlocati/cldr-to-gettext-plural-rules

In the "gettext" branch of that repo I just added a test exporter that can be used to automatically generate the plural rules for gettext.

IMHO it can be used to generate the rules for all the languages in one go (simply run "bin/export.sh gettext"), so that they can be included statically in gettext (making the "urlget" approach unnecessary).

But I agree, it's quite a big change, and you and other reviewers may have to spend some time on it.

Michele Locati <address@hidden> writes:

> Yes, that would lead to a more complete languages table. I can easily
> add a new option to automatically generate the plural_table.
> BTW, do you think it would be possible to add more information to that
> table? I mean, currently gettext offers the number of plurals and the
> formula to distinguish between them, but the only way to know the
> meaning of the different plural cases is to inspect the formula.
> What about adding the CLDR names of the cases and their relative
> examples? I think that it could help many people if the gettext
> headers could be extended to something like this:
> "Language: ar\n"
> "Plural-Forms: nplurals=6; plural=(n == 0) ? 0 : ((n == 1) ? 1 : ((n
> == 2) ? 2 : ((n % 100 >= 3 && n % 100 <= 10) ? 3 : ((n % 100 >= 11 &&
> n % 100 <= 99) ? 4 : 5))));\n"
> "Plural-Case-0: name=zero; examples=0;\n"
> "Plural-Case-1: name=one; examples=1;\n"
> "Plural-Case-2: name=two; examples=2;\n"
> "Plural-Case-3: name=few; examples=3~10, 103~110, 1003, …;\n"
> "Plural-Case-4: name=many; examples=11~26, 111, 1011, …;\n"
> "Plural-Case-5: name=other; examples=100~102, 200~202, 300~302,
> 400~402, 500~502, 600, 1000, 10000, 100000, 1000000, …;\n"
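As a sanity check on the quoted Arabic header, the formula can be transcribed into Python and run against some of the listed examples. This is only a sketch: the function name and the sample table are assumptions made for illustration, with the samples picked from the example ranges above.

```python
def arabic_plural(n: int) -> int:
    """Python transcription of the quoted Plural-Forms expression for "ar"."""
    if n == 0:
        return 0
    if n == 1:
        return 1
    if n == 2:
        return 2
    if 3 <= n % 100 <= 10:
        return 3
    if 11 <= n % 100 <= 99:
        return 4
    return 5


# A few values taken from the "examples=" fields of each proposed case.
samples = {
    0: [0],
    1: [1],
    2: [2],
    3: [3, 10, 103, 110, 1003],
    4: [11, 26, 111, 1011],
    5: [100, 101, 102, 200, 202, 600, 1000, 1000000],
}

# True when every sample lands in the case it is listed under.
all_match = all(
    arabic_plural(n) == case
    for case, numbers in samples.items()
    for n in numbers
)
```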

I think it's a good idea, but the format looks a bit too verbose.
Perhaps normal comment lines before the header entry might be
sufficient? Something like:

# There are 6 different plural forms in this language:
#
# ・0
# ・1
# ・2
# ・3~10, 103~110, 1003, …
# ・11~26, 111, 1011, …
# ・100~102, 200~202, 300~302, 400~402, 500~502, 600, 1000, 10000,
# 100000, 1000000, …
#
# For more details see <http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#ar>.

msgid ""
msgstr ""
...
""

The approach that I proposed was meant to help programs like poedit.

Having a standardized, structured way to represent plural forms (with a representative name like "one"/"few"/"many" and some examples) can be helpful in such cases...
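To show what I mean, here is a rough Python sketch of how a tool could read such fields. Note that "Plural-Case-N" is only the header format proposed earlier in this thread, not something gettext supports today, and the parser below (names included) is hypothetical:

```python
import re

# Sample of the *proposed* header fields (not an existing gettext feature).
HEADER = (
    "Plural-Case-0: name=zero; examples=0;\n"
    "Plural-Case-3: name=few; examples=3~10, 103~110, 1003, …;\n"
)

# One field per line: index, CLDR category name, free-form example list.
CASE_RE = re.compile(r"^Plural-Case-(\d+):\s*name=(\w+);\s*examples=(.*);$")


def parse_plural_cases(header: str) -> dict:
    """Return {index: (name, examples)} for every Plural-Case-N line found."""
    cases = {}
    for line in header.splitlines():
        match = CASE_RE.match(line.strip())
        if match:
            index = int(match.group(1))
            cases[index] = (match.group(2), match.group(3).strip())
    return cases


cases = parse_plural_cases(HEADER)
```

An editor could then label each msgstr[N] field with the category name and its examples instead of a bare index.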

Michele

From:	Michele Locati
Subject:	Re: [bug-gettext] Plural rule definitions
Date:	Tue, 19 May 2015 11:59:35 +0200