
[Koha-translate] Language, Script, Country, Encoding - an Explanation


From: Dorian Meid
Subject: [Koha-translate] Language, Script, Country, Encoding - an Explanation
Date: Sun, 20 Jan 2008 15:53:04 +0100

I have noticed some uncertainty about the metadata submitted with your translations, so I want to explain the basics a little. Koha uses RFC 4646 http://rfc.net/rfc4646.html for language identification. It states that a language is identified by several tags, separated by hyphens:

Language tag - Script tag - Region/Country tag

The language tag is written in lowercase, the script tag is written in lowercase with the first letter in uppercase and the region or country tag is written in uppercase.
Example: zh-Hans-CN
zh is Chinese, Hans is the simplified Chinese script and CN is China.
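
To make the casing rules concrete, here is a minimal sketch in Python that checks a tag of the form language-script-region and puts each part into its conventional case. The function name normalize_tag and the pattern are my own illustration, not anything from Koha, and the pattern covers only the three parts described here; RFC 4646 allows further subtags (variants, extensions, private use) that this sketch rejects.

```python
import re

# Simplified pattern (an assumption for illustration): language is 2-3
# letters, script is an optional 4 letters, region is an optional 2 letters
# or 3 digits. Other RFC 4646 subtags are deliberately not handled.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def normalize_tag(tag):
    """Return the tag with conventional casing, or raise ValueError."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("unsupported language tag: %r" % tag)
    parts = [match.group("language").lower()]        # zh
    if match.group("script"):
        parts.append(match.group("script").title())  # Hans
    if match.group("region"):
        parts.append(match.group("region").upper())  # CN (digits unchanged)
    return "-".join(parts)


print(normalize_tag("ZH-hans-cn"))
print(normalize_tag("en-gb"))
print(normalize_tag("es-005"))
```

A tag with an underscore, such as zh_CN, does not match the pattern and raises ValueError.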

The language is how you speak, or what word you use to name a thing.
The language tags are standardised in ISO 639-1 or ISO 639-2 http://www.loc.gov/standards/iso639-2/php/code_list.php As we are in a library environment, it may be useful to mention the difference between ISO 639-2/T and ISO 639-2/B. T refers to the terminology code and B refers to the bibliographic code, e.g. German has the tag "deu" in ISO 639-2/T and "ger" in ISO 639-2/B. The reason for this inconvenience is that some libraries had assigned tags for languages (the B tags) before the ISO (T) standardisation was made. The T and B differences exist only in the three-letter tags of ISO 639-2. So far we use the two-letter tags of ISO 639-1, but RFC 4646 also allows 639-2.
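
If your library data carries three-letter codes, a small lookup table is enough to get to the two-letter tag we use. The sketch below is only an illustration: the table is a short excerpt of languages whose T and B codes differ, and the helper name to_two_letter is made up for this example.

```python
# (ISO 639-1, ISO 639-2/T, ISO 639-2/B) for a few languages whose T and B
# codes differ. Illustrative excerpt only, not the full code list.
CODES = [
    ("de", "deu", "ger"),  # German
    ("fr", "fra", "fre"),  # French
    ("zh", "zho", "chi"),  # Chinese
    ("nl", "nld", "dut"),  # Dutch
]

B_TO_T = {b: t for _, t, b in CODES}
T_TO_ALPHA2 = {t: alpha2 for alpha2, t, _ in CODES}


def to_two_letter(code):
    """Map a B or T code from the excerpt to its two-letter tag.

    Codes not in the excerpt are returned unchanged (lowercased).
    """
    code = code.lower()
    t_code = B_TO_T.get(code, code)      # bibliographic -> terminology
    return T_TO_ALPHA2.get(t_code, code)  # terminology -> two-letter


print(to_two_letter("ger"))
print(to_two_letter("deu"))
```

Both calls give "de", since "ger" and "deu" name the same language.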

The script is what your characters look like, or what you paint to produce a specific sound. The script tags are standardised in ISO 15924 http://www.unicode.org/iso15924/codelists.html You have to add the script tag if your language can be written in more than one script, e.g. Hans for Simplified Chinese or Hant for Traditional Chinese, or if the specified language is not written in its usual script, e.g. de-Latf-DE for German in Fraktur. You should omit the script tag if there is only one commonly used script for your language, but you don't have to.

The region or country is where the language is spoken. This is important because there are often differences between countries that basically share the same language, e.g. British English and American English. The region/country tag is either a two-letter country code as standardised in ISO 3166-1 http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm or a three-digit region code as standardised in UN M.49 http://unstats.un.org/unsd/methods/m49/m49.htm Normally we use the ISO letter code, but the UN region code can be handy when specifying a language spoken in more than one country, e.g. es-005 (Spanish as spoken in South America).

Given a script tag, we know what your script should look like, but computers are dumb. They don't know written characters, they just know bytes. The assignment of written (visual) characters to byte values is called character encoding. There are many different character encodings, and to make it even worse, some scripts can be successfully encoded in several different ways. Traditional single-byte character encodings can assign only 128 or 256 characters. Unicode has room for more than a million code points and covers practically all scripts in use, so it is the preferred choice for Koha themes and translations. So please use UTF-8 for your document character encoding http://www.unicode.org/standard/WhatIsUnicode.html If you can't use UTF-8 or don't know how to use it, please ask the list, or at least specify the encoding you are using so we can transcode your document.
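
To illustrate why the encoding has to be known, here is a small Python sketch. The same text becomes different bytes in UTF-8 and in ISO-8859-1, and transcoding only works when the source encoding is stated. The helper transcode_to_utf8 is a made-up name for this example, not part of any Koha tool.

```python
text = "Größe"

utf8_bytes = text.encode("utf-8")         # ö and ß take two bytes each
latin1_bytes = text.encode("iso-8859-1")  # every character takes one byte

print(len(utf8_bytes), len(latin1_bytes))


def transcode_to_utf8(raw, source_encoding):
    """Decode bytes using the stated source encoding, re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# With the correct source encoding the text survives unchanged.
assert transcode_to_utf8(latin1_bytes, "iso-8859-1") == utf8_bytes

# Guessing the wrong encoding fails: these bytes are not valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("cannot decode as UTF-8:", err.reason)
```

This is why a file in an unknown encoding cannot be reliably repaired on our side: the bytes alone do not say which characters were meant.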

Hope that helps.
Maybe this should be added to the readme on translate or the wiki.

Dorian Meid







