[Koha-devel] Sharing experience with utf-8

Here is what we had to do to run Koha in utf-8, in the hope that it helps some of your discussions:

1- We have been using Koha since 2.2.0 and are now at 2.2.2b

2- We use English for the intranet and English-Turkish for the OPAC

3- The platform is Windows

4- We changed the character set of the database to utf-8 with the existing iso-xxxx data still in it. This is no problem for MySQL, as you are moving up the ladder. There was no need to reload the data (10 min)

5- We changed every charset=iso-xxxx declaration in the templates to read utf-8 and saved the files as utf-8 in a simple text editor (15 min)
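For anyone who would rather script step 5 than use a text editor, here is a rough sketch in Perl. It is not what we ran. The helper name template_to_utf8 is made up for illustration, and the source encoding is passed in as a parameter because it depends on your installation; the iso-8859-9 in the usage line is only an example value.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of Koha).
# $bytes : raw content of one template file
# $from  : the encoding the file is currently saved in (an assumption you supply)
sub template_to_utf8 {
    my ($bytes, $from) = @_;

    # Turn the legacy bytes into Perl characters.
    my $text = decode($from, $bytes);

    # Rewrite any declared iso-* charset to utf-8.
    $text =~ s/charset=iso-[\w-]+/charset=utf-8/gi;

    # Turn the characters back into bytes, this time as UTF-8.
    return encode('UTF-8', $text);
}

# Usage: \xfd is the dotless i in iso-8859-9 (example encoding only).
my $converted = template_to_utf8("charset=iso-8859-9 \xfd", 'iso-8859-9');
```

The important part is that the declaration and the bytes on disk change together; changing only one of them gives broken characters in the browser.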

6- The character decode in biblio for MARC21 is very ambiguous for us because it is not clear which character encoding it is converting from. All the MARC records we bulkimport are MARC-8, iso2709 or ANSEL or whatever you want to call them. So we simply wrote a one-to-one character mapping from MARC-8 to utf-8 for our Turkish accented characters. Here it is:

#Additional Turkish characters

s/(\xf0)s/ş/gm;

s/(\xf0)S/Ş/gm;

s/(\xf0)c/ç/gm;

s/(\xf0)C/Ç/gm;

s/\xe7\x49/İ/gm;

s/(\xe6)G/Ğ/gm;

s/(\xe6)g/ğ/gm;

#The two-byte form must come before the bare \xB8 rule, which would otherwise consume its second byte

s/\xc2\xb8/ı/gm;

s/\xB8/ı/gm;

s/\xB9/£/gm;

s/(\xe8|\xc8)o/ö/gm;

s/(\xe8|\xc8)O/Ö/gm;

s/(\xe8|\xc8)u/ü/gm;

s/(\xe8|\xc8)U/Ü/gm;

All the character codes come directly from LC’s website about MARC21. Since we typed the actual characters rather than their codes in the replacements, we saved biblio.pm as utf8 to save time. (Half a day, including research)
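For those who want to try the mapping outside Koha, here is a self-contained sketch of the same substitutions wrapped in a function. The function name turkish_marc8_to_chars is hypothetical and this is not the code as it sits in our biblio.pm. It assumes the input is a MARC-8 byte string, where the diacritic comes before the base letter, and that the file is saved as utf8 (hence the utf8 pragma). The two-byte \xc2\xb8 form is handled first so that the bare \xB8 rule cannot eat its second byte.

```perl
use strict;
use warnings;
use utf8;    # the replacement characters below are typed as UTF-8 literals

# Hypothetical helper: maps the MARC-8 sequences listed above to Turkish
# characters. Takes a byte string, returns a Perl character string.
sub turkish_marc8_to_chars {
    my ($marc8) = @_;
    for ($marc8) {
        s/\xc2\xb8/ı/g;            # two-byte form, must run before the bare \xB8
        s/\xf0s/ş/g;               # cedilla + s
        s/\xf0S/Ş/g;
        s/\xf0c/ç/g;               # cedilla + c
        s/\xf0C/Ç/g;
        s/\xe7\x49/İ/g;            # dot above + I
        s/\xe6G/Ğ/g;               # breve + G
        s/\xe6g/ğ/g;
        s/\xB8/ı/g;                # dotless i
        s/\xB9/£/g;                # pound sign
        s/(?:\xe8|\xc8)o/ö/g;      # umlaut + vowel
        s/(?:\xe8|\xc8)O/Ö/g;
        s/(?:\xe8|\xc8)u/ü/g;
        s/(?:\xe8|\xc8)U/Ü/g;
    }
    return $marc8;
}

# Usage: encode on output so the characters leave Perl as UTF-8 bytes.
binmode(STDOUT, ':encoding(UTF-8)');
print turkish_marc8_to_chars("T\xe8urk\xf0ce"), "\n";    # prints "Türkçe"
```

Only the sequences in the list are covered; any other MARC-8 diacritic passes through unchanged.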

7- We now have a fully working Koha in utf8 supporting all characters, and we repeat the same steps every time we get an update.

8- Translating the OPAC files through .po files does not work for us. As we see it, the po translator is simply a search-and-replace text engine. It converts the string ‘English English <somevariable> English.’ to ‘Turkish Turkish <somevariable> Turkish’, which is useless because the correct word order would be ‘Turkish <somevariable> Turkish Turkish’.
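To make the word-order point concrete, here is a toy illustration in Perl. It does not reproduce how the Koha po tools actually work internally; both lookup tables and both function names are invented for this example, and the strings are the same placeholders used in point 8.

```perl
use strict;
use warnings;

# Toy model 1: each text segment around the variable is replaced on its own,
# so the variable stays wherever the English template put it.
my %segment = (
    'English English ' => 'Turkish Turkish ',
    ' English.'        => ' Turkish',
);

sub translate_segments {
    my ($before, $var, $after) = @_;
    return $segment{$before} . $var . $segment{$after};
}

# Toy model 2: the whole sentence is one unit with a %s placeholder,
# so the translation is free to move the variable.
my %sentence = (
    'English English %s English.' => 'Turkish %s Turkish Turkish',
);

sub translate_sentence {
    my ($template, $var) = @_;
    return sprintf($sentence{$template}, $var);
}

print translate_segments('English English ', '<somevariable>', ' English.'), "\n";
print translate_sentence('English English %s English.', '<somevariable>'), "\n";
```

The first call keeps the English position of the variable; the second gives the order we need in Turkish.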

9- So we sat down and translated the OPAC templates into proper Turkish. It is now easier for our people to follow the changes in cvs and apply them to the templates than to redo complete translations every time.

10- The whole update now takes less than half a day with one person doing it.

11- We, as Windows people, do not have much experience with this po editor. But as far as I know it supports utf8, so what is the hassle about these translations? As far as we understand it, the official language of Koha is English, and if someone is translating it into another language, it is their responsibility to find the resources to translate it in time for it to be included as an additional language, even if this requires a complete rewrite of some templates.

12- Finally, we believe that Koha should start using utf8 ASAP, before the move to zebra, to gain experience. If zebra is implemented with all this iso stuff, we will have more problems, with each translation requiring a different character set, sort order, character mapping, etc.

Koha becomes more powerful with more features, stability and performance, and I believe people will be happier to see improvements in these even if they have to spend a little more of their own resources on translations.

With no prejudice,

Tumer Garip

Near East Univ. Library Director

Cyprus

address@hidden

From:	Tumer Garip
Date:	Tue Aug 23 16:26:09 2005