
Re: [Koha-zebra] Re: Unimarc, marc21, Unicode, and MARC::File::XML


From: Adam Dickmeiss
Subject: Re: [Koha-zebra] Re: Unimarc, marc21, Unicode, and MARC::File::XML
Date: Tue, 21 Mar 2006 22:54:37 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i686; da-DK; rv:1.7.12) Gecko/20060205 Debian/1.7.12-1.1

Tümer Garip wrote:
I thought I explained it but here it is again:

I do not think which method you use is relevant here, but just try
this:

In the release version of ZEBRA, in the test/usmarc folder, change zebra.cfg to
read
recordType: grs.xml
In the tabs folder change marc21.abs to read record.abs
Use zebraidx to create the database with the single XML record I sent to
you.
Start the zebrasrv at the required port.
Use yaz-client
f @attr 1=1016 book
format xml
show

I see the xml record header saying
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>

Further down you'll see utf-8 characters of correct hex as
\XC5\X9F

Now stop the server.
Add the line encoding:utf-8 to your zebra.cfg
Restart the server
Do the same search and you get
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

Conclusion:
The database does keep the data in UTF-8 as expected.
The server does not know about the database character set or the xml record that
was parsed in, and unless specifically set to UTF-8 in zebra.cfg the server
goes ahead and changes the header, or in fact produces a header of its own
saying iso-8859-1 while giving out utf-8 characters.

Correct. I was unable to reproduce this fault because my XML test record was able to be represented in UNICODE/UTF-8. Your sample is NOT.

Conversion from UTF-8 to ISO-8859-1 fails in Zebra. In this case, Zebra keeps the data as is, but unfortunately alters the header anyway. That's the mistake. Better behavior would probably be for Zebra not to return the data at all, but to return a surrogate diagnostic for the record.
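
To illustrate why such a conversion can fail, here is a small sketch using the bytes C5 9F quoted earlier in the thread. It uses only Python's built-in codecs and says nothing about how Zebra itself performs the conversion; the helper name is hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character of text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


# The two bytes from the yaz-client output, decoded as UTF-8.
char = b"\xc5\x9f".decode("utf-8")

print(hex(ord(char)))      # code point U+015F (s with cedilla)
print(fits_latin1("\u00e9"))  # e with acute accent is in Latin-1
print(fits_latin1(char))   # U+015F is outside Latin-1
```

ISO-8859-1 only covers code points up to U+00FF, so a record containing U+015F has no lossless Latin-1 form.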

As you say, Zebra can be forced to use utf-8 in the retrieval phase via the configuration. You can also specify utf-8 via the Z39.50 protocol (charset utf-8 in yaz-client), and you should be able to achieve the same with ZOOM-Perl.

For Zebra 1.3 we kept Latin-1 as the default character set because a number of installations use it. For Zebra 1.4 the default is UTF-8, so there should not be a problem with that in this case.

/ Adam


I did not ask for any help on this, thanks. Just clearing up some issues with
Paul's problem.
Tumer
-----Original Message-----
From: Adam Dickmeiss [mailto:address@hidden
Sent: Tuesday, March 21, 2006 9:58 PM
To: Tümer Garip
Cc: address@hidden
Subject: Re: [Koha-zebra] Re: Unimarc, marc21, Unicode, and
MARC::File::XML


Tümer Garip wrote:

Hi Adam,
You seem a bit offended; that was not my intention. Frustration sometimes makes me use harsh words, and translating them into English may make them too harsh.

I do not need to send you any config+examples because I tested this with
your default config files. I am attaching an xml record in utf-8.

If you're to receive help from me, you need to tell me which zebra.cfg
you're using, and show me the record + the way you indexed it (zebraidx update?).

Briefly: I had the default configuration files and built zebra with xml records. When I noticed the problem I used yaz-client to see what was going on. In my log I could see that the data going into zebra had encoding utf-8, while yaz-client was returning xml with headers saying iso-8859-1, even though I could actually see the utf-8 characters, since they show as hex in yaz-client.

I also need to know what you see, and what you'd expect to see.

/ Adam


I have retried this procedure just now and the result is the same. Just by adding encoding:UTF-8 to zebra.cfg and restarting the server you get the correct header and correct data. Please note that the server has to be restarted, but zebradb does not have to be rebuilt.
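
The configuration change described here can be scripted. The helper below is a hypothetical sketch, not a Zebra tool: it assumes zebra.cfg uses "key: value" lines, as in the recordType line quoted elsewhere in this thread, and it only edits text in memory. The server still has to be restarted by hand afterwards.

```python
def ensure_utf8_encoding(cfg_text: str) -> str:
    """Return cfg_text with exactly one 'encoding: utf-8' setting line."""
    out = []
    found = False
    for line in cfg_text.splitlines():
        key = line.split(":", 1)[0].strip().lower()
        if ":" in line and key == "encoding":
            if not found:
                # Overwrite the first existing setting, whatever its value.
                out.append("encoding: utf-8")
                found = True
            # Any later duplicate encoding lines are dropped.
        else:
            out.append(line)
    if not found:
        # No encoding line at all: append one at the end.
        out.append("encoding: utf-8")
    return "\n".join(out) + "\n"


print(ensure_utf8_encoding("recordType: grs.xml\n"), end="")
```

Running the function on its own output leaves the text unchanged, so it is safe to apply more than once.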

Thanks
Tumer

-----Original Message-----
From: Adam Dickmeiss [mailto:address@hidden
Sent: Tuesday, March 21, 2006 9:00 PM
To: Tümer Garip
Cc: address@hidden; address@hidden
Subject: Re: [Koha-zebra] Re: Unimarc, marc21, Unicode, and
MARC::File::XML


Tümer Garip wrote:


Hi,

This problem, if I understood it correctly, has nothing to do with
mysql or perl; it has to do with ZEBRA, unless it is to do with UNIMARC, which I am not familiar with. As you know (Paul), I have a utf-8 version working.

I had the same problem with records coming from zebra and found out
that it does not do what it is supposed to do unless you explicitly set it to utf-8. You have to explicitly put "encoding utf-8" in all your zebra config files, especially zebra.cfg and your .abs. Otherwise, contrary to the documentation, which says that the zebra character code is
automatically set by the xml encoding, it is NOT.

I can't reproduce this (bug). Care to share a config+example that
illustrates this (inserts an XML record from Perl in UTF-8)?



Perl sends xml to zebra with encoding utf-8 in the header and utf-8
data in it. Zebra saves all the data in utf-8 but returns an xml record saying encoding iso8859-1 in the header, with utf-8 characters in the data. No module can correct this as it is stupid.

Just need to know when the stupidity starts:-)

/ Adam



I corrected the problem by adding encoding:UTF-8 to zebra.cfg,
record.abs, and sort-string.chr.

Hope it solves yours,

Tumer



_______________________________________________
Koha-zebra mailing list
address@hidden
http://lists.nongnu.org/mailman/listinfo/koha-zebra




