[Koha-devel] utf-8, probable solution
From: Paul POULAIN
Subject: [Koha-devel] utf-8, probable solution
Date: Wed, 15 Feb 2006 16:08:32 +0100
User-agent: Mozilla Thunderbird 1.0.6-7.2.20060mdk (X11/20050322)
Thanks to Heikki Levanto, Tümer Garip & Mike Rylander: you pointed out 3
things that were useless alone, but very useful when combined.
I think I have the solution to our problem. It's not a Zebra,
HTML::Template or MARC::Record problem, it's a Perl one!
Let me explain:
I followed my UTF-8 string through my Perl code until it was printed, and
it was always UTF-8 (\x9c...).
But in Firefox, it was ISO-8859-1.
Heikki told me that the first 256 characters (0x00 to 0xFF) are shared by
Unicode and ISO-8859-1. So, I told myself: OK, Paul, add a "true UTF-8"
character to your string. I chose \x{263a} (the smiley, because I'm always
optimistic & that is what is used in perluniintro).
Surprise... now my é was a UTF-8 é in Firefox!!!!
Conclusion: Perl looked at my string before sending it and, as it found
nothing that was "true UTF-8" (no character beyond 0xFF), it output the
string as ISO-8859-1 bytes.
I also had a brand new message in my log :
> Wide character in print at ...
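For anyone who wants to reproduce the experiment without a browser, here is a minimal sketch. It assumes that printing to an in-memory filehandle behaves like printing to STDOUT with no layer; the helper name printed_bytes is hypothetical, not something from Koha.

```perl
use strict;
use warnings;

# Hypothetical helper: return, as hex, the raw bytes that print() writes
# for the given strings on a handle that has no PerlIO encoding layer.
sub printed_bytes {
    my @strings = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    {
        no warnings 'utf8';    # silence "Wide character in print" for the demo
        print {$fh} @strings;
    }
    close($fh);
    return join ' ', map { sprintf '%02x', ord } split //, $buffer;
}

# é alone: no character beyond 0xFF, so one ISO-8859-1 byte is written.
print printed_bytes("\x{e9}"), "\n";           # e9

# é plus the smiley: the string cannot be written as native bytes,
# so the whole string goes out in UTF-8.
print printed_bytes("\x{e9}\x{263a}"), "\n";   # c3 a9 e2 98 ba
```

The same é therefore reaches the browser as one byte or as two, depending only on what else is in the string.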
Mike R.'s and Tümer G.'s suggestions made me investigate the perldoc on
Unicode, and here it is:
A user of Perl does not normally need to know nor care how Perl happens
to encode its internal strings, but it becomes relevant when outputting
Unicode strings to a stream without a PerlIO layer -- one with the
"default" encoding. In such a case, the raw bytes used internally (the
native character set or UTF-8, as appropriate for each string) will be
used, and a "Wide character" warning will be issued if those strings
contain a character beyond 0x00FF.
For example,
    perl -e 'print "\x{DF}\n", "\x{0100}\x{DF}\n"'
produces a fairly useless mixture of native bytes and UTF-8, as well as
a warning:
    Wide character in print at ...
To output UTF-8, use the ":utf8" output layer. Prepending
    binmode(STDOUT, ":utf8");
to this sample program ensures that the output is completely UTF-8, and
removes the program's warning.
GOTCHA ! I have added binmode(STDOUT, ":utf8"), and now, even without
the smiley, my éà... are correctly shown.
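A small sketch of the effect of the layer, under the assumption that an in-memory filehandle stands in for STDOUT (the variable names and the hex_bytes helper are only for illustration):

```perl
use strict;
use warnings;

# Illustrative helper: show a byte string as space-separated hex.
sub hex_bytes { return join ' ', map { sprintf '%02x', ord } split //, $_[0] }

my $text = "\x{e9}\x{e0}";    # "éà": two characters, both below 0x100

# Without a layer: one native (ISO-8859-1) byte per character.
my $raw = '';
open(my $plain, '>', \$raw) or die "cannot open in-memory handle: $!";
print {$plain} $text;
close($plain);

# With the ":utf8" layer: every character is written as UTF-8.
my $encoded = '';
open(my $utf8, '>', \$encoded) or die "cannot open in-memory handle: $!";
binmode($utf8, ':utf8');
print {$utf8} $text;
close($utf8);

print hex_bytes($raw), "\n";        # e9 e0
print hex_bytes($encoded), "\n";    # c3 a9 c3 a0
```

In a script that prints to the browser, the equivalent would be a single binmode(STDOUT, ":utf8") before the first print.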
I still have to investigate MySQL UTF-8, but it seems that
> set NAMES=utf8
is useless.
Thanks everybody for helping me. I'll continue this thread on koha-devel
only, as the zebra & perl4lib lists are probably not interested.
--
Paul POULAIN et Henri Damien LAURENT
Independent consultants
in free software and library science (http://www.koha-fr.org)