Re: [Demexp-dev] Character encoding


From: Lyu Abe
Subject: Re: [Demexp-dev] Character encoding
Date: Mon, 22 Oct 2007 16:05:46 +0900
User-agent: Thunderbird 2.0.0.6 (Windows/20070728)

Hi Thomas and David,

Thanks for the clarification!

        Lyu.

Thomas Petazzoni wrote:
Hi,

On Mon, 22 Oct 2007 14:40:46 +0900,
Lyu Abe <address@hidden> wrote:

There's one thing I do not understand about the character encoding of
the server's reply. When I display tag sets, for example, I can read
this:

'a_tag_label': u'citoyennet\xe9'

in which u'citoyennet\xe9' is Unicode-encoded text, right? If so, I do
not understand why we get Unicode-encoded strings when DEMEXP is
supposed to use UTF-8 encoding...

The string you mention is encoded in ISO-8859-1 (or ISO-8859-15): the
special character é is encoded in one byte only, so it's not UTF-8.
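
A minimal Python 3 sketch of the byte counts involved may help here. It assumes the label is the word "citoyenneté" from the example above; the variable names are illustrative only and do not come from the DEMEXP code.

```python
# Illustrative only: the label text is taken from the example in this thread.
label = "citoyennet\u00e9"  # "citoyenneté"

# In ISO-8859-1 every character of this word fits in one byte; é is 0xE9.
latin1_bytes = label.encode("iso-8859-1")

# In UTF-8 the ASCII letters take one byte each, but é takes two: 0xC3 0xA9.
utf8_bytes = label.encode("utf-8")

print(len(label), len(latin1_bytes), len(utf8_bytes))
print(latin1_bytes[-1:], utf8_bytes[-2:])
```

The UTF-8 form is one byte longer than the ISO-8859-1 form because only the final é needs a second byte.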

You're also confusing Unicode with UTF-8. Unicode associates each
character with a unique number, and UTF-8 is one way of encoding that
number as bytes. There are various ways of encoding Unicode numbers
(UTF-7, UTF-8, UTF-16, UTF-32, UCS-2, etc.).
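
As a rough sketch of that distinction in Python 3 (an assumption, since the thread does not say which Python version the client runs): ord() returns the Unicode number of a character, and each encoding turns that same number into a different byte sequence. The list of encodings below is just a sample chosen for illustration.

```python
ch = "\u00e9"  # the character é

# The Unicode number (code point) is independent of any byte encoding.
code_point = ord(ch)  # 233, i.e. U+00E9

# The same code point serialized with several encodings.
# Big-endian variants are used so that no byte-order mark is prepended.
encodings = ["utf-7", "utf-8", "utf-16-be", "utf-32-be"]
encoded = {name: ch.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"U+{code_point:04X} as {name}: {data.hex()} ({len(data)} bytes)")

# Decoding with the matching encoding always gives back the same character.
decoded = {name: data.decode(name) for name, data in encoded.items()}
```

Whatever the encoding, decoding yields the same character back, which is why the Unicode number and its byte representation are best thought of as two separate things.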

See http://en.wikipedia.org/wiki/Unicode for more information.

Sincerely,

Thomas



