Re: [Demexp-dev] Character encoding
From: Thomas Petazzoni
Subject: Re: [Demexp-dev] Character encoding
Date: Mon, 22 Oct 2007 09:01:14 +0200
Hi,
On Mon, 22 Oct 2007 14:40:46 +0900,
Lyu Abe <address@hidden> wrote:
> There's one thing I do not understand in character coding of the
> server's reply. When I display, for example, tag sets, I can read
> this:
>
> 'a_tag_label': u'citoyennet\xe9'
>
> in which " u'citoyennet\xe9' " corresponds to an unicode encoded
> text, right? Then I do not understand why we get unicode encoded
> strings, while DEMEXP is supposed to have UTF-8 encoding...
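The é in the string you mention appears as a single value, \xe9, which
is how it is encoded in ISO-8859-1 (or ISO-8859-15): one byte, 0xE9. In
UTF-8, é takes two bytes (0xC3 0xA9), so what you see is not UTF-8.
Note that the u'' prefix means Python is showing a decoded unicode
object, so \xe9 here is the code point U+00E9, which happens to have
the same value as the ISO-8859-1 byte.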
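The byte counts can be checked directly. Here is a minimal sketch in
Python 3 (the original output above is Python 2 notation); the variable
names are illustrative and not taken from the DemExp code:

```python
# The word from the quoted tag label; "\xe9" is the code point U+00E9 (é).
label = "citoyennet\xe9"

# Serialize the same text with two different encodings.
latin1_bytes = label.encode("iso-8859-1")
utf8_bytes = label.encode("utf-8")

print(latin1_bytes)  # é becomes the single byte 0xE9
print(utf8_bytes)    # é becomes the two bytes 0xC3 0xA9
print(len(latin1_bytes), len(utf8_bytes))
```

The text has 11 characters in both cases; only the number of bytes
differs between the two encodings.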
You're also confusing Unicode with UTF-8. Unicode assigns each
character a unique number (its code point), and UTF-8 is one way of
encoding that number as bytes. There are various ways of encoding
Unicode code points (UTF-7, UTF-8, UTF-16, UTF-32, UCS-2, etc.).
See http://en.wikipedia.org/wiki/Unicode for more information.
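To illustrate the distinction between the Unicode number and its
encodings, here is a small sketch in Python 3. The names `ch`,
`code_point` and `encoded` are only for illustration, and the
big-endian codec variants are chosen so that no byte order mark is
added:

```python
ch = "\u00e9"  # the character é

# The Unicode number (code point) is independent of any encoding.
code_point = ord(ch)
print(hex(code_point))

# The same code point serialized by several encodings.
encoded = {name: ch.encode(name)
           for name in ("utf-8", "utf-16-be", "utf-32-be")}
for name, data in encoded.items():
    print(name, data.hex(), len(data), "byte(s)")

# Decoding with the matching codec always gives back the same character.
assert all(data.decode(name) == ch for name, data in encoded.items())
```

One code point, three different byte sequences: which one goes over
the wire is a choice of encoding, not of Unicode itself.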
Sincerely,
Thomas
--
Thomas Petazzoni - address@hidden
http://{thomas,sos,kos}.enix.org - http://www.toulibre.org
http://www.{livret,agenda}dulibre.org