nuxeo-localizer

Re: [Nuxeo-localizer] Non-MessageCatalog UTF-8 strings get encoded again


From: Juan David Ibáñez Palomar
Subject: Re: [Nuxeo-localizer] Non-MessageCatalog UTF-8 strings get encoded again
Date: Tue, 15 Oct 2002 01:23:15 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1

Sean Treadway wrote:

First off, Localizer promises to make a world of difference for our
site.  We have customers all over the world.  Thanks for coming this
far!

Our site dates back to Zope 2.0 where I stored all the content in UTF-8
encoded python strings.  Displaying the content worked great because I
set the Content-Type header to "text/html;charset=utf-8" for all of the
pages that submit and display the content.

I've upgraded to Zope 2.6beta.  I installed a MessageCatalog (0.9.1) in
the folder of a virtual site and have some translations in place.

When I view pages that include the utf-8 encoded content from the new
site, the non-ASCII characters look like they get an extra encoding
(from utf-8 to utf-8).  When I remove the Content-Type header and
display the page in Latin-1, it looks the same as if I have the
Content-Type header and display the page in UTF-8.  If I switch a page
without the Content-Type header from Latin-1 to UTF-8 it looks fine.
However, I need to tell the browser to view the page in UTF-8 and get
the content there without an extra encoding.

My suspicion is that the MessageCatalog is doing something with the
encoding of the response before the request is finished.  The same
content, with the "Content-Type: text/html;charset=utf-8" header from
the original site (without a message catalog) looks fine.  The content
from the message catalog is fine for both pages that have and do not
have the utf-8 charset header.  The content displays fine if I delete
the message catalog and include the charset=utf-8 in the Content-Type
header.

Any insight?  What can I do?  I would really like to use the Localizer,
but updating my content is a daunting task with many objects that have
many properties.  Is there a place in the code I can look for answers or
is this a fundamental behavior of the product?  If anyone can describe
the logic that applies per request or has some sane advice for this i18n
site, I am listening.

Thanks,
-Sean


Hi Sean,

The good news is that I know what is happening. It's not a
Localizer issue, it's Zope. To verify it, remove the Message
Catalog and remove Localizer if you like; then add a unicode
string in your template, for example, add:

<dtml-var "u'I am a unicode string'">

Now try the experiment again; you will see the same result.

Now for the explanation. The problem is that Python has two types
of strings, normal and Unicode. In your current web site you
use normal strings encoded in UTF-8.

What happens when a normal string and a Unicode string are
concatenated? The normal string is promoted to a Unicode
string, and to do that it must be decoded. It isn't possible to
detect the encoding, so a default one is used.

In Python the default is ASCII. Start the Python interpreter
and type:

>>> 'á' + u'a'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)

Python interprets the string 'á' as ASCII, but accented characters
can't be represented in ASCII, so it raises an exception. It's
possible to change the default encoding in Python.
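
The same failure can be reproduced outside Zope. What follows is a minimal
sketch in Python 3, where bytes stands in for the "normal" string of the
Python version discussed here; the helper name promote is hypothetical and
only mimics the implicit promotion step, it is not a real Python or Zope
function.

```python
# Python 3 sketch: bytes plays the role of a Python 2 "normal" string.
utf8_bytes = "á".encode("utf-8")  # the two bytes 0xC3 0xA1


def promote(raw, encoding="ascii"):
    """Decode a byte string to text, as implicit promotion would (hypothetical helper)."""
    return raw.decode(encoding)


# With the ASCII default, any byte above 127 makes the decoding fail.
try:
    promote(utf8_bytes)
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the real encoding explicitly avoids relying on any default.
explicit = promote(utf8_bytes, "utf-8") + "a"
```

Decoding explicitly with the encoding the data was stored in is what keeps
the result independent of whatever default happens to be configured.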

However, things are different with Zope. The Unicode support in
Zope 2.6 was implemented by Toby Dickenson, who decided to give
Latin-1 a prominent role. Look at the line:

lib/python/DocumentTemplate/pDocumentTemplate.py:248

within the method "join_unicode", it is:

rendered[i] = unicode(rendered[i],'latin-1')

In Zope, each time a normal string is concatenated with a Unicode
string, the normal string is interpreted as Latin-1, and this is
hardcoded. Bad luck.
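
To see why this produces the "extra encoding" Sean describes, here is a
small Python 3 sketch. It is not Zope's code: join_mixed and its
byte_encoding parameter are hypothetical names that only imitate the idea
of decoding byte strings with a fixed encoding before joining them with
Unicode text.

```python
def join_mixed(parts, byte_encoding="latin-1"):
    """Join text and byte strings, decoding bytes with byte_encoding (hypothetical helper)."""
    decoded = [p.decode(byte_encoding) if isinstance(p, bytes) else p
               for p in parts]
    return "".join(decoded)


stored = "café".encode("utf-8")  # site content kept as UTF-8 bytes
translated = "Bienvenue: "       # Unicode text, e.g. from a message catalog

# UTF-8 bytes misread as Latin-1: each byte of "é" becomes its own character.
wrong = join_mixed([translated, stored])

# Decoding with the encoding the content was actually stored in.
right = join_mixed([translated, stored], byte_encoding="utf-8")

# What would be sent to a browser that was told charset=utf-8.
page_wrong = wrong.encode("utf-8")
page_right = right.encode("utf-8")
```

In the wrong case the two bytes of "é" are turned into two separate
characters and then encoded to UTF-8 again, so the page carries four bytes
where there should be two. That matches the symptom of content looking
as if it had been encoded from UTF-8 to UTF-8 once more.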

When I implemented Unicode support I respected and followed this
policy. I didn't bother to address the problem.


So now, the solution is...

The good one is to fix Zope. Zope 2.6 is still in the works, so if
this problem is seen as a bug, and I think it is, then it has a
chance of being fixed. I will have to modify Localizer too, but this
won't be an issue.

Florent is the one who can help here. Actually, maybe this has already
been fixed; I don't follow the Zope CVS activity. Florent, could you give
some insight?


In the worst case, if it is not fixed in Zope, I will have to
implement a workaround in Localizer, with your help I hope :-)


Regards,

--
J. David Ibáñez, http://www.j-david.net
Software Engineer / Ingénieur Logiciel / Ingeniero de Software





