gnuherds-app-dev


From: Antenore Gatta
Subject: Re: What files can we convert to UTF-8? -- was: (Re: Localization not using Unicode?)
Date: Fri, 13 Oct 2006 15:52:18 +0200

2006/10/13, Davi Leal :
Laurentiu Matei wrote:
> > Maybe we should change all files to an Unicode encoding, even the
> > .php, .tpl, etc., which could have some strings embedded.
>
> It's important too to make sure the encoding of the HTML generated by
> the app is UTF-8.

* 98% of the HTML comes from the .tpl Smarty templates.
* 1% comes from HTML embedded in .php files.
* 1% comes from other files such as images/gnus-desc.html, to which
   maybe we should add translation support.


I have processed all the dev_1_1 files that are not directories or
images with the command below:
  iconv --from-code=ISO-8859-1 --to-code=UTF-8  file > file.u ; mv file.u file
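
The same step could be wrapped in a small helper so that the original is only replaced when iconv succeeds. This is a minimal sketch, not the script that was actually run; the function name convert_to_utf8 is hypothetical, and it assumes every input file really is ISO-8859-1.

```shell
# Hypothetical helper: convert one file from ISO-8859-1 to UTF-8 in place.
# The original is overwritten only if iconv exits successfully.
convert_to_utf8() {
    if iconv --from-code=ISO-8859-1 --to-code=UTF-8 "$1" > "$1.u"; then
        mv "$1.u" "$1"
    else
        # Remove the partial output and report the failure to the caller.
        rm -f "$1.u"
        return 1
    fi
}

# Possible usage on a single file:
#   convert_to_utf8 ./locale/es/LC_MESSAGES/messages.po
```

Note that running such a conversion twice on the same file would encode the non-ASCII characters twice, so it should be applied only to files that are still in ISO-8859-1.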

I was surprised that only some files came out different from the
original. Those files are:

./AfferoGPL
./Layer-0__Site_entry_point/doc/GNUHerds__SQL_Implementation.psql
./locale/es/LC_MESSAGES/messages.po
./locale/it/LC_MESSAGES/messages.po

I ran the 'file *' command to check. The file types below show no
difference after running the above 'iconv' command:
  PHP script text
  ASCII English text
  ASCII English text, with very long lines
  HTML document text
  exported SGML document text


The result is that only these files have been converted to UTF-8:
./AfferoGPL
./Layer-0__Site_entry_point/doc/GNUHerds__SQL_Implementation.psql
./locale/es/LC_MESSAGES/messages.po
./locale/it/LC_MESSAGES/messages.po

Let me know if I should fix something.

P.S.: I have just committed the changes to the dev_1_1 branch,
which we will merge into the trunk when RMS agrees with it. I have also
added the locale/ro/LC_MESSAGES directories.

Davi

You cannot see differences if the files you are converting don't
contain "special" characters. To check by hand you would have to open
each file with hexdump or something similar and look for the additional
bytes... Hard work that is not needed.
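
As an alternative to reading a hex dump, iconv itself can report whether a file contains any byte outside ASCII. The following is only a sketch: the function names is_ascii and list_non_ascii are hypothetical, and it relies on iconv exiting with a non-zero status when the input is not valid in the source encoding.

```shell
# Succeeds (exit status 0) only if every byte of the file is plain ASCII.
is_ascii() {
    iconv --from-code=ASCII --to-code=ASCII "$1" > /dev/null 2>&1
}

# Print the files under a directory that contain at least one
# non-ASCII byte. Binary files such as images will be listed too.
list_non_ascii() {
    find "$1" -type f | while IFS= read -r f; do
        is_ascii "$f" || printf '%s\n' "$f"
    done
}

# Possible usage from the top of the source tree:
#   list_non_ascii .
```

Only the files printed by such a check can change when converted from ISO-8859-1 to UTF-8.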

If you have the time, this link explains UTF-8 better:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

So if a file contains only simple ASCII characters, you will never see
a difference between the original and the converted copy.
Don't worry about the conversion; what you have done is OK.
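
A small experiment illustrates the point. The file names and contents here are made up for the demonstration; only the iconv options are the ones used above.

```shell
# Hypothetical scratch directory, created only for this demonstration.
tmp=$(mktemp -d)

# A file with only ASCII characters.
printf 'plain ascii text\n' > "$tmp/ascii.txt"
# A file with one ISO-8859-1 byte: 0xE9 (octal 351), "e" with acute accent.
printf 'caf\351\n' > "$tmp/latin1.txt"

for f in ascii.txt latin1.txt; do
    iconv --from-code=ISO-8859-1 --to-code=UTF-8 "$tmp/$f" > "$tmp/$f.u"
    # cmp -s is silent and succeeds only if both files are byte-identical.
    if cmp -s "$tmp/$f" "$tmp/$f.u"; then
        echo "$f: unchanged"
    else
        echo "$f: changed"
    fi
done
```

The ASCII file is reported as unchanged, while the second file is reported as changed because the single byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9.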

So what is needed:

1. UTF-8 enabled in PHP.
2. Update all the HTML files with charset=UTF-8.
3. Every time we use a shell command to work with files, such as vi,
emacs, gettext or msgen, set the right LC_CTYPE/LANG.

In my opinion the best solution is to set the LANG variable to
LANG=en_US.UTF-8 as a system default.
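
For a single shell session, the setting could look like the sketch below. Where to put it so that it becomes the system default depends on the distribution, and the en_US.UTF-8 locale is assumed to be installed.

```shell
# Set for the current shell session only; making it a system default
# (for example in a profile script) depends on the distribution.
export LANG=en_US.UTF-8

# Report the character encoding selected by the current locale settings.
# This prints UTF-8 only if the en_US.UTF-8 locale is actually installed.
locale charmap 2>/dev/null || echo "locale command not available"
```

Tools started from this shell, such as the editors and gettext utilities mentioned above, inherit LANG from the environment.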



