[Mldonkey-bugs] [bug #17618] Incorrect charset used for some locales
From: Gang Chen
Subject: [Mldonkey-bugs] [bug #17618] Incorrect charset used for some locales
Date: Wed, 6 Sep 2006 03:42:02 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.8.0.4) Gecko/20060406 Firefox/1.5.0.4 (Debian-1.5.dfsg+1.5.0.4-1)
URL:
<http://savannah.nongnu.org/bugs/?17618>
Summary: Incorrect charset used for some locales
Project: mldonkey, a multi-networks file-sharing client
Submitted by: gangchen
Submitted on: Wednesday 09/06/06 at 03:42
Category: HTTP interface
Severity: 3 - Normal
Item Group: i18n issues
Status: None
Assigned to: None
Open/Closed: Open
Release:
Release: None
Operating System: None
Binaries Origin: None
CPU type: None
_______________________________________________________
Details:
Hi,
I found a bug: the wrong charset is used in some environments. For example,
when the locale is set to "zh_CN" or "zh_CN.GBK" for Simplified Chinese,
mldonkey should use the GB2312 or GBK charset to convert filenames, but it
actually uses the BIG5 charset. As a result, filenames are displayed with
incorrect characters on web pages.
The root cause of this problem is in src/utils/lib/charset.ml:
the charset list is always set to "BIG5, ..., GBK" when the language is
"zh":
| "SH"
| "SR" -> li := central_european :: cyrillic ::!li
| "ZH" -> li := chinese_traditional :: chinese_simplified :: !li
| "BE"
Please note, however, that BIG5 (used for zh_TW) is completely different from
GBK (used for zh_CN). It is not safe to assume that the charsets for one
language are always compatible with each other.
As a workaround, we could remove chinese_traditional from the code above, but
that would be bad for people who use Traditional Chinese.
My suggestion is to match the locale in ll_CC format rather than only the
language tag. For example:
"zh_CN" -> li := chinese_simplified :: !li
"zh_TW" -> li := chinese_traditional :: !li
The normalize_language function would also need to change, since it trims the
"_CC" part from the locale string.
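A minimal sketch of the ll_CC matching suggested above, in OCaml. The names
split_locale and charsets_for_locale are hypothetical, and the two charset
lists are simplified stand-ins for the values in charset.ml, whose real
definitions may differ.

```ocaml
(* Hypothetical stand-ins for the charset groups in charset.ml;
   the real definitions may differ. *)
let chinese_simplified = ["GB2312"; "GBK"]
let chinese_traditional = ["BIG5"]

(* Split a locale such as "zh_CN.GBK" into (language, country option),
   dropping any ".codeset" or "@modifier" suffix. *)
let split_locale (locale : string) : string * string option =
  let cut_at c s =
    match String.index_opt s c with
    | Some i -> String.sub s 0 i
    | None -> s
  in
  let base = cut_at '@' (cut_at '.' locale) in
  match String.index_opt base '_' with
  | Some i ->
      let lang = String.sub base 0 i in
      let cc = String.sub base (i + 1) (String.length base - i - 1) in
      (String.lowercase_ascii lang, Some (String.uppercase_ascii cc))
  | None -> (String.lowercase_ascii base, None)

(* Choose charset groups from language and country. When no country is
   given (or it is not one listed here), keep both groups in the order
   the current code uses. *)
let charsets_for_locale (locale : string) : string list list =
  match split_locale locale with
  | ("zh", Some "CN") -> [chinese_simplified]
  | ("zh", Some "TW") -> [chinese_traditional]
  | ("zh", _) -> [chinese_traditional; chinese_simplified]
  | _ -> []
```

With this sketch, charsets_for_locale "zh_CN.GBK" selects only the Simplified
Chinese group, while a bare "zh" keeps the existing behaviour.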
A better solution might be to use the encoding that glibc reports for the
current locale, for example via nl_langinfo(CODESET), which returns the
charset of the current locale, the same result as the command: locale charmap
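The OCaml standard library has no binding for nl_langinfo(), so a real fix
would need a small C stub or similar. As a rough approximation only, the
sketch below reads an explicit codeset from the locale environment variables.
The function names are hypothetical, and unlike nl_langinfo() it returns None
for a locale such as "zh_CN" that carries no ".codeset" suffix.

```ocaml
(* Extract the codeset part of a locale string:
   "zh_CN.GBK" gives Some "GBK", "zh_CN" gives None. *)
let codeset_of_locale (locale : string) : string option =
  match String.index_opt locale '.' with
  | None -> None
  | Some i ->
      let rest = String.sub locale (i + 1) (String.length locale - i - 1) in
      let codeset =
        (* Drop an "@modifier" suffix if present. *)
        match String.index_opt rest '@' with
        | Some j -> String.sub rest 0 j
        | None -> rest
      in
      if codeset = "" then None else Some codeset

(* First non-empty value among LC_ALL, LC_CTYPE and LANG, which is the
   usual precedence for the character-type locale category. *)
let current_locale () : string option =
  let rec first = function
    | [] -> None
    | var :: rest ->
        (match Sys.getenv_opt var with
         | Some v when v <> "" -> Some v
         | _ -> first rest)
  in
  first ["LC_ALL"; "LC_CTYPE"; "LANG"]

(* Codeset named explicitly by the environment, if any. *)
let current_codeset () : string option =
  match current_locale () with
  | Some locale -> codeset_of_locale locale
  | None -> None
```

When current_codeset () returns None, a caller could fall back to a table
keyed on the ll_CC part of the locale.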
Thanks
_______________________________________________________
Reply to this item at:
<http://savannah.nongnu.org/bugs/?17618>
_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/