[Mldonkey-bugs] [bug #17618] Incorrect charset used for some locales


From: Gang Chen
Subject: [Mldonkey-bugs] [bug #17618] Incorrect charset used for some locales
Date: Wed, 6 Sep 2006 03:42:02 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.8.0.4) Gecko/20060406 Firefox/1.5.0.4 (Debian-1.5.dfsg+1.5.0.4-1)

URL:
  <http://savannah.nongnu.org/bugs/?17618>

                 Summary: Incorrect charset used for some locales
                 Project: mldonkey, a multi-networks file-sharing client
            Submitted by: gangchen
            Submitted on: Wednesday 09/06/06 at 03:42
                Category: HTTP interface
                Severity: 3 - Normal
              Item Group: i18n issues
                  Status: None
             Assigned to: None
             Open/Closed: Open
                 Release: None
        Operating System: None
         Binaries Origin: None
                CPU type: None

    _______________________________________________________

Details:

Hi,
I found a bug: the wrong charset is used in some environments. For example,
when the locale is set to "zh_CN" or "zh_CN.GBK" for Simplified Chinese,
mldonkey should use the GB2312 or GBK charset to convert filenames, but it
actually uses the BIG5 charset. As a result, filenames are displayed with
incorrect characters on the web pages.
The root cause of this problem is in src/utils/lib/charset.ml: the charset
list is always set to "BIG5, ..., GBK" when the language is "zh":
      | "SH"
      | "SR" -> li := central_european :: cyrillic ::!li
      | "ZH" -> li := chinese_traditional :: chinese_simplified :: !li
      | "BE"
Please note that the BIG5 charset used for zh_TW is completely different
from the GBK charset used for zh_CN. It is not safe to assume that the
charsets for one language are always compatible with each other.
As a workaround, chinese_traditional could be removed from the code above,
but that would be bad for people who use Traditional Chinese.
My suggestion is to match the locale in ll_CC format rather than only the
language tag. For example:
      | "zh_CN" -> li := chinese_simplified :: !li
      | "zh_TW" -> li := chinese_traditional :: !li

The normalize_language function would also need to change, since it
currently strips the "_CC" part from the locale string.
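
To illustrate the ll_CC matching suggested above, here is a minimal,
self-contained sketch. It is not the actual charset.ml code: the type
charset_group and the functions split_locale and chinese_groups are
hypothetical names made up for this example, and the real code works on
the li list reference instead of returning a list. Locales other than
zh_CN and zh_TW keep the current "both groups" behaviour as a fallback.

```ocaml
(* Hypothetical stand-ins for the charset groups used in charset.ml. *)
type charset_group = Chinese_simplified | Chinese_traditional

(* Split a locale name of the form "ll_CC.codeset@modifier" into an
   upper-cased (language, territory) pair. The territory is "" when the
   locale name has no "_CC" part. *)
let split_locale (locale : string) : string * string =
  let cut_at c s =
    match String.index_opt s c with
    | Some i -> String.sub s 0 i
    | None -> s
  in
  (* Drop the ".codeset" and "@modifier" suffixes, keep "ll_CC". *)
  let base = cut_at '@' (cut_at '.' locale) in
  match String.index_opt base '_' with
  | Some i ->
      ( String.uppercase_ascii (String.sub base 0 i),
        String.uppercase_ascii
          (String.sub base (i + 1) (String.length base - i - 1)) )
  | None -> (String.uppercase_ascii base, "")

(* Pick charset groups from language AND territory, not language only. *)
let chinese_groups (locale : string) : charset_group list =
  match split_locale locale with
  | "ZH", "CN" -> [ Chinese_simplified ]
  | "ZH", "TW" -> [ Chinese_traditional ]
  | "ZH", _ -> [ Chinese_traditional; Chinese_simplified ]
  | _ -> []
```

With this, "zh_CN.GBK" selects only the simplified group and "zh_TW" only
the traditional one, while a bare "zh" behaves as it does today.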

A better solution may be to use the encoding that glibc reports for the
current locale, for example via nl_langinfo(CODESET), which returns the
charset of the current locale, the same value printed by the command:
locale charmap
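
As far as I know, the OCaml standard library has no binding for
nl_langinfo(), so calling it would need a small C stub. The sketch below
does NOT call glibc; it is a rougher approximation that only reads the
codeset suffix from the locale name in the environment. The function
names codeset_of_locale and current_locale_name are made up for this
example. For a locale name without a codeset, such as plain "zh_CN", it
returns None, so a fallback table would still be needed in that case.

```ocaml
(* Return the codeset part of a locale name, e.g. Some "GBK" for
   "zh_CN.GBK". Returns None when the name carries no codeset. *)
let codeset_of_locale (locale : string) : string option =
  match String.index_opt locale '.' with
  | None -> None
  | Some i ->
      let rest = String.sub locale (i + 1) (String.length locale - i - 1) in
      (* Strip an optional "@modifier" suffix such as "@euro". *)
      let codeset =
        match String.index_opt rest '@' with
        | Some j -> String.sub rest 0 j
        | None -> rest
      in
      if codeset = "" then None else Some codeset

(* First non-empty value among LC_ALL, LC_CTYPE and LANG, which is the
   usual precedence order for the character-type locale category. *)
let current_locale_name () : string option =
  let rec first = function
    | [] -> None
    | var :: rest -> (
        match Sys.getenv_opt var with
        | Some v when v <> "" -> Some v
        | _ -> first rest)
  in
  first [ "LC_ALL"; "LC_CTYPE"; "LANG" ]
```

A caller could try codeset_of_locale on the result of
current_locale_name () first, and only fall back to the language-based
charset list when no codeset is found.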

Thanks

    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?17618>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/




