Re: [bug-gnu-libiconv] TILDE in Shift-jis
From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] TILDE in Shift-jis
Date: Tue, 20 May 2008 01:25:57 +0200
User-agent: KMail/1.5.4
Hi,
Takemoto wrote:
> Sometimes the //TRANSLIT function of iconv does
> not produce the expected approximation, particularly with
> Japanese.
> ...
> and char(126) from utf-8 to shift-jis
> http://bugs.php.net/bug.php?id=45017
>
> The PHP people are saying this is out of their jurisdiction, being
> a libiconv issue.
The PHP people are right to redirect you to bug-gnu-libiconv.
Shift_JIS does not contain a tilde: neither the ASCII TILDE (U+007E)
nor the FULLWIDTH TILDE (U+FF5E). You can find libiconv's mapping table
for Shift_JIS in the file libiconv/tests/SHIFT_JIS.TXT; please check it
for yourself.
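One way to check is to scan the table for the two tilde code points. Below is a minimal Python sketch, assuming (not verified here) that the file lists one `0xBYTES 0xUNICODE` pair per line, optionally followed by a `#` comment; `find_mappings` and `TILDES` are illustrative names, not part of libiconv. In libiconv's Shift_JIS the single-byte range follows JIS X 0201 Roman, where 0x5C is U+00A5 YEN SIGN and 0x7E is U+203E OVERLINE, so byte 0x7E exists but does not mean tilde.

```python
# Minimal sketch. Assumed table format: "0xBYTES<whitespace>0xUNICODE" per line,
# optionally followed by a "#" comment; blank and comment-only lines are skipped.

TILDES = (0x007E, 0xFF5E)  # U+007E TILDE, U+FF5E FULLWIDTH TILDE


def find_mappings(table_text, code_points):
    """Return {code_point: [byte sequences mapping to it]} for each requested code point."""
    found = {cp: [] for cp in code_points}
    for line in table_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the comment and surrounding blanks
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or not a mapping entry
        byte_seq, code_point = fields[0], int(fields[1], 16)  # int() accepts the "0x" prefix
        if code_point in found:
            found[code_point].append(byte_seq)
    return found
```

For example, `find_mappings(open("libiconv/tests/SHIFT_JIS.TXT").read(), TILDES)` should return empty lists for both code points if the table indeed contains no tilde.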
> I particularly need tildes in shift_jis encoded pages/email
Nowadays, Japanese web pages are most often encoded in Microsoft's CP932
or in UTF-8. ISO-2022-JP-2 is also used, but to a lesser extent.
You can read about the differences between Shift_JIS and CP932 at
http://www.haible.de/bruno/charsets/conversion-tables/Japanese.html
in the section "Shift_JIS and extensions".
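The tilde is exactly where the two encodings part ways. The Python sketch below contrasts three byte values: the Shift_JIS side is hand-copied from the standard mappings (JIS X 0201 Roman for single bytes, and 0x8160 as U+301C WAVE DASH from JIS X 0208), while the CP932 side is decoded with Python's built-in `cp932` codec. Python's own `shift_jis` codec is deliberately not used, since it need not agree with libiconv's table. `SHIFT_JIS_ENTRIES`, `CP932_ENTRIES` and `bytes_for` are illustrative names, and these are excerpts, not full tables.

```python
# Three byte values whose meaning differs between Shift_JIS and CP932.
SAMPLES = (b"\x5c", b"\x7e", b"\x81\x60")

# Shift_JIS side, hand-copied for illustration (not a full table):
# JIS X 0201 Roman for single bytes, JIS X 0208 for 0x8160.
SHIFT_JIS_ENTRIES = {
    b"\x5c": "\u00a5",      # YEN SIGN
    b"\x7e": "\u203e",      # OVERLINE
    b"\x81\x60": "\u301c",  # WAVE DASH
}

# CP932 side, taken from Python's built-in cp932 codec.
CP932_ENTRIES = {seq: seq.decode("cp932") for seq in SAMPLES}


def bytes_for(char, entries):
    """Return the byte sequences in `entries` that decode to `char`."""
    return [seq for seq, decoded in entries.items() if decoded == char]


for name, entries in (("Shift_JIS", SHIFT_JIS_ENTRIES), ("CP932", CP932_ENTRIES)):
    for ch in ("\u007e", "\uff5e"):
        print(name, f"U+{ord(ch):04X}", bytes_for(ch, entries) or "no entry in this excerpt")
```

In CP932 both tildes have a byte sequence (0x7E and 0x81 0x60), while in the Shift_JIS excerpt neither does.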
> since, following the Apache standard, I have a tilde in my URL.
In URLs you can always escape a tilde as %7E. It is a bit ugly, but it is
safer when you have character conversion problems.
> Please see
> http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php
It can also be written as
http://md2.cc.yamaguchi-u.ac.jp/%7Eeigo/temp/tilde.php
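If the URL is generated by a script, the escaping can be automated. Below is a minimal Python sketch; `escape_tilde` is a hypothetical helper that rewrites every `~` in the path, query and fragment as `%7E`. `urllib.parse.quote` will not do this for you: since Python 3.7 it treats `~` as unreserved and leaves it alone, in line with RFC 3986, which also states that `~` and `%7E` are equivalent, so the escaped form should address the same resource.

```python
from urllib.parse import urlsplit, urlunsplit


def escape_tilde(url):
    """Percent-escape '~' as %7E in the path, query and fragment of `url`."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(
        path=parts.path.replace("~", "%7E"),
        query=parts.query.replace("~", "%7E"),
        fragment=parts.fragment.replace("~", "%7E"),
    ))


print(escape_tilde("http://md2.cc.yamaguchi-u.ac.jp/~eigo/temp/tilde.php"))
# prints http://md2.cc.yamaguchi-u.ac.jp/%7Eeigo/temp/tilde.php
```

The scheme and host are left untouched because a tilde cannot legitimately appear there.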
> P.S. This email was sent encoded in Shift-JIS and you can see the tilde
Your mailer may surprise you: your mail was labelled and encoded as ISO-2022-JP:
Content-Type: text/plain;
charset="iso-2022-jp"
Content-Transfer-Encoding: 7bit
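You can check which charset a message was actually labelled with by reading its Content-Type header. Below is a minimal Python sketch using the standard `email` package on a hypothetical message carrying the header lines above; `declared_charset` and `RAW` are illustrative names. ISO-2022-JP starts in ASCII (the `ESC ( B` state), where byte 0x7E is an ordinary tilde, which would explain why the tilde came through in this mail.

```python
from email import message_from_string


def declared_charset(raw_message):
    """Return the charset parameter of the Content-Type header, lowercased, or None."""
    return message_from_string(raw_message).get_content_charset()


# Hypothetical message carrying the headers quoted above (Content-Type folded onto two lines).
RAW = (
    "Content-Type: text/plain;\n"
    ' charset="iso-2022-jp"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "~eigo\n"
)

charset = declared_charset(RAW)
print(charset)              # iso-2022-jp
print("~".encode(charset))  # in the initial ASCII state the tilde is the plain byte 0x7E
```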
Bruno