[Lynx-dev] Dumps Unicode file in broken encoding.


From: Atsuhito Kohda
Subject: [Lynx-dev] Dumps Unicode file in broken encoding.
Date: Mon, 29 Sep 2008 19:41:28 +0900 (JST)

Hi all,

I got the following bug report in the Debian BTS (Bug#498985).
As I have no knowledge of this area, I'd like to forward the report
to this list.

On Mon, 15 Sep 2008 16:10:38 +0900, Charles Plessy wrote:

> I have severe problems when converting HTML messages with Lynx while
> using Mutt, and it seems to me that the reason is that the output
> encoding is broken. Here is a simple example:
> 
> aqwa『~』$ cat test.html 
> <ul>
> <li>é</li>
> <li>à</li>
> </ul>
> 
> aqwa『~』$ hexdump -C test.html 
> 00000000  3c 75 6c 3e 0a 3c 6c 69  3e c3 a9 3c 2f 6c 69 3e  |<ul>.<li>..</li>|
> 00000010  0a 3c 6c 69 3e c3 a0 3c  2f 6c 69 3e 0a 3c 2f 75  |.<li>..</li>.</u|
> 00000020  6c 3e 0a                                          |l>.|
> 00000023
> 
> aqwa『~』$ lynx.cur --dump test.html 
>      * é
>      * 
> 
> 
> aqwa『~』$ lynx.cur --dump test.html > test.txt 
> 
> aqwa『~』$ hexdump -C test.txt 
> 00000000  20 20 20 20 20 2a 20 c3  a9 0a 20 20 20 20 20 2a  |     * ...     *|
> 00000010  20 c3 0a 0a                                       | ...|
> 00000014
> 
> Here are the expected files in Latin-1 (ISO 8859-1) and Unicode (UTF-8) encodings:
> 
> aqwa『~』$ cat test.unicode.txt 
>      * é
>      * à
> 
> 
> aqwa『~』$ hexdump -C test.unicode.txt 
> 00000000  20 20 20 20 20 2a 20 c3  a9 0a 20 20 20 20 20 2a  |     * ...     *|
> 00000010  20 c3 a0 0a 0a                                    | ....|
> 00000015
> 
> aqwa『~』$ cat test.iso.txt 
>      * 
>      * 
> 
> 
> aqwa『~』$ hexdump -C test.iso.txt 
> 00000000  20 20 20 20 20 2a 20 e9  0a 20 20 20 20 20 2a 20  |     * ..     * |
> 00000010  e0 0a 0a                                          |...|
> 00000013
> 
> So apparently, « à » is C3A0 in UTF-8, E0 in ISO 8859-1, but Lynx dumps it as
> C3. This causes encoding misdetection, and many downstream problems.
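
For anyone who wants to check the bytes without hexdump, here is a
minimal Python sketch. The byte strings are copied from the hexdump
output quoted above; the helper name first_utf8_error is only an
illustrative choice of mine, not anything from Lynx or the original
report. It only shows where the dumped file stops being valid UTF-8.
It does not say anything about why Lynx drops the byte.

```python
# Bytes copied from the hexdump output in the quoted report.
lynx_dump = bytes.fromhex(
    "20 20 20 20 20 2a 20 c3 a9 0a 20 20 20 20 20 2a 20 c3 0a 0a"
)
expected_utf8 = bytes.fromhex(
    "20 20 20 20 20 2a 20 c3 a9 0a 20 20 20 20 20 2a 20 c3 a0 0a 0a"
)
expected_latin1 = bytes.fromhex(
    "20 20 20 20 20 2a 20 e9 0a 20 20 20 20 20 2a 20 e0 0a 0a"
)


def first_utf8_error(data):
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if the data is valid UTF-8. (Hypothetical helper.)"""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


if __name__ == "__main__":
    # Both expected files decode to the same text.
    print(expected_utf8.decode("utf-8") == expected_latin1.decode("latin-1"))
    # The Lynx dump has a lone C3 lead byte where C3 A0 should be.
    offset = first_utf8_error(lynx_dump)
    print(offset, hex(lynx_dump[offset]))
```

The offset reported for the Lynx dump is that of the lone C3 byte on
the second list item, which matches what the hexdump shows.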

Thanks in advance.

Regards,                        2008-9-29(Mon)

-- 
 Debian Developer - much more I18N of Debian
 Atsuhito Kohda <kohda AT debian.org>
 Department of Math., Univ. of Tokushima
