bug#12291: [rev 109796] wrong UTF-8 handling
From: Werner LEMBERG
Subject: bug#12291: [rev 109796] wrong UTF-8 handling
Date: Tue, 28 Aug 2012 21:22:26 +0200 (CEST)
> In both cases, user surely see them.
OK. BTW, the real use case is a bug in Emacs 23.x which prevented
correct conversion from the emacs-mule encoding to UTF-8, creating
such funnily encoded UTF-8 files (I can't reproduce this problem with
my recently compiled Emacs, so it seems to have been fixed in the
meantime).
>> Instead, such characters must be converted to correct
>> UTF-8.
>
> ??? I don't understand what you means by "correct UTF-8".
Sorry, I meant correct Unicode. U+1351DE is larger than the largest
valid Unicode code point, U+10FFFF. As my example demonstrates, the
Chinese character in the file is certainly *neither* a private-use
character nor a character from GB 18030, so it should be converted to
a regular Unicode value.
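
To illustrate the range problem, here is a small Python sketch (not
Emacs code). The helper names are hypothetical; the only facts assumed
are the Unicode upper limit U+10FFFF and the standard 4-byte UTF-8 bit
layout. It shows that U+1351DE fits the 4-byte layout bit-wise but
lies outside the Unicode range, so a strict UTF-8 decoder rejects the
resulting bytes.

```python
# Largest code point that Unicode (and therefore UTF-8) allows.
MAX_UNICODE = 0x10FFFF


def is_valid_code_point(cp: int) -> bool:
    """Return True if cp lies within the Unicode code space."""
    return 0 <= cp <= MAX_UNICODE


def raw_four_byte_form(cp: int) -> bytes:
    """Apply the 4-byte UTF-8 bit layout WITHOUT any range check.

    Hypothetical helper for illustration only: a correct encoder
    must refuse values above MAX_UNICODE.
    """
    return bytes([
        0xF0 | ((cp >> 18) & 0x07),  # lead byte: 11110xxx
        0x80 | ((cp >> 12) & 0x3F),  # continuation: 10xxxxxx
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def strict_utf8_accepts(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    cp = 0x1351DE
    raw = raw_four_byte_form(cp)
    print(f"U+{cp:X} valid code point: {is_valid_code_point(cp)}")
    print(f"UTF-8-like bytes: {raw.hex(' ')}")
    print(f"accepted by strict decoder: {strict_utf8_accepts(raw)}")
```
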
> I think the correct behaviour on reading such a file by utf-8 is to
> treat each byte as raw-byte.
Maybe. I'm not sure how Emacs should behave when reading such files.
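
For comparison only, this is roughly what "treat each byte as
raw-byte" looks like in Python, using its `surrogateescape` error
handler. This is an analogy, not a description of what Emacs does or
should do; the function names are made up for the sketch. Undecodable
bytes are kept as individual placeholders, so the file round-trips
unchanged.

```python
def read_keeping_raw_bytes(data: bytes) -> str:
    """Decode UTF-8, mapping each undecodable byte to a lone surrogate.

    Analogy for a raw-byte representation: invalid byte 0xNN becomes
    the placeholder U+DCNN instead of raising an error.
    """
    return data.decode("utf-8", errors="surrogateescape")


def write_restoring_raw_bytes(text: str) -> bytes:
    """Encode back to bytes, turning the placeholders into raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # "ab" followed by a 4-byte sequence that is out of Unicode range.
    data = b"ab" + bytes([0xF4, 0xB5, 0x87, 0x9E])
    text = read_keeping_raw_bytes(data)
    print([hex(ord(ch)) for ch in text])
    print(write_restoring_raw_bytes(text) == data)
```
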
Werner