Re: Cyrillic vs UTF-8
From: Simon Josefsson
Subject: Re: Cyrillic vs UTF-8
Date: Fri, 25 Apr 2003 19:09:07 +0200
User-agent: Gnus/5.090019 (Oort Gnus v0.19) Emacs/21.3.50 (gnu/linux)

"Eli Zaretskii" <address@hidden> writes:
>> From: Simon Josefsson <address@hidden>
>> Date: Fri, 25 Apr 2003 18:12:17 +0200
>>
>> I think there are two problems. Opening the file the first time
>> should guess it is a utf-8 file.
>
> IIRC, you need to make the priority of utf-8 higher for this to
> happen. Unless that's changed in the current CVS, try evaluating the
> following expression:
>
> (prefer-coding-system 'utf-8)
>
> before you visit a utf-8 encoded file, and see if that helps. I think
> this is because the encoding detection routines cannot distinguish
> between Latin-n and utf encoding without some help.

This works, but note that without it Emacs didn't recognize the file
as being in any encoding at all; the mode line shows '-:--'.

It seems binary is preferred over utf-8 and utf-16-* in
coding-category-list. This seems extremely conservative. I guess it
means UTF-8 can never be autodetected by default? Is the Unicode
support so bad that it shouldn't even be preferred over binary? UTF-8
has a strict, well-defined byte structure, so it can be told apart
from other encodings (even Latin-n) reliably enough that misdetection
rarely happens in practice.
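To illustrate why UTF-8 is easy to recognize, here is a minimal sketch
(not Emacs's actual detection code) of a well-formedness check over a
unibyte string. The function name `my-utf-8-well-formed-p' is
hypothetical, and the code assumes a reasonably recent Emacs (24.4 or
later, where `<=' accepts more than two arguments).

```emacs-lisp
(defun my-utf-8-well-formed-p (bytes)
  "Return non-nil if the unibyte string BYTES is well-formed UTF-8.
Hypothetical helper for illustration only.  It rejects invalid lead
bytes, truncated sequences, overlong forms, UTF-16 surrogates and
code points above U+10FFFF."
  (let ((i 0) (len (length bytes)) (ok t))
    (while (and ok (< i len))
      (let* ((b (aref bytes i))
             ;; Number of continuation bytes implied by the lead byte,
             ;; or nil if B can never start a sequence.
             (n (cond ((< b #x80) 0)
                      ((<= #xC2 b #xDF) 1)
                      ((<= #xE0 b #xEF) 2)
                      ((<= #xF0 b #xF4) 3))))
        (cond
         ((null n) (setq ok nil))                ; invalid lead byte
         ((>= (+ i n) len) (setq ok nil))        ; truncated sequence
         (t
          ;; The second byte has tighter bounds after E0, ED, F0, F4.
          (let ((lo (cond ((= b #xE0) #xA0) ((= b #xF0) #x90) (t #x80)))
                (hi (cond ((= b #xED) #x9F) ((= b #xF4) #x8F) (t #xBF)))
                (k 1))
            (while (and ok (<= k n))
              (let ((c (aref bytes (+ i k))))
                (unless (if (= k 1) (<= lo c hi) (<= #x80 c #xBF))
                  (setq ok nil)))
              (setq k (1+ k)))
            (setq i (+ i n 1)))))))
    ok))
```

For example, `(my-utf-8-well-formed-p (encode-coding-string "caf\u00e9" 'latin-1))`
returns nil: the Latin-1 byte #xE9 looks like the lead of a three-byte
UTF-8 sequence, but the two continuation bytes it requires are missing.
Typical Latin-n text rarely survives checks like these by accident.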

Can't we move binary down below UTF-8 in CVS? IMHO we should move
UTF-8 even earlier, since whether data is UTF-8 or not can be
determined with high probability. Preferring binary over UTF-8 seems
just wrong.

There used to be a PROBLEMS entry (in Emacs 21.2) recommending what you
suggest, but it has been removed both in 21.3 and in CVS. I thought
that meant UTF-8 was better supported now, but this doesn't seem to be
the case.