Re: Questions on charset encoding detection and keyboard layout

From: Hou, Ruoyu
Subject: Re: Questions on charset encoding detection and keyboard layout
Date: Sat, 12 Dec 2009 03:51:50 +0800
User-agent: Thunderbird (Windows/20090812)

Dear Zaretskii,

Before switching to Emacs I used EmEditor, a proprietary editor for Windows. It could auto-detect files in different encodings and present a list of candidate encodings, ordered by statistical confidence, from which I could pick the most likely one. So I guess it implements some statistical algorithm to detect the proper encoding.

I also tried MadEdit, an open-source cross-platform editor. So far it has automatically decoded every file I opened, without my having to choose a likely encoding. I am not skilled enough to read its source code, so I can't tell how it is done. I also don't know how MULE handles encoding detection.

A friend of mine, a Vim user, showed me how he handles these different encodings with ":set fencs=(a list of possible encodings; the point is to put euc-jp before gbk)". It seems to be done by calling libiconv and libintl (or gettext, I'm not sure).
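The ordered-list idea can be sketched in Python: try each candidate encoding strictly in turn and accept the first one that decodes the bytes without error. This only illustrates the principle and is not a description of how Vim or Emacs implement detection; the function name `guess_encoding` and the default candidate list are assumptions made for the example.

```python
def guess_encoding(data, candidates=("utf-8", "euc_jp", "gbk")):
    """Return the first candidate that decodes `data` without error, else None.

    Order matters: permissive encodings belong at the end of the list,
    because they accept byte sequences that stricter encodings reject.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None
```

A clean decode is only evidence, not proof: the same bytes are often valid in more than one encoding, so the result is a guess that depends on the order of the list.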

I just thought that Emacs should perform better than, or at least as well as, these programs.

Thanks for your help. I am in fact already using the commands you mentioned to set encodings for viewing and saving. Sorting documents into directories by encoding is a good idea and a good habit, if only I had had the foresight. It is a bit unrealistic, though, when facing a large and constantly growing quantity of unsorted documents in different encodings already on the disk (as I always complain, why can't those guys just use UTF-8?). Would it be possible, for example, to write a script to distinguish and sort those documents?
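Such a script might look roughly like the Python sketch below. It walks a directory tree, guesses each file's encoding by trial decoding against an ordered candidate list, and groups the paths by the guess so they can be reviewed before anything is moved. The names `first_valid_encoding` and `group_by_encoding` and the candidate list are hypothetical, and trial decoding is a heuristic that can misclassify files whose bytes are valid in several encodings.

```python
from collections import defaultdict
from pathlib import Path


def first_valid_encoding(data, candidates):
    """Return the first candidate that decodes `data` cleanly, else 'unknown'."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            pass  # try the next candidate
    return "unknown"


def group_by_encoding(root, candidates=("utf-8", "euc_jp", "gbk")):
    """Map each guessed encoding to the sorted list of files under `root`."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            guess = first_valid_encoding(path.read_bytes(), candidates)
            groups[guess].append(path)
    return dict(groups)
```

The sketch deliberately only reports the grouping. Moving each group into its own directory is a separate step, best done after spot-checking a few files per group.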


Eli Zaretskii wrote:
Date: Fri, 11 Dec 2009 13:42:39 +0800
From: "Hou, Ruoyu" <address@hidden>

I tried the tip you gave me, but now my GBK-encoded files are unreadable. How would you solve the problem?

Moreover, as I mentioned in my previous post, how could I use prefer-coding-system without knowing in advance which encoding I am going to encounter?

If you have many documents in different encodings that Emacs cannot
distinguish by itself, then I'm afraid there's no good solution except
"C-x RET c", which requires that you know the encoding in advance.  At
least I'm not aware of any better way.  What do other applications do?

Of course, if you inadvertently visit a file without knowing the
encoding, and want to re-visit it with the correct encoding, after you
notice that Emacs didn't properly decode it, then typing "C-x RET c
CORRECT-ENCODING RET M-x revert-buffer RET" will fix the problem.
Here CORRECT-ENCODING is the correct encoding of the file.

Also, if you could somehow manage to have documents in different
encodings to reside in different directories, then perhaps you could
set up the directory-local variables to cause Emacs to decode the files
in each directory correctly.  See the node "Directory Variables" in
the Emacs user manual for details about this feature.

Hou, Ruoyu

Laboratory of Reproductive & Stem Cell Biology,
College of Life Science & Biotech.,
Shanghai Jiao Tong University,
Shanghai 200240, P.R.China.
