RE: Automatic recognition of some specific coding systems
From: Jürgen Hartmann
Subject: RE: Automatic recognition of some specific coding systems
Date: Fri, 27 Feb 2015 13:12:46 +0100
Thank you, Yuri Khan, for widening the perspective:
> The general problem you’re solving is that of encoding detection.
> There exist ready-made solutions for that, e.g. by computing byte
> frequencies and matching them against known character frequencies in
> your language. One of these is called enca.
>
> Googling for “emacs enca” yields a post by Dmitriyi Paduchikh in
> gnu.emacs.sources, dated 2007.
>
> https://lists.gnu.org/archive/html/gnu-emacs-sources/2007-06/msg00037.html
Using Google is always good advice, which I will gratefully follow
once more with respect to this broader background.
Actually, I didn't know Enca at all until now. A language-based approach
to recognizing encodings is an interesting idea.
Unfortunately, Enca cannot be used in my particular case, because--I
didn't mention this before, sorry--the text files to handle are mostly
in English and German. For the former, encoding is not an issue, and
for the latter, Enca does not support German.
Enca 1.14, for example, supports only
Belarussian
Bulgarian
Czech
Estonian
Croatian
Hungarian
Lithuanian
Latvian
Polish
Russian
Slovak
Slovene
Ukrainian
Chinese
But for people who use any of these languages, this might be a
promising option.
Apart from that--and this might be helpful in my case as well--the idea
of using external software to detect the encoding is very appealing, and
maybe it is possible to adapt the lisp snippets contained in your link
to other programs. For example,
file -bi ...
can identify file encodings, although it reports cp850 only
non-specifically as "unknown-8bit".
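
To make the idea of wrapping an external detector a bit more concrete, here is a rough Python sketch rather than a tested solution. It assumes a file(1) on the PATH that accepts -b and -i and prints a line such as "text/plain; charset=iso-8859-1" (some platforms spell the MIME option differently). The helper names parse_charset and detect_charset are made up for this illustration.

```python
import subprocess
from typing import Optional


def parse_charset(mime_line: str) -> Optional[str]:
    """Return the charset from a line like 'text/plain; charset=iso-8859-1'."""
    for part in mime_line.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep and key.strip().lower() == "charset":
            return value.strip().lower()
    return None


def detect_charset(path: str) -> Optional[str]:
    """Ask file(1) for the charset of PATH; None means 'no usable answer'."""
    # Assumption: `file -bi PATH` prints "<mime-type>; charset=<name>".
    result = subprocess.run(
        ["file", "-bi", path], capture_output=True, text=True, check=True
    )
    charset = parse_charset(result.stdout)
    # "unknown-8bit" and "binary" carry no usable information, so the caller
    # would have to fall back to a default (cp850, say) in that case.
    if charset in (None, "unknown-8bit", "binary"):
        return None
    return charset
```

A caller could treat a None result as "fall back to a fixed default", which is roughly what would be needed for the cp850 files mentioned above.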
So thank you very much for your suggestions.
Juergen