[bug #51330] preconv fails to detect utf-8 without BOM
From: Werner LEMBERG
Subject: [bug #51330] preconv fails to detect utf-8 without BOM
Date: Wed, 28 Jun 2017 04:58:27 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36
Follow-up Comment #1, bug #51330 (project groff):
I like the idea of using a library to guess the charset and encoding.
However, I think that libmagic is not suited to that: as far as I can see,
it returns a textual description of the data, which preconv would have to
parse manually. Please correct me if I'm wrong.
Looking around, probably the best choice is uchardet:
https://www.freedesktop.org/wiki/Software/uchardet/
We could make preconv use it optionally if it is available.
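
To illustrate what "optionally" could look like, here is a minimal sketch,
not actual preconv code. It assumes a hypothetical configure-time macro
HAVE_UCHARDET and that the header is installed as <uchardet/uchardet.h>;
the helper names guess_charset and choose_encoding are invented for this
example. The uchardet_* calls are the library's C API. Without the macro,
the guess is simply empty and the caller's fallback is used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#ifdef HAVE_UCHARDET                 // hypothetical macro set by configure
#include <uchardet/uchardet.h>       // assumed install location of the header
#endif

// Hypothetical helper: return the charset name guessed for the buffer,
// or an empty string if no guess is available.
std::string guess_charset(const char *data, std::size_t len)
{
  assert(data != nullptr || len == 0);
#ifdef HAVE_UCHARDET
  std::string name;
  uchardet_t ud = uchardet_new();
  if (ud == nullptr)
    return name;
  // uchardet_handle_data returns 0 on success.
  if (uchardet_handle_data(ud, data, len) == 0) {
    uchardet_data_end(ud);
    // Empty string means that the detector could not decide.
    name = uchardet_get_charset(ud);
  }
  uchardet_delete(ud);
  return name;
#else
  // Library not available at build time: no guess.
  (void)data;
  (void)len;
  return std::string();
#endif
}

// Hypothetical helper: prefer the detector's guess, else the fallback.
std::string choose_encoding(const std::string &guess,
                            const std::string &fallback)
{
  return guess.empty() ? fallback : guess;
}
```

A caller would read the start of the input into a buffer and use something
like choose_encoding(guess_charset(buf, n), "latin1"), where "latin1" only
stands in for whatever default is configured. A real charset name comes
back, so nothing has to be parsed out of a textual description.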
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?51330>