[bug #51330] preconv fails to detect utf-8 without BOM

bug-groff

From:	Werner LEMBERG
Subject:	[bug #51330] preconv fails to detect utf-8 without BOM
Date:	Wed, 28 Jun 2017 04:58:27 -0400 (EDT)
User-agent:	Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36

Follow-up Comment #1, bug #51330 (project groff):

I like the idea of using a library to guess the charset and encoding.
However, I don't think libmagic is suited to that: as far as I can see,
it returns a textual description of the data, which preconv would then
have to parse manually.  Please correct me if I'm wrong.

Looking around, the best choice is probably uchardet:

https://www.freedesktop.org/wiki/Software/uchardet/

We could make preconv use it optionally if it is available.
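
A rough sketch of what "optional" could look like, and not actual preconv code: the macro HAVE_UCHARDET and the helper name guess_charset are assumptions made up for illustration, and the header location may differ per installation. When the library is not compiled in, the helper simply reports "no guess", so the caller can keep doing whatever it does today.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_UCHARDET            /* hypothetical configure-time macro */
#include <uchardet.h>
#endif

/* Hypothetical helper: store a guessed charset name in `out`.
   Returns 1 if a guess was made, 0 otherwise (including the case
   where uchardet support was not compiled in). */
int guess_charset(const char *data, size_t len, char *out, size_t outsize)
{
  if (outsize == 0)
    return 0;
  out[0] = '\0';
#ifdef HAVE_UCHARDET
  uchardet_t ud = uchardet_new();
  if (ud) {
    /* uchardet_handle_data returns 0 on success. */
    if (uchardet_handle_data(ud, data, len) == 0) {
      uchardet_data_end(ud);
      /* An empty string means the detector could not decide. */
      strncpy(out, uchardet_get_charset(ud), outsize - 1);
      out[outsize - 1] = '\0';
    }
    uchardet_delete(ud);
  }
#else
  (void)data;                   /* no detector available: no guess */
  (void)len;
#endif
  return out[0] != '\0';
}
```

With this shape, the name comes back as a plain charset string rather than a textual description that needs parsing, which is the difference from libmagic noted above.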

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51330>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/
