[bug #51330] preconv fails to detect utf-8 without BOM

bug-groff

From:	Bertrand Garrigues
Subject:	[bug #51330] preconv fails to detect utf-8 without BOM
Date:	Tue, 27 Jun 2017 18:31:11 -0400 (EDT)
User-agent:	Mozilla/5.0 (Windows NT 6.1; rv:54.0) Gecko/20100101 Firefox/54.0

URL:
  <http://savannah.gnu.org/bugs/?51330>

                 Summary: preconv fails to detect utf-8 without BOM
                 Project: GNU troff
            Submitted by: bgarrigues
            Submitted on: Tue 27 Jun 2017 10:31:10 PM UTC
                Severity: 3 - Normal
              Item Group: None
                  Status: Confirmed
                 Privacy: Public
             Assigned to: bgarrigues
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

(See also comment #1 from bug #50989)

typesetting.pdf is not correctly generated in the build tree. Its source is a
UTF-8 file without a BOM that contains some characters with French accents, and
the build passes LC_ALL=C. This causes `preconv' to use "latin1" as the default
encoding, which is the expected behaviour according to the `preconv' man page,
so the accented characters are not handled properly.
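
To illustrate the symptom, here is a minimal Python sketch (not groff code) of
what happens when UTF-8 bytes are interpreted as Latin-1. The helper name
`misread_as_latin1` is hypothetical and only serves the illustration.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1.

    This mimics a reader that assumes Latin-1 for a UTF-8 input file.
    """
    utf8_bytes = text.encode("utf-8")
    # Latin-1 maps every byte to one character, so this never fails;
    # it silently yields one wrong character per byte.
    return utf8_bytes.decode("latin-1")


if __name__ == "__main__":
    # Each accented letter takes two bytes in UTF-8, so it comes back
    # as two unrelated Latin-1 characters.
    print(misread_as_latin1("é"))
```

Plain ASCII text passes through unchanged, which is why only the accented
characters are affected.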

There are several quick fixes for the generation of the mom examples:
- Add a BOM to the .mom files.
- Use '-K utf8' instead of just '-k'.
- Add a coding tag to the .mom files.
However it seems to me that `preconv' should not rely on the locale to detect
the file encoding.

Would it make sense to use, for example, libmagic (from the `file' utility) to
make preconv correctly detect the input file encoding?
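
For illustration only, a locale-independent check could look roughly like the
Python sketch below. It is neither libmagic nor the algorithm `preconv'
currently uses; it is a simple heuristic that accepts UTF-8 when the bytes
decode strictly as UTF-8 and otherwise falls back to a default. The name
`guess_encoding` and its `fallback` parameter are hypothetical.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Guess the encoding of data without consulting the locale.

    Heuristic: a UTF-8 BOM or a successful strict UTF-8 decode means
    UTF-8; anything else gets the fallback encoding.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    try:
        data.decode("utf-8")  # strict by default: raises on invalid bytes
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding("d\u00e9j\u00e0 vu".encode("utf-8")))
    print(guess_encoding("d\u00e9j\u00e0 vu".encode("latin-1")))
```

Pure ASCII input is reported as UTF-8 here, which is harmless because ASCII is
a subset of both encodings. Accented Latin-1 text is very unlikely to form
valid UTF-8 sequences by accident, which is what makes this kind of check
usable without a BOM.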





    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?51330>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/
