[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#20623: XML and HTML files with encoding/charset="utf-8" declaration
From: |
Alain Schneble |
Subject: |
bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save |
Date: |
Wed, 12 Oct 2016 23:44:57 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/25.1.50 (windows-nt) |
I'm joining this discussion and would like to report a recipe to
reproduce this issue on Windows:
- emacs -Q
- C-x C-f utf-8-bom-test.xml
- Enter the following text in the new buffer:
<?xml version="1.0" encoding="utf-8"?>
<root></root>
- C-x RET c utf-8-with-signature-dos C-x C-s yes RET
- C-x k RET
- C-x C-f utf-8-bom-test.xml
- M-: buffer-file-coding-system
=> utf-8-with-signature-dos
- Change buffer content, e.g. add some text to the root element:
<?xml version="1.0" encoding="utf-8"?>
<root>test</root>
- C-x C-s
- M-: buffer-file-coding-system
=> utf-8-dos
(expected coding system: utf-8-with-signature-dos)
As it was already mentioned in this thread, just by visiting the file,
then changing and saving the buffer, the BOM gets lost. This is due to
select-safe-coding-system (called by choose_write_coding_system) fully
trusting the coding system identified by find-auto-coding. So far so
good. The latter eventually calls auto-coding-functions which in turn
calls the built-in sgml-xml-auto-coding-function which I think should
take into account some context to enrich the derived coding system with
a signature if needed. Similar to what select-safe-coding-system does
to enrich the coding with the proper eol-type.
Does that make sense to you? If so, I'll try to come up with a patch
that enhances sgml-xml-auto-coding-function to take into account
buffer-file-coding-system (buffer + default value) in case it carries
the same text-conversion but different signature. The proposed "auto
coding" shall inherit the signature in this case.
Thanks for any help.
Alain
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- bug#20623: XML and HTML files with encoding/charset="utf-8" declaration loose BOM; Coding system is reset from utf-8-with-signature to utf-8 on save,
Alain Schneble <=