Re: [bug-gettext] [bug #52932] XML doesn’t default to UTF-8

From: Iñigo Martínez
Subject: Re: [bug-gettext] [bug #52932] XML doesn’t default to UTF-8
Date: Mon, 22 Jan 2018 12:33:47 +0100

2018-01-20 16:37 GMT+01:00 Roumen Petrov <address@hidden>:
Bruno Haible wrote:
Update of bug #52932 (project gettext):

                   Status:                    None => Need Info


Follow-up Comment #1:

Why? Why make an assumption about the encoding (an assumption that can be wrong)
and thus possibly produce a file in a different encoding than the one the caller
expects, when we have a way to avoid this assumption and always produce valid
and unambiguous XML?

XML has no single default encoding: the specification requires every XML processor to support both UTF-8 and UTF-16. For more details, see the chapter at https://www.w3.org/TR/xml/#charencoding .

Next, an encoding declaration is required if the encoding is not UTF-8 or UTF-16. Actually, it is more complicated. Let me quote from the specification (same chapter): "In the absence of external character encoding information (such as MIME headers), parsed entities which are stored in an encoding other than UTF-8 or UTF-16 /MUST/ begin with a text declaration...."

So if there is no explicit encoding declaration, then according to the standard the encoding is either UTF-8 or UTF-16.


Looking at that same chapter [0], a bit further down it also says the following:

"In the absence of information ... Note that since ASCII is a subset of UTF-8, ordinary ASCII entities do not strictly need an encoding declaration."

The last sentence is the critical one for me. The fact that ASCII doesn't need the encoding declaration makes it the default in those cases.
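
As a rough illustration of the rules quoted above, here is a minimal Python sketch using the standard library's xml.etree.ElementTree (backed by expat). The helper name parses() and the sample documents are made up for this example, and it only shows how one particular XML processor behaves, not what xgettext does or should do.

```python
import xml.etree.ElementTree as ET


def parses(data: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Pure ASCII, no encoding declaration: valid, since ASCII is a subset of UTF-8.
ascii_doc = b"<msg>hello</msg>"

# Non-ASCII text stored as UTF-8, still without a declaration.
utf8_doc = "<msg>Iñigo</msg>".encode("utf-8")

# The same text stored as ISO-8859-1 without a declaration: the parser
# reads it as UTF-8 and hits an invalid byte sequence.
latin1_bare = "<msg>Iñigo</msg>".encode("iso-8859-1")

# ISO-8859-1 with an explicit encoding declaration.
latin1_declared = (
    '<?xml version="1.0" encoding="ISO-8859-1"?><msg>Iñigo</msg>'
).encode("iso-8859-1")

for name, doc in [
    ("ascii, no declaration", ascii_doc),
    ("utf-8, no declaration", utf8_doc),
    ("iso-8859-1, no declaration", latin1_bare),
    ("iso-8859-1, declared", latin1_declared),
]:
    print(f"{name}: {'ok' if parses(doc) else 'rejected'}")
```

With this parser, only the undeclared ISO-8859-1 document is rejected, which matches the reading that an entity without a declaration (and without a byte order mark) is treated as UTF-8.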

Best regards,

[0] https://www.w3.org/TR/xml/#charencoding
