From: | Anthony Liguori |
Subject: | Re: [Qemu-devel] KVM call minutes for Feb 15 |
Date: | Thu, 17 Feb 2011 07:37:54 -0600 |
User-agent: | Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.15) Gecko/20101027 Lightning/1.0b1 Thunderbird/3.0.10 |
On 02/17/2011 07:25 AM, Avi Kivity wrote:
On 02/17/2011 03:10 PM, Anthony Liguori wrote:
On 02/17/2011 06:23 AM, Avi Kivity wrote:
On 02/17/2011 02:12 PM, Anthony Liguori wrote:
(btw what happens in a non-UTF-8 locale? I guess we should just reject unencodable strings).
While QEMU is mostly ASCII internally, for the purposes of the JSON parser, we always encode and decode UTF-8. We reject invalid UTF-8 sequences. But since JSON is string-encoded unicode, we can always decode a JSON string to valid UTF-8 as long as the string is well formed.
That is wrong. If the user passes a Unicode filename it is expected to be translated to the current locale encoding for the purpose of, say, filename lookup.
QEMU does not support anything but UTF-8.
Since when?
AFAICT, JSON string conversion is the only place where there is any dependency on UTF-8. Anything else should just work.
That's pretty common with Unix software. I don't think any modern Unix platform actually uses UCS2 or UTF-16. It's either ascii or UTF-8.
Most/all Linux distributions support UTF-8 as well as a zillion other encodings (single-byte ASCII + another charset, or multi-byte charsets for languages with many characters).
An application has to explicitly support an encoding; it is not transparent. UCS2/UTF-16 means that strings are not 'const char *'s but 'const wchar_t *'s, where wchar_t is a 16-bit type (typedef unsigned short wchar_t;).
QEMU assumes, in lots of places, that strings are single-byte and NUL terminated. Basically, any use of snprintf, printf, strcpy, strlen, etc. pretty much ties you to ASCII/UTF-8, because a single NUL byte can be part of a valid UCS2 string.
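To make the NUL-byte point concrete, here is a minimal sketch. The function and array names are made up for illustration and are not taken from QEMU. It runs strlen over the raw bytes of "AB" encoded as UTF-16LE and over "é" encoded as UTF-8, and compares that with a pairwise scan for a 16-bit terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "AB" in UTF-16LE plus a 16-bit terminator. For ASCII-range characters
 * the high byte of every code unit is 0x00. */
static const unsigned char utf16le_ab[] = { 0x41, 0x00, 0x42, 0x00, 0x00, 0x00 };

/* "é" (U+00E9) in UTF-8 plus a one-byte terminator. No byte of a
 * multi-byte UTF-8 sequence is ever 0x00. */
static const unsigned char utf8_e_acute[] = { 0xC3, 0xA9, 0x00 };

/* Length as the byte-oriented C string functions see it:
 * counting stops at the first 0x00 byte. */
size_t byte_string_length(const unsigned char *s)
{
    assert(s != NULL);
    return strlen((const char *)s);
}

/* Number of 16-bit code units before a 16-bit zero terminator,
 * reading the buffer two bytes at a time. */
size_t utf16le_length(const unsigned char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[2 * n] != 0 || s[2 * n + 1] != 0) {
        n++;
    }
    return n;
}
```

With these buffers, byte_string_length(utf16le_ab) stops after the first byte even though the string holds two code units, while the UTF-8 buffer is counted in full (two bytes for one character). That is why char-based code keeps working with UTF-8 but not with UCS2/UTF-16.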
The only place it even matters is Windows and Windows has ASCII and UTF-16 versions of their APIs. So on Windows, non-ASCII characters won't be handled correctly (yet another one of the many issues with Windows support in QEMU). UTF-8 is self-recovering though so it degrades gracefully.
It matters on Linux with el_GR.iso88597, for example.
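The "self-recovering" property of UTF-8 mentioned above comes from its byte layout: continuation bytes always look like 10xxxxxx, so a reader that lands in the middle of a character can skip forward to the next character start. A rough sketch follows; the helper names are hypothetical and this is not QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* A UTF-8 continuation byte always has the bit pattern 10xxxxxx. */
static int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Starting at offset pos, which may point into the middle of a
 * multi-byte character, return the offset of the next byte that
 * begins a character, or len if there is none. */
size_t utf8_resync(const unsigned char *buf, size_t len, size_t pos)
{
    assert(buf != NULL);
    while (pos < len && is_continuation(buf[pos])) {
        pos++;
    }
    return pos;
}
```

For the bytes CE B1 CE B2 78 (Greek alpha, Greek beta, 'x'), calling utf8_resync from offset 1 returns 2, the start of the second character, so at most one character is lost after a bad offset or a corrupted byte.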
The whole iso8859 series (8-bit encodings) is officially abandoned in favor of UCS and encodings that support the full UCS code page (UTF-8/UTF-16).
I see no strong reason to try and support deprecated encodings when there are perfectly valid replacements like el_GR.utf8.
Regards, Anthony Liguori