Re: [Qemu-devel] KVM call minutes for Feb 15

qemu-devel

From:	Avi Kivity
Subject:	Re: [Qemu-devel] KVM call minutes for Feb 15
Date:	Thu, 17 Feb 2011 16:06:41 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101209 Fedora/3.1.7-0.35.b3pre.fc14 Lightning/1.0b3pre Thunderbird/3.1.7

On 02/17/2011 03:37 PM, Anthony Liguori wrote:

On 02/17/2011 07:25 AM, Avi Kivity wrote:
On 02/17/2011 03:10 PM, Anthony Liguori wrote:
On 02/17/2011 06:23 AM, Avi Kivity wrote:
On 02/17/2011 02:12 PM, Anthony Liguori wrote:
(btw what happens in a non-UTF-8 locale? I guess we should just reject unencodable strings).
While QEMU is mostly ASCII internally, for the purposes of the JSON parser, we always encode and decode UTF-8. We reject invalid UTF-8 sequences. But since JSON is string-encoded unicode, we can always decode a JSON string to valid UTF-8 as long as the string is well formed.
That is wrong. If the user passes a Unicode filename it is expected to be translated to the current locale encoding for the purpose of, say, filename lookup.
QEMU does not support anything but UTF-8.
Since when?
AFAICT, JSON string conversion is the only place where there is any dependency on UTF-8. Anything else should just work.
That's pretty common with Unix software. I don't think any modern Unix platform actually uses UCS2 or UTF-16. It's either ascii or UTF-8.
Most/all Linux distributions support UTF-8 as well as a zillion other encodings (single-byte ASCII + another charset, or multi-byte charsets for languages with many characters).
Maybe there's some confusion here.  UTF-8 is an encoding, not a locale.

The common encodings are ASCII, UTF-8, UCS2, UTF-16, and UTF-32.

ASCII is a character set and encoding. The rest are encodings for Unicode. There are lots of other encodings, say latin-1.

An application has to explicitly support an encoding. It is not transparent.

It is fully transparent until you do wire conversions (like we do with qmp which is explicitly UTF-8).

UCS2/UTF-16 means that strings are not 'const char *'s but 'const wchar_t *' where typedef unsigned short wchar_t;.
QEMU assumes, in lots of places that strings are single-byte NUL terminated. Basically, any use of snprintf, printf, strcpy, strlen, etc. pretty much tie you to ASCII/UTF-8. You can have a single NUL byte as part of a valid UCS2 string.

We're tied to single- or multiple-byte encodings, and can't do wchar_t. But that's very different from ASCII/UTF-8 only.
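
To make the NUL-termination point concrete, here is a minimal sketch in C. The helper name byte_string_length and the sample arrays are illustrative assumptions, not QEMU code. It shows that a byte-oriented function such as strlen sees the whole string for ASCII and for multi-byte UTF-8 data, but stops at the first zero byte of UCS-2 data:

```c
#include <stddef.h>
#include <string.h>

/* "Hi" in ASCII, which is byte-for-byte identical in UTF-8. */
static const char hi_ascii[] = "Hi";

/* Greek small letter alpha (U+03B1) in UTF-8: two bytes, neither is 0x00. */
static const char alpha_utf8[] = "\xce\xb1";

/* "Hi" in little-endian UCS-2: two bytes per character, so the high
 * byte of every ASCII character is 0x00. The last two bytes are the
 * 16-bit terminator. */
static const char hi_ucs2le[] = { 'H', 0x00, 'i', 0x00, 0x00, 0x00 };

/* Length as seen by the byte-oriented C string functions
 * (strlen, strcpy, printf("%s"), ...). */
static size_t byte_string_length(const char *s)
{
    return strlen(s);
}
```

Under these assumptions, byte_string_length(hi_ascii) and byte_string_length(alpha_utf8) both return 2, while byte_string_length(hi_ucs2le) returns 1 because the second byte of 'H' is already zero.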

The only place it even matters is Windows and Windows has ASCII and UTF-16 versions of their APIs. So on Windows, non-ASCII characters won't be handled correctly (yet another one of the many issues with Windows support in QEMU). UTF-8 is self-recovering though so it degrades gracefully.
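
As a rough illustration of the "self-recovering" property, the sketch below relies on the fact that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so a reader that lands in the middle of a damaged sequence can skip ahead to the next byte that can start a character. The helper names utf8_is_continuation and utf8_resync are hypothetical and not taken from QEMU:

```c
/* Returns nonzero if b is a UTF-8 continuation byte (10xxxxxx). */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: advance past any continuation bytes so that the
 * returned pointer refers to a byte that can begin a character. The
 * loop stops at the terminating NUL, since 0x00 is not a continuation
 * byte. */
static const char *utf8_resync(const char *p)
{
    while (utf8_is_continuation((unsigned char)*p)) {
        p++;
    }
    return p;
}
```

Given the bytes 0xB1 'a' 'b' 'c' (the tail of a truncated two-byte sequence followed by ASCII), utf8_resync returns a pointer to 'a', so only the damaged character is lost.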
It matters on Linux with el_GR.iso88597, for example.
The whole series of iso8859 (8-bit encodings) are officially abandoned in favor of UCS and encodings that support the full UCS code page (UTF-8/UTF-16).
I see no strong reason to try and support deprecated encodings when there are perfectly valid replacements like el_GR.utf8.

All it takes is a call to iconv(3). I agree it's unlikely to happen in practice.
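
For reference, a minimal sketch of such an iconv(3) conversion might look like the following. The helper name convert_string and its return convention are assumptions made for illustration. A real implementation would probably take the target encoding from the current locale (for example via nl_langinfo(CODESET)) rather than hard-coding it, and would need a policy for characters that cannot be represented:

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated string `in` from
 * encoding `from` to encoding `to`, writing a NUL-terminated result
 * into `out`. Returns the number of bytes written (excluding the NUL),
 * or (size_t)-1 if the conversion is unsupported, the input is invalid
 * or unconvertible, or the output buffer is too small. */
static size_t convert_string(const char *from, const char *to,
                             const char *in, char *out, size_t out_size)
{
    iconv_t cd;
    char *inp = (char *)in;      /* iconv() takes a non-const pointer */
    char *outp = out;
    size_t in_left = strlen(in);
    size_t out_left;
    size_t rc;

    if (out_size == 0) {
        return (size_t)-1;
    }
    out_left = out_size - 1;     /* reserve room for the terminator */

    cd = iconv_open(to, from);   /* note the order: target, then source */
    if (cd == (iconv_t)-1) {
        return (size_t)-1;
    }

    rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1) {
        return (size_t)-1;
    }

    *outp = '\0';
    return (size_t)(outp - out);
}
```

For example, convert_string("UTF-8", "ISO-8859-7", "\xce\xb1", buf, sizeof(buf)) converts Greek small alpha from its two-byte UTF-8 form to the single byte 0xE1, provided the platform's iconv supports ISO-8859-7.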


--
error compiling committee.c: too many arguments to function
