From: Daniel P. Berrangé
Subject: Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all code
Date: Tue, 16 Apr 2019 17:09:27 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On Tue, Apr 16, 2019 at 06:01:46PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <address@hidden> writes:
>
> > On Tue, Apr 16, 2019 at 09:49:09AM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <address@hidden> writes:
> > The main thing I can see would be filenames.
> >
> > Though, having said it is UTF-8, on looking more closely I think QEMU is
> > probably 8-bit clean in its handling, so it will just blindly pass
> > whatever filename string it gets from libvirt straight on to the kernel
> > with no interpretation.
>
> Sounds good to me.
>
> > Libvirt has enabled UTF-8 validation in its JSON library when encoding
> > data it sends to QEMU, so any data libvirt sends will at least be a
> > valid UTF-8 byte sequence. Libvirt doesn't actually do any charset
> > conversion though, so if libvirt runs in a non-UTF-8 locale it will
> > likely trip over this UTF-8 validation.
>
> QMP input must be encoded in UTF-8. Converting from other encodings to
> UTF-8 is the QMP client's problem.

Ok, so consider a host OS that is globally running in a non-UTF-8 locale
such as ISO-8859-1. This means that any non-ASCII filenames in the
filesystem are assumed to be in ISO-8859-1 encoding.

Since QMP input must be UTF-8, libvirt must convert the filename
from the current locale's charset (ISO-8859-1) to UTF-8, otherwise it
might put an invalid UTF-8 sequence in the JSON.

For QEMU to be able to open the file, QEMU must honour the host OS
LC_CTYPE and convert the filename from UTF-8 back to the LC_CTYPE
character set.
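
A minimal sketch of that conversion step, assuming POSIX iconv(3). The helper name convert_charset() is hypothetical, not an existing QEMU or libvirt function. In the scenario above, libvirt would call it with tocode "UTF-8" and fromcode "ISO-8859-1", and QEMU would do the reverse, taking tocode from nl_langinfo(CODESET) after setlocale(LC_CTYPE, ""). Because iconv() rejects malformed input with EILSEQ, the same call also catches invalid UTF-8 sequences.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: convert the NUL-terminated string `in` from the
 * `fromcode` charset to the `tocode` charset using iconv(3).
 * Returns a newly malloc'd NUL-terminated string, or NULL on failure:
 * errno is EILSEQ if `in` is malformed in `fromcode` or a character has
 * no representation in `tocode`, EINVAL if the input ends mid-sequence
 * or iconv_open() does not support the charset pair.
 */
char *convert_charset(const char *in, const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1) {
        return NULL;
    }

    size_t inleft = strlen(in);
    /* 4 output bytes per input byte is ample for byte-oriented targets
     * such as ISO-8859-x and UTF-8; a production version would grow the
     * buffer on E2BIG instead. */
    size_t outsize = inleft * 4 + 1;
    char *out = malloc(outsize);
    if (!out) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;        /* iconv() takes a non-const input pointer */
    char *outp = out;
    size_t outleft = outsize - 1;  /* reserve one byte for the terminator */

    /* Convert, then flush any shift state (a no-op for stateless charsets). */
    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
        iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
        int saved = errno;
        free(out);
        iconv_close(cd);
        errno = saved;
        return NULL;
    }

    *outp = '\0';
    iconv_close(cd);
    return out;
}
```

For example, converting the UTF-8 bytes "caf\xc3\xa9" to ISO-8859-1 yields "caf\xe9", while a name containing a stray 0xFF byte (never valid in UTF-8) returns NULL with errno set to EILSEQ.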
>
> The more interesting direction is the one I inquired about: QMP output.
> If locale-dependent text gets sent to QMP, converting it to UTF-8 is
> QEMU's problem.
>
> On closer look, anything but JSON string contents is plain ASCII by
> construction. JSON string contents gets assembled in to_json() case
> QTYPE_QSTRING. It expects QString to use UTF-8[*]. You can have any
> locale as long as it uses ASCII or UTF-8.
IOW
>
> >> > + *
> >> > + * - Lots of code uses is{upper,lower,alnum,...} functions, expecting
> >> > + * C locale classification behaviour. Most QEMU usage should likely be
> >> > + * changed to g_ascii_is{upper,lower,alnum...} to match code
> >> > + * assumptions, without being broken by locale settings.
> >> > + *
> >> > + * We do still have two requirements:
> >> > + *
> >> > + * - Ability to correctly display translated text according to the
> >> > + * user's locale
> >> > + *
> >> > + * - Ability to handle multibyte characters, ideally according to
> >> > + * the character set specified by the user's locale. This affects
> >> > + * the ability of usb-mtp to correctly convert filenames to UCS16,
> >> > + * and the curses & GTK frontends' wide character display.
> >> > + *
> >> > + * The second requirement would need LC_CTYPE to be honoured, but
> >> > + * this conflicts with the 2nd & 3rd problems listed earlier. For
> >> > + * now we make a tradeoff, trying to set an explicit UTF-8 locale.
> >> > + *
> >> > + * Note we can't set LC_MESSAGES here, since mingw doesn't define
> >> > + * this constant in locale.h. Fortunately we only need it for the
> >> > + * GTK frontend and that uses gi18n.h which pulls in a definition
> >> > + * of LC_MESSAGES.
> >> > + */
> >> > + setlocale(LC_CTYPE, "C.UTF-8");
> >> > +
> >> > module_call_init(MODULE_INIT_TRACE);
> >> >
> >> > qemu_init_cpu_list();
> >>
> >> We should've stayed out of the GUI business.
> >
> > This isn't only a GUI problem; as above, it affects USB MTP.
>
> I believe setlocale() in QEMU is basically wrong. Finding all the
> places that rely on the current locale when they shouldn't and
> converting them to locale-independent alternatives is a huge amount of
> work. Even if we managed to complete it, it wouldn't stay complete.
>
> Instead, find the places that have reason to use the locale, and fix
> them to use uselocale().

I think that's fundamentally the wrong way around. Most stuff *should*
be locale-dependent, otherwise any interaction with the host OS is
likely to use incorrect localization. It isn't practical to put a
uselocale() call around every place that opens a filename. There are
a few places where QEMU should be locale-independent, such as QMP
and guest OS ABI-sensitive code, and those should take account of it.
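
The uselocale() approach discussed above can be sketched as a scoped, per-thread switch around one locale-sensitive call. This is a minimal sketch assuming POSIX.1-2008 newlocale(3), uselocale(3) and freelocale(3); the helper format_double_c_locale() is hypothetical, not a QEMU function. It stands in for a locale-independent spot such as QMP output: "%g" takes its decimal separator from LC_NUMERIC, so forcing the C locale keeps it '.', while the rest of the process keeps the host locale.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: format `value` with "%g" under the C locale for
 * the calling thread only, so the decimal separator is always '.', even
 * if setlocale(LC_ALL, "") selected a locale such as de_DE that uses ','.
 * Returns the snprintf() result, or -1 if the locale switch failed.
 */
int format_double_c_locale(char *buf, size_t len, double value)
{
    /* Base (locale_t)0: categories not in the mask come from the C locale. */
    locale_t c_loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t)0);
    if (c_loc == (locale_t)0) {
        return -1;
    }

    locale_t old = uselocale(c_loc);   /* affects this thread only */
    if (old == (locale_t)0) {
        freelocale(c_loc);
        return -1;
    }

    int n = snprintf(buf, len, "%g", value);

    uselocale(old);                    /* restore the previous thread locale,
                                          possibly LC_GLOBAL_LOCALE */
    freelocale(c_loc);
    return n;
}
```

Because uselocale() only changes the calling thread's locale, other threads keep using the global locale set by setlocale() while the call is in progress.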
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|