qemu-devel



From: Daniel P . Berrangé
Subject: Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all code
Date: Tue, 16 Apr 2019 17:09:27 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On Tue, Apr 16, 2019 at 06:01:46PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <address@hidden> writes:
> 
> > On Tue, Apr 16, 2019 at 09:49:09AM +0200, Markus Armbruster wrote:
> >> Daniel P. Berrangé <address@hidden> writes:
> > The main thing I can see would be filenames.
> >
> > Though, having said it is UTF-8, on looking more closely I think QEMU is
> > probably 8-bit clean in its handling, so it will just blindly pass
> > whatever filename string it gets from libvirt straight on to the kernel
> > with no interpretation.
> 
> Sounds good to me.
> 
> > Libvirt has enabled UTF-8 validation in its JSON library when encoding
> > data it sends to QEMU, so any data libvirt sends will at least be a
> > valid UTF-8 byte sequence. Libvirt doesn't actually do any charset
> > conversion, though, so if libvirt runs in a non-UTF-8 locale it will
> > likely trip over this UTF-8 validation.
> 
> QMP input must be encoded in UTF-8.  Converting from other encodings to
> UTF-8 is the QMP client's problem.
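
As a rough illustration of what "a valid UTF-8 byte sequence" means for QMP input, a minimal validator could look like the sketch below. utf8_valid() is a hypothetical name; this is not libvirt's or QEMU's code. Note that the ISO-8859-1 spelling of "café" (0xE9 for é) is rejected, which is the failure a libvirt running in a non-UTF-8 locale would hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if s[0..len) is well-formed UTF-8 (RFC 3629): no stray
 * continuation bytes, no truncated or overlong sequences, no UTF-16
 * surrogates (U+D800..U+DFFF) and nothing above U+10FFFF. */
static bool utf8_valid(const unsigned char *s, size_t len)
{
    /* Smallest code point each sequence length may legally encode. */
    static const unsigned long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t n;          /* number of continuation bytes expected */
        unsigned long cp;  /* code point being decoded */

        if (c < 0x80) {
            i++;           /* plain ASCII */
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07;
        } else {
            return false;  /* continuation byte or 0xF8..0xFF as lead */
        }
        if (len - i <= n) {
            return false;  /* sequence runs past the end of the buffer */
        }
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80) {
                return false;
            }
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (cp < min_cp[n] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += n + 1;
    }
    return true;
}

/* Latin-1 "café" fails; the UTF-8 spelling passes. */
static void utf8_valid_example(void)
{
    assert(utf8_valid((const unsigned char *)"caf\xC3\xA9", 5));
    assert(!utf8_valid((const unsigned char *)"caf\xE9", 4));
}
```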

Ok, so consider a host OS that is globally running in a non-UTF-8 locale
such as ISO-8859-1. This means that any non-ASCII filenames in the
filesystem are assumed to be in ISO-8859-1 encoding.

Since QMP input must be UTF-8, libvirt must convert the filename
from the current locale (ISO-8859-1) to UTF-8; otherwise it might
put an invalid UTF-8 sequence in the JSON.

For QEMU to be able to open the file, QEMU must honour the
host OS LC_CTYPE, converting from UTF-8 back to the LC_CTYPE
character set.
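
The QEMU-side step in this scenario is a conversion from UTF-8 to whatever character set LC_CTYPE selects. A sketch using POSIX iconv(3) and nl_langinfo(CODESET) follows; the helper names are hypothetical, and per the quoted message QEMU today passes filenames through unchanged rather than doing this.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <iconv.h>
#include <langinfo.h>
#include <stdlib.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string into the charset named by tocode.
 * Returns a malloc()ed string, or NULL if the charset is unknown or the
 * text cannot be represented in it (iconv() fails, e.g. with EILSEQ). */
static char *utf8_to_charset(const char *utf8, const char *tocode)
{
    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1) {
        return NULL;
    }

    size_t inleft = strlen(utf8);
    /* Up to 4 output bytes per input byte (e.g. UTF-32), plus slack for a
     * byte-order mark or shift sequence and the terminating NUL. */
    size_t outcap = inleft * 4 + 8;
    char *out = malloc(outcap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)utf8;      /* iconv() takes char **, not const */
    char *op = out;
    size_t outleft = outcap - 1;  /* keep one byte for the NUL */
    size_t rc = iconv(cd, &in, &inleft, &op, &outleft);
    if (rc != (size_t)-1) {
        /* Flush shift state; a no-op for stateless charsets like Latin-1. */
        rc = iconv(cd, NULL, NULL, &op, &outleft);
    }
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *op = '\0';
    return out;
}

/* Convert to the charset of the current LC_CTYPE locale. */
static char *utf8_to_locale(const char *utf8)
{
    return utf8_to_charset(utf8, nl_langinfo(CODESET));
}

static void utf8_to_charset_example(void)
{
    /* UTF-8 "café" becomes the 4-byte ISO-8859-1 spelling. */
    char *latin1 = utf8_to_charset("caf\xC3\xA9", "ISO-8859-1");
    assert(latin1 != NULL && strcmp(latin1, "caf\xE9") == 0);
    free(latin1);
}
```

A filename that has no representation in the locale's charset (the euro sign in ISO-8859-1, say) yields NULL here; what QEMU should do in that case is a policy question the sketch does not answer.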

> 
> The more interesting direction is the one I inquired about: QMP output.
> If locale-dependent text gets sent to QMP, converting it to UTF-8 is
> QEMU's problem.
> 
> On closer look, anything but JSON string contents is plain ASCII by
> construction.  JSON string contents get assembled in to_json(), case
> QTYPE_QSTRING.  It expects QString to use UTF-8[*].  You can have any
> locale as long as it uses ASCII or UTF-8.
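
To make the UTF-8 expectation on QString concrete, here is a sketch that is not QEMU's to_json(): a hypothetical json_quote_ascii() takes UTF-8 input and emits an ASCII-only JSON string literal, escaping everything outside printable ASCII as \uXXXX and replacing ill-formed bytes with U+FFFD. It looks only at byte values, so its output is identical in every locale; text in any other charset would need converting to UTF-8 before reaching it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence from s (len > 0 bytes available). Sets *used
 * to the bytes consumed and returns the code point, or returns -1 for an
 * ill-formed sequence, in which case exactly one byte is consumed. */
static long utf8_decode(const unsigned char *s, size_t len, size_t *used)
{
    static const long min_cp[] = { 0, 0x80, 0x800, 0x10000 };
    unsigned char c = s[0];
    size_t n;
    long cp;

    *used = 1;
    if (c < 0x80) {
        return c;
    } else if ((c & 0xE0) == 0xC0) {
        n = 1; cp = c & 0x1F;
    } else if ((c & 0xF0) == 0xE0) {
        n = 2; cp = c & 0x0F;
    } else if ((c & 0xF8) == 0xF0) {
        n = 3; cp = c & 0x07;
    } else {
        return -1;
    }
    if (len <= n) {
        return -1;
    }
    for (size_t k = 1; k <= n; k++) {
        if ((s[k] & 0xC0) != 0x80) {
            return -1;
        }
        cp = (cp << 6) | (s[k] & 0x3F);
    }
    if (cp < min_cp[n] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return -1;
    }
    *used = n + 1;
    return cp;
}

/* Quote a UTF-8 string as an ASCII-only JSON string literal. Characters
 * outside printable ASCII become \uXXXX escapes (a surrogate pair above
 * U+FFFF); ill-formed bytes become U+FFFD. Returns a malloc()ed string. */
static char *json_quote_ascii(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str);
    /* Worst case is 6 output bytes per input byte (a control byte or a bad
     * byte each become one 6-character escape), plus 2 quotes and a NUL. */
    char *out = malloc(len * 6 + 3);
    if (out == NULL) {
        return NULL;
    }
    char *p = out;

    *p++ = '"';
    while (len > 0) {
        size_t used;
        long cp = utf8_decode(s, len, &used);
        s += used;
        len -= used;
        if (cp < 0) {
            cp = 0xFFFD;  /* replacement character */
        }
        if (cp == '"' || cp == '\\') {
            *p++ = '\\';
            *p++ = (char)cp;
        } else if (cp >= 0x20 && cp < 0x7F) {
            *p++ = (char)cp;
        } else if (cp > 0xFFFF) {
            cp -= 0x10000;
            p += sprintf(p, "\\u%04lX\\u%04lX",
                         0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        } else {
            p += sprintf(p, "\\u%04lX", cp);
        }
    }
    *p++ = '"';
    *p = '\0';
    return out;
}

static void json_quote_ascii_example(void)
{
    char *q = json_quote_ascii("caf\xC3\xA9");   /* UTF-8 "café" */
    assert(q != NULL && strcmp(q, "\"caf\\u00E9\"") == 0);
    free(q);
}
```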

IOW

> 
> >> > +     *
> >> > +     *   - Lots of code uses is{upper,lower,alnum,...} functions,
> >> > +     *     expecting C locale character classification behaviour. Most
> >> > +     *     QEMU usage should likely be changed to
> >> > +     *     g_ascii_is{upper,lower,alnum...} to match code assumptions,
> >> > +     *     without being broken by locale settings.
> >> > +     *
> >> > +     * We do still have two requirements
> >> > +     *
> >> > +     *   - Ability to correctly display translated text according to
> >> > +     *     the user's locale
> >> > +     *
> >> > +     *   - Ability to handle multibyte characters, ideally according to
> >> > +     *     the character set of the user's locale. This affects the
> >> > +     *     ability of usb-mtp to correctly convert filenames to UTF-16,
> >> > +     *     and the wide character display of the curses & GTK frontends.
> >> > +     *
> >> > +     * The second requirement would need LC_CTYPE to be honoured, but
> >> > +     * this conflicts with the 2nd & 3rd problems listed earlier. For
> >> > +     * now we make a tradeoff and set an explicit UTF-8 locale.
> >> > +     *
> >> > +     * Note we can't set LC_MESSAGES here, since mingw doesn't define
> >> > +     * this constant in locale.h. Fortunately we only need it for the
> >> > +     * GTK frontend and that uses gi18n.h which pulls in a definition
> >> > +     * of LC_MESSAGES.
> >> > +     */
> >> > +    setlocale(LC_CTYPE, "C.UTF-8");
> >> > +
> >> >      module_call_init(MODULE_INIT_TRACE);
> >> >  
> >> >      qemu_init_cpu_list();
> >> 
> >> We should've stayed out of the GUI business.
> >
> > This isn't only a GUI problem as above, it affects USB MTP.
> 
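
The setlocale() call in the patch ignores its return value. As a sketch of the multibyte handling the usb-mtp case depends on, the code below checks whether C.UTF-8 was actually applied and then uses mbstowcs(), whose decoding follows LC_CTYPE. init_ctype_utf8() is a hypothetical name, and only warning when the locale is missing is an assumed policy, not something the patch specifies.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Try to switch LC_CTYPE to C.UTF-8 and report whether multibyte text is
 * now decoded as UTF-8. setlocale() returns NULL and leaves the locale
 * unchanged if the named locale is not installed. */
static bool init_ctype_utf8(void)
{
    if (setlocale(LC_CTYPE, "C.UTF-8") == NULL) {
        fprintf(stderr, "warning: C.UTF-8 locale not available\n");
        return false;
    }
    return strcmp(nl_langinfo(CODESET), "UTF-8") == 0;
}

static void init_ctype_example(void)
{
    if (init_ctype_utf8()) {
        wchar_t wide[8];
        /* mbstowcs() decodes per LC_CTYPE: 5 UTF-8 bytes, 4 characters. */
        assert(mbstowcs(wide, "caf\xC3\xA9", 8) == 4);
    }
}
```
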
> I believe setlocale() in QEMU is basically wrong.  Finding all the
> places that rely on the current locale when they shouldn't and
> converting them to locale-independent alternatives is a huge amount of
> work.  Even if we managed to complete it, it wouldn't stay complete.
> 
> Instead, find the places that have reason to use the locale, and fix
> them to use uselocale().
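
The uselocale() approach keeps the process locale fixed and switches only where needed. POSIX newlocale()/uselocale() change the locale of the calling thread only, so one such wrapper might look like the sketch below. with_host_ctype() is a hypothetical name; a real caller would likely create the locale object once instead of on every call.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Run fn(arg) with LC_CTYPE taken from the environment (LC_ALL, LC_CTYPE,
 * LANG) for the calling thread only; the process-wide locale installed by
 * setlocale() is not modified, so other threads are unaffected. */
static int with_host_ctype(int (*fn)(void *), void *arg)
{
    /* "" selects the environment's locale; with a (locale_t)0 base, the
     * other categories come from the POSIX locale. */
    locale_t host = newlocale(LC_CTYPE_MASK, "", (locale_t)0);
    if (host == (locale_t)0) {
        /* e.g. LANG names a locale that is not installed: run unchanged. */
        return fn(arg);
    }
    locale_t prev = uselocale(host);   /* switch this thread */
    int ret = fn(arg);
    uselocale(prev);                   /* restore previous thread locale */
    freelocale(host);
    return ret;
}

/* Example callback: print the charset of the thread's current locale. */
static int print_codeset(void *unused)
{
    (void)unused;
    printf("LC_CTYPE charset in callback: %s\n", nl_langinfo(CODESET));
    return 0;
}

static void with_host_ctype_example(void)
{
    locale_t before = uselocale((locale_t)0);   /* query, don't change */
    assert(with_host_ctype(print_codeset, NULL) == 0);
    assert(uselocale((locale_t)0) == before);   /* thread locale restored */
}
```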

I think that's fundamentally the wrong way around. Most stuff *should*
be locale-dependent, otherwise any interaction with the host OS is
likely to use incorrect localization. It isn't practical to put a
uselocale() call around every place that opens a filename. There are
a few places where QEMU should be locale-independent, such as QMP
and guest OS ABI-sensitive code, and those should take account of it.
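
For those few locale-independent places, the patch comment suggests replacing is{upper,lower,alnum,...}() with g_ascii_is*(). Without pulling in GLib, the idea can be sketched with hypothetical ascii_*() helpers that compare byte values only:

```c
#include <assert.h>
#include <stdbool.h>

/* Locale-independent ASCII classification in the spirit of GLib's
 * g_ascii_is*() family: only the byte value matters, so neither
 * setlocale() nor uselocale() can change the answer. By contrast, isalpha()
 * follows LC_CTYPE; in an ISO-8859-1 locale, for example, it accepts 0xE9. */
static bool ascii_isupper(int c) { return c >= 'A' && c <= 'Z'; }
static bool ascii_islower(int c) { return c >= 'a' && c <= 'z'; }
static bool ascii_isdigit(int c) { return c >= '0' && c <= '9'; }
static bool ascii_isalpha(int c) { return ascii_isupper(c) || ascii_islower(c); }
static bool ascii_isalnum(int c) { return ascii_isalpha(c) || ascii_isdigit(c); }

static int ascii_tolower(int c)
{
    return ascii_isupper(c) ? c - 'A' + 'a' : c;
}

/* Case-insensitive comparison for fixed ASCII keywords, e.g. option values
 * that are part of a QMP- or guest-visible interface. */
static bool ascii_strcaseeq(const char *a, const char *b)
{
    for (; *a && *b; a++, b++) {
        if (ascii_tolower((unsigned char)*a) !=
            ascii_tolower((unsigned char)*b)) {
            return false;
        }
    }
    return *a == *b;   /* equal only if both strings ended together */
}

static void ascii_example(void)
{
    assert(ascii_isalnum('7') && !ascii_isalnum('-'));
    assert(!ascii_isalpha(0xE9));   /* Latin-1 'é' is never a letter here */
    assert(ascii_strcaseeq("On", "on") && !ascii_strcaseeq("on", "off"));
}
```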

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


