From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [RFC 0/6] enable numa configuration before machine_init() from HMP/QMP
Date: Wed, 25 Oct 2017 08:02:06 +0100
User-agent: Mutt/1.9.1 (2017-09-22)

On Wed, Oct 25, 2017 at 08:57:43AM +0200, Eduardo Habkost wrote:
> On Mon, Oct 23, 2017 at 10:45:41AM +0200, Igor Mammedov wrote:
> > On Fri, 20 Oct 2017 17:53:09 -0200
> > Eduardo Habkost <address@hidden> wrote:
> > 
> > > On Fri, Oct 20, 2017 at 12:21:30PM +1100, David Gibson wrote:
> > > > On Thu, Oct 19, 2017 at 02:23:04PM +0200, Paolo Bonzini wrote:  
> > > > > On 19/10/2017 13:49, David Gibson wrote:  
> > > > > > Note that describing socket/core/thread tuples as arch independent
> > > > > > (or even machine independent) is... debatable.  I mean, it's
> > > > > > flexible enough that most platforms can be fit to that scheme
> > > > > > without too much straining.  But there's no arch-independent way
> > > > > > of defining what each level means in terms of its properties.
> > > > > > 
> > > > > > So, for example, on spapr - being paravirt - there's no real
> > > > > > distinction between cores and sockets; how you divide them up is
> > > > > > completely arbitrary.
> > > > > 
> > > > > Same on x86, actually.
> > > > > 
> > > > > It's _common_ that cores on the same socket share L3 cache and
> > > > > that a socket spans an integer number of NUMA nodes, but it
> > > > > doesn't have to be that way.
> > > > > 
> > > > > QEMU currently enforces the former (if it tells the guest at all
> > > > > that there is an L3 cache), but not the latter.
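
For concreteness, this is expressible with the -numa cpu binding (a
minimal sketch with invented values; -numa cpu is available since QEMU
2.10, if memory serves, on machine types that support it):

  qemu-system-x86_64 -machine pc -m 1G \
      -smp 4,sockets=1,cores=2,threads=2 \
      -numa node,nodeid=0,mem=512M \
      -numa node,nodeid=1,mem=512M \
      -numa cpu,node-id=0,socket-id=0,core-id=0 \
      -numa cpu,node-id=1,socket-id=0,core-id=1

Here the single socket straddles both nodes, and QEMU accepts it.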
> > > > 
> > > > Ok.  Correct me if I'm wrong, but doesn't ACPI describe the NUMA
> > > > architecture in terms of this thread/core/socket hierarchy?  That's
> > > > not true for PAPR, where the NUMA topology is described in an
> > > > independent set of (potentially arbitrarily nested) nodes.
> > > 
> > > On PC, ACPI NUMA information only refers to CPU APIC IDs, which
> > > identify individual CPU threads; it doesn't care about CPU
> > > socket/core/thread topology.  If I'm not mistaken, the
> > > socket/core/thread topology is not represented in ACPI at all.
> > ACPI does node mapping per logical cpu (thread) in the SRAT table,
> > so in theory we are able to describe insane configurations.
> > That, however, doesn't mean we should go beyond what
> > real hw does and confuse a guest which may have certain
> > expectations.
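
For reference, the SRAT entry carrying that per-thread mapping is the
Processor Local APIC/SAPIC Affinity Structure. Roughly, in C (layout
per the ACPI spec; the struct and field names here are illustrative,
not QEMU's):

  #include <stdint.h>

  /* SRAT type-0 entry: one per logical CPU (thread) */
  typedef struct SratProcessorAffinity {
      uint8_t  type;            /* 0 = Processor Local APIC/SAPIC Affinity */
      uint8_t  length;          /* 16 bytes */
      uint8_t  proximity_lo;    /* NUMA node (proximity domain), bits 7:0 */
      uint8_t  local_apic_id;   /* APIC ID identifying the thread */
      uint32_t flags;           /* bit 0: entry enabled */
      uint8_t  local_sapic_eid;
      uint8_t  proximity_hi[3]; /* proximity domain, bits 31:8 */
      uint32_t clock_domain;
  } SratProcessorAffinity;

The only CPU identifier in there is the APIC ID; sockets and cores
never appear, which is why any mapping at all can be described.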
> 
> Agreed.
> 
> > 
> > Currently for x86 the expectation is that cpus are mapped to numa
> > nodes either by whole cores or whole sockets (AMD and Intel cpus
> > respectively). In the future that might change.
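
In -numa cpu terms the two conventions look like this (sketch only,
invented IDs, shown as command-line fragments):

  # whole-socket granularity (Intel-like): all of socket N in node N
  -numa cpu,node-id=0,socket-id=0 \
  -numa cpu,node-id=1,socket-id=1

  # whole-core granularity (AMD-like): cores of one socket may land in
  # different nodes, but a core's threads stay together
  -numa cpu,node-id=0,socket-id=0,core-id=0 \
  -numa cpu,node-id=1,socket-id=0,core-id=1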
> > 
> > 
> > > Some guest OSes, however, may get very confused if they see an
> > > unexpected NUMA/CPU topology.  IIRC, it was possible to make old
> > > Linux kernel versions panic by generating a weird topology.
> > 
> > There were bugs that were fixed on the QEMU or guest kernel side
> > when unexpected mappings were present. While we can 'fix' guest
> > expectations in the linux kernel, that might not be possible for other
> > OSes; one more reason we shouldn't allow blind assignment by mgmt.
> 
> One problem with blocking arbitrary assignment is the possibility
> of breaking existing VM configurations.  We could enforce the new
> rules only on newer machine-types, although this means an
> existing VM configuration may stop being runnable after updating
> the machine-type.

We should also be wary of blocking something just because some guest
OSes are unhappy. Other guest OSes may be perfectly OK with the
configuration, and shouldn't be prevented from using it if their admin
wants it.

IOW, we should only consider blocking things that are disallowed by
relevant specs, or that would cause functional or security problems in
the host. If it is merely that some guest OSes are unhappy with certain
configs, that's just a docs problem (e.g. many Windows versions won't
use more than 2 sockets, but of course we shouldn't block use of more
than 2 sockets).


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


