From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [RFC] x86: Allow to set NUMA distance for different NUMA nodes
Date: Fri, 3 Mar 2017 16:52:18 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

On Fri, Mar 03, 2017 at 01:47:51PM -0300, Eduardo Habkost wrote:
> On Fri, Mar 03, 2017 at 04:26:12PM +0000, Daniel P. Berrange wrote:
> > On Fri, Mar 03, 2017 at 10:09:22AM -0600, Eric Blake wrote:
> > > On 03/03/2017 07:57 AM, Eduardo Habkost wrote:
> > > 
> > > >> With this patch, when a user wants to create a guest that contains
> > > >> several vNUMA nodes and also wants to set distance among those nodes,
> > > > >> the QEMU command would look like:
> > > >>
> > > >> ```
> > > >> -object 
> > > >> memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0
> > > >>  \
> > > >> -numa 
> > > >> node,nodeid=0,cpus=0,memdev=node0,distance=10,distance=21,distance=31,distance=41
> > > >>  \
> > > 
> > > > 
> > > > It would be nice to have a more intuitive syntax to represent
> > > > ordered lists in QemuOpts. But this is what we have today.
> > > > 
> > > 
> > > Markus has the discussion on representing arrays via the command line;
> > > particularly since this array is very tightly coupled to the order in
> > > which values are presented, it may be worth having:
> > > 
> > > > -numa node,nodeid=0,cpus=0,memdev=node0,distance.0=10,distance.1=21,distance.2=31,distance.3=41
> > > 
> > > with the explicit distance.0= suffixes to distance making it more
> > > obvious that we are dealing with an array.
> > > 
> > > > > I think the proposal makes sense. I would like the semantics of the
> > > > > new option to be documented at qapi-schema.json and qemu-options.hx.
> > > > 
> > > > I would call the new NumaNodeOptions field "distances", as it is
> > > > a list of distances.
> > > 
> > > Indeed, Markus is trying (with his work on -blockdev for 2.9) to get the
> > > command line to a point where it is identical to the QMP code, by
> > > reusing qapi-schema.json, so we should very much keep that in mind with
> > > whatever we add to -numa in 2.10.
> > > 
> > > 
> > > > but in the future we could support something like:
> > > > 
> > > >   -numa node,nodeid=0,cpus=0,memdev=node0 \
> > > >   -numa node,nodeid=1,cpus=1,memdev=node1 \
> > > >   -numa node,nodeid=2,cpus=2,memdev=node2 \
> > > >   -numa node,nodeid=3,cpus=3,memdev=node3 \
> > > > >   -numa distances,distances[0][0]=10,distances[0][1]=21,distances[0][2]=31,distances[0][3]=41,\
> > > > >                   distances[1][0]=21,distances[1][1]=10,distances[1][2]=21,distances[1][3]=31,\
> > > > >                   distances[2][0]=31,distances[2][1]=21,distances[2][2]=10,distances[2][3]=21,\
> > > > >                   distances[3][0]=41,distances[3][1]=31,distances[3][2]=21,distances[3][3]=10
> > > 
> > > Except that [] requires special shell quoting, so the proposal would be
> > > more like:
> > > 
> > > -numa distances.0.0=10,distances.0.1=21
> > > 
> > > Right now, QMP doesn't support 2-D arrays (although this may be a good
> > > reason to introduce support), so that's also something to think about
> > > (not insurmountable, but makes the task more complex).
> > 
> > What I don't like about this syntax is that it specifies the same
> > information twice. IIUC the NUMA distance information is unidirectional, so specifying
> > the same data for both directions (node 0 -> node 3, and node 3 -> node 0)
> > looks like overkill. Also the self-node distance is defined to always be
> > 10 IIUC, so specifying that is not required. IOW, we could cut down the data
> > we need to provide to just
> > 
> >    -numa distances,nodea=0,nodeb=1,value=20
> >    -numa distances,nodea=0,nodeb=2,value=20
> >    -numa distances,nodea=0,nodeb=3,value=20
> >    -numa distances,nodea=1,nodeb=2,value=20
> >    -numa distances,nodea=1,nodeb=3,value=20
> >    -numa distances,nodea=2,nodeb=3,value=20
> 
> The ACPI spec (I'm looking at revision 5.0) explicitly mentions
> that A->B distance may be different from B->A distance:
> 
> "The entry value is a one-byte unsigned integer. The relative
> distance from System Locality i to System Locality j is the
> i*N + j entry in the matrix, where N is the number of System
> Localities.  Except for the relative distance from a System
> Locality to itself, each relative distance is stored twice in the
> matrix. This provides the capability to describe the scenario
> where the relative distances for the two directions between
> System Localities is different."

Ah interesting, you learn something new every day. I've only made
that unidirectional assumption for the last 10 years ;-P
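
To make the quoted layout concrete, here is a minimal Python sketch of
that row-major matrix, where the distance from locality i to locality j
sits at index i*N + j. The function names are hypothetical and only for
illustration; the values are the 4-node example from earlier in this
thread. This is not QEMU code.

```python
def flatten_distances(matrix):
    """Flatten an N x N distance matrix into row-major order (entry i*N + j)."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("distance matrix must be square")
    flat = []
    for row in matrix:
        for value in row:
            # The spec text quoted above says each entry is a one-byte unsigned integer.
            if not 0 <= value <= 255:
                raise ValueError("each entry must fit in one unsigned byte")
            flat.append(value)
    return flat


def distance(flat, n, i, j):
    """Look up the distance from locality i to locality j in the flat matrix."""
    return flat[i * n + j]


# The 4-node example matrix from earlier in this thread.
MATRIX = [
    [10, 21, 31, 41],
    [21, 10, 21, 31],
    [31, 21, 10, 21],
    [41, 31, 21, 10],
]
FLAT = flatten_distances(MATRIX)
```

Because i->j and j->i occupy separate slots (i*N + j versus j*N + i),
nothing in this layout forces the two directions to carry the same value.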

> But I agree we could figure out a more compact syntax for more
> common cases where self-node distance is 10 and distance is the
> same both ways.

QAPI would need a specialized numeric matrix type, which we could
efficiently map into some CLI syntax, in order to avoid having to
use the rather verbose general purpose list syntax. That is probably
not worth the hassle though, compared with just picking shorter
variable names, e.g.

  -numa dist,a=0,b=1,val=20

instead of

  -numa distances,nodea=0,nodeb=1,value=20
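
As a rough sketch of how such a compact form could be expanded into the
full matrix, assuming the hypothetical "dist,a=..,b=..,val=.." spelling
above, a self-node distance of 10, and mirroring of any pair that is only
given in one direction (all of these are assumptions for illustration,
not an implemented QEMU option parser):

```python
SELF_DISTANCE = 10  # assumed default distance from a node to itself


def parse_dist_option(arg):
    """Parse a hypothetical 'dist,a=0,b=1,val=20' string into (a, b, val)."""
    kind, _, rest = arg.partition(",")
    if kind != "dist":
        raise ValueError("expected a 'dist' option, got %r" % kind)
    fields = dict(item.split("=", 1) for item in rest.split(","))
    return int(fields["a"]), int(fields["b"]), int(fields["val"])


def build_matrix(num_nodes, args):
    """Expand per-pair options into a full num_nodes x num_nodes matrix."""
    matrix = [[None] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        matrix[i][i] = SELF_DISTANCE
    for arg in args:
        a, b, val = parse_dist_option(arg)
        matrix[a][b] = val
        # Mirror only if the reverse direction has no value yet, so an
        # explicit b->a entry can still differ from a->b.
        if matrix[b][a] is None:
            matrix[b][a] = val
    return matrix


# The six pairs from the earlier proposal in this thread, all with distance 20.
PAIRS = ["dist,a=%d,b=%d,val=20" % (a, b)
         for a in range(4) for b in range(a + 1, 4)]
FULL = build_matrix(4, PAIRS)
```

With this approach the common symmetric case needs N*(N-1)/2 options
rather than N*N entries, while an asymmetric setup just adds the reverse
pair explicitly.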

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


