From: Daniel P. Berrange
Subject: Re: [Qemu-devel] [RFC] x86: Allow to set NUMA distance for different NUMA nodes
Date: Fri, 3 Mar 2017 17:12:22 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

On Fri, Mar 03, 2017 at 02:10:50PM -0300, Eduardo Habkost wrote:
> On Fri, Mar 03, 2017 at 04:52:18PM +0000, Daniel P. Berrange wrote:
> > On Fri, Mar 03, 2017 at 01:47:51PM -0300, Eduardo Habkost wrote:
> > > On Fri, Mar 03, 2017 at 04:26:12PM +0000, Daniel P. Berrange wrote:
> > > > On Fri, Mar 03, 2017 at 10:09:22AM -0600, Eric Blake wrote:
> > > > > On 03/03/2017 07:57 AM, Eduardo Habkost wrote:
> > > > > 
> > > > > > >> With this patch, when a user wants to create a guest that contains
> > > > > > >> several vNUMA nodes and also wants to set the distance among those
> > > > > > >> nodes, the QEMU command would look like:
> > > > > >>
> > > > > >> ```
> > > > > >> -object 
> > > > > >> memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0
> > > > > >>  \
> > > > > >> -numa 
> > > > > >> node,nodeid=0,cpus=0,memdev=node0,distance=10,distance=21,distance=31,distance=41
> > > > > >>  \
> > > > > 
> > > > > > 
> > > > > > It would be nice to have a more intuitive syntax to represent
> > > > > > ordered lists in QemuOpts. But this is what we have today.
> > > > > > 
> > > > > 
> > > > > Markus has the discussion on representing arrays via the command line;
> > > > > particularly since this array is very tightly coupled to the order in
> > > > > which values are presented, it may be worth having:
> > > > > 
> > > > > > -numa node,nodeid=0,cpus=0,memdev=node0,distance.0=10,distance.1=21,distance.2=31,distance.3=41
> > > > > 
> > > > > with the explicit distance.0= suffixes to distance making it more
> > > > > obvious that we are dealing with an array.
> > > > > 
> > > > > > > I think the proposal makes sense. I would like the semantics of the new
> > > > > > > option to be documented at qapi-schema.json and qemu-options.hx.
> > > > > > 
> > > > > > I would call the new NumaNodeOptions field "distances", as it is
> > > > > > a list of distances.
> > > > > 
> > > > > > Indeed, Markus is trying (with his work on -blockdev for 2.9) to get the
> > > > > > command line to a point where it is identical to the QMP code, by
> > > > > > reusing qapi-schema.json, so we should very much keep that in mind with
> > > > > > whatever we add to -numa in 2.10.
> > > > > 
> > > > > 
> > > > > > but in the future we could support something like:
> > > > > > 
> > > > > >   -numa node,nodeid=0,cpus=0,memdev=node0 \
> > > > > >   -numa node,nodeid=1,cpus=1,memdev=node1 \
> > > > > >   -numa node,nodeid=2,cpus=2,memdev=node2 \
> > > > > >   -numa node,nodeid=3,cpus=3,memdev=node3 \
> > > > > > >   -numa distances,distances[0][0]=10,distances[0][1]=21,distances[0][2]=31,distances[0][3]=41,\
> > > > > > >                   distances[1][0]=21,distances[1][1]=10,distances[1][2]=21,distances[1][3]=31,\
> > > > > > >                   distances[2][0]=31,distances[2][1]=21,distances[2][2]=10,distances[2][3]=21,\
> > > > > > >                   distances[3][0]=41,distances[3][1]=31,distances[3][2]=21,distances[3][3]=10
> > > > > 
> > > > > > Except that [] requires special shell quoting, so the proposal would be
> > > > > > more like:
> > > > > 
> > > > > -numa distances.0.0=10,distances.0.1=21
> > > > > 
> > > > > Right now, QMP doesn't support 2-D arrays (although this may be a good
> > > > > reason to introduce support), so that's also something to think about
> > > > > (not insurmountable, but makes the task more complex).
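To make the dotted-key form above concrete, here is a minimal Python sketch of how such keys could be collected into a 2-D structure. It is illustrative only: `parse_dotted_distances` and `to_matrix` are hypothetical helper names, and the `distances.I.J=V` syntax is just the proposal from this thread, not an existing QEMU option.

```python
def parse_dotted_distances(optstr, prefix="distances"):
    """Parse 'distances.I.J=V,...' into a dict keyed by (i, j).

    Hypothetical helper: the key syntax is the one proposed in this
    thread, not something QEMU is known to accept.
    """
    entries = {}
    for item in optstr.split(","):
        key, _, value = item.partition("=")
        parts = key.split(".")
        if len(parts) != 3 or parts[0] != prefix:
            raise ValueError(f"unexpected key: {key!r}")
        entries[(int(parts[1]), int(parts[2]))] = int(value)
    return entries


def to_matrix(entries, n, missing=None):
    """Lay the (i, j) entries out as an n x n list of lists."""
    return [[entries.get((i, j), missing) for j in range(n)] for i in range(n)]


entries = parse_dotted_distances("distances.0.0=10,distances.0.1=21")
matrix = to_matrix(entries, 2)
```

Entries that were not given on the command line stay as `None` here, so a later step can decide how to default them.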
> > > > 
> > > > What I don't like about this syntax is that it is duplicating information.
> > > > IIUC the NUMA distance information is unidirectional, so specifying
> > > > the same data for both directions (node 0 -> node 3, and node 3 -> node 0)
> > > > looks like overkill. Also the self-node distance is defined to always be
> > > > 10 IIUC, so specifying that is not required. IOW, we could cut down the data
> > > > we need to provide to just
> > > > 
> > > >    -numa distances,nodea=0,nodeb=1,value=20
> > > >    -numa distances,nodea=0,nodeb=2,value=20
> > > >    -numa distances,nodea=0,nodeb=3,value=20
> > > >    -numa distances,nodea=1,nodeb=2,value=20
> > > >    -numa distances,nodea=1,nodeb=3,value=20
> > > >    -numa distances,nodea=2,nodeb=3,value=20
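As a rough sketch of what the pairwise form above implies, the following Python generates one option per unordered node pair from a symmetric matrix. The `distances,nodea=,nodeb=,value=` spelling is only the syntax floated in this thread, and `pair_options` is a hypothetical helper; the matrix values are the ones from the earlier example.

```python
from itertools import combinations


def pair_options(matrix):
    """Emit one '-numa distances,...' option per unordered node pair.

    Assumes the matrix is symmetric; the self-distance diagonal is
    skipped. The option spelling is the proposal from this thread only.
    """
    n = len(matrix)
    args = []
    for a, b in combinations(range(n), 2):
        if matrix[a][b] != matrix[b][a]:
            raise ValueError(f"distance {a}->{b} differs from {b}->{a}")
        args += ["-numa", f"distances,nodea={a},nodeb={b},value={matrix[a][b]}"]
    return args


matrix = [
    [10, 21, 31, 41],
    [21, 10, 21, 31],
    [31, 21, 10, 21],
    [41, 31, 21, 10],
]
args = pair_options(matrix)
```

For N nodes this yields N*(N-1)/2 options instead of N*N matrix entries, which is 6 rather than 16 for the four-node case.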
> > > 
> > > The ACPI spec (I'm looking at revision 5.0) explicitly mentions
> > > that A->B distance may be different from B->A distance:
> > > 
> > > "The entry value is a one-byte unsigned integer. The relative
> > > distance from System Locality i to System Locality j is the
> > > i*N + j entry in the matrix, where N is the number of System
> > > Localities.  Except for the relative distance from a System
> > > Locality to itself, each relative distance is stored twice in the
> > > matrix. This provides the capability to describe the scenario
> > > where the relative distances for the two directions between
> > > System Localities is different."
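The indexing rule in that quote can be sketched in a few lines of Python. This is only an illustration of the `i*N + j` layout with one-byte entries; `slit_index` and `flatten` are hypothetical names, not QEMU or ACPI tooling.

```python
def slit_index(i, j, n):
    """Position of the i -> j distance in the flat matrix, per the i*N + j rule."""
    return i * n + j


def flatten(matrix):
    """Flatten an N x N distance matrix into N*N one-byte entries."""
    n = len(matrix)
    flat = bytearray(n * n)
    for i, row in enumerate(matrix):
        if len(row) != n:
            raise ValueError("distance matrix must be square")
        for j, distance in enumerate(row):
            # bytearray raises ValueError for values outside 0..255,
            # which enforces the one-byte unsigned entry size.
            flat[slit_index(i, j, n)] = distance
    return bytes(flat)


# Asymmetric example: 0 -> 1 is 20, but 1 -> 0 is 30.
flat = flatten([[10, 20], [30, 10]])
```

Because every ordered pair has its own slot, the two directions can hold different values, which is exactly the case the spec text calls out.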
> > 
> > Ah interesting, learn something new every day ? I've only made
> > that unidirectional assumption for the last 10 years ;-P
> > 
> > > But I agree we could figure out a more compact syntax for more
> > > common cases where self-node distance is 10 and distance is the
> > > same both ways.
> > 
> > QAPI would need a specialized numeric matrix type, which we could
> > efficiently map into some CLI syntax, in order to avoid needing to
> > tickle the rather verbose general purpose list syntax. Probably
> > not worth the hassle though - rather than just picking shorter
> > variable names eg
> > 
> >   -numa dist,a=0,b=1,val=3
> > 
> > instead of
> > 
> >   -numa distances,nodea=0,nodeb=1,value=20
> 
> Whatever syntax/names we choose, we could have reasonable
> defaults for omitted values:
> 
> * If A->B is set and B->A is omitted, use the same value for both
>   A->B and B->A
> * If A->A is omitted, use min(10, configured_distances)

That would be nice for humans, but from libvirt POV, I doubt we'd
use that since it'd involve us adding special case code for no
particular benefit. 


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|


