
From: Eduardo Habkost
Subject: Re: [Qemu-devel] [RFC] x86: Allow to set NUMA distance for different NUMA nodes
Date: Fri, 3 Mar 2017 13:47:51 -0300
User-agent: Mutt/1.7.1 (2016-10-04)

On Fri, Mar 03, 2017 at 04:26:12PM +0000, Daniel P. Berrange wrote:
> On Fri, Mar 03, 2017 at 10:09:22AM -0600, Eric Blake wrote:
> > On 03/03/2017 07:57 AM, Eduardo Habkost wrote:
> > 
> > >> With this patch, when a user wants to create a guest that contains
> > >> several vNUMA nodes and also wants to set distance among those nodes,
> > >> the QEMU command would look like:
> > >>
> > >> ```
> > >> -object memory-backend-ram,size=1G,prealloc=yes,host-nodes=0,policy=bind,id=node0 \
> > >> -numa node,nodeid=0,cpus=0,memdev=node0,distance=10,distance=21,distance=31,distance=41 \
> > >> ```
> > 
> > > 
> > > It would be nice to have a more intuitive syntax to represent
> > > ordered lists in QemuOpts. But this is what we have today.
> > > 
> > 
> > Markus has the discussion on representing arrays via the command line;
> > particularly since this array is very tightly coupled to the order in
> > which values are presented, it may be worth having:
> > 
> > -numa node,nodeid=0,cpus=0,memdev=node0,distance.0=10,distance.1=21,distance.2=31,distance.3=41
> > 
> > with the explicit distance.0= suffixes to distance making it more
> > obvious that we are dealing with an array.
> > 
> > > I think the proposal makes sense. I would like the semantics of the
> > > new option to be documented at qapi-schema.json and qemu-options.hx.
> > > 
> > > I would call the new NumaNodeOptions field "distances", as it is
> > > a list of distances.
> > 
> > Indeed, Markus is trying (with his work on -blockdev for 2.9) to get the
> > command line to a point where it is identical to the QMP code, by
> > reusing qapi-schema.json, so we should very much keep that in mind with
> > whatever we add to -numa in 2.10.
> > 
> > 
> > > but in the future we could support something like:
> > > 
> > >   -numa node,nodeid=0,cpus=0,memdev=node0 \
> > >   -numa node,nodeid=1,cpus=1,memdev=node1 \
> > >   -numa node,nodeid=2,cpus=2,memdev=node2 \
> > >   -numa node,nodeid=3,cpus=3,memdev=node3 \
> > >   -numa distances,distances[0][0]=10,distances[0][1]=21,distances[0][2]=31,distances[0][3]=41,\
> > >                   distances[1][0]=21,distances[1][1]=10,distances[1][2]=21,distances[1][3]=31,\
> > >                   distances[2][0]=31,distances[2][1]=21,distances[2][2]=10,distances[2][3]=21,\
> > >                   distances[3][0]=41,distances[3][1]=31,distances[3][2]=21,distances[3][3]=10
> > 
> > Except that [] requires special shell quoting, so the proposal would be
> > more like:
> > 
> > -numa distances.0.0=10,distances.0.1=21
> > 
> > Right now, QMP doesn't support 2-D arrays (although this may be a good
> > reason to introduce support), so that's also something to think about
> > (not insurmountable, but makes the task more complex).
> 
> What I don't like about this syntax is that it is duplicating information.
> IIUC the NUMA distance information is unidirectional, so specifying
> the same data for both directions (node 0 -> node 3, and node 3 -> node 0)
> looks like overkill. Also the self-node distance is defined to always be
> 10 IIUC, so specifying that is not required. IOW, we could cut down the
> data we need to provide to just
> 
>    -numa distances,nodea=0,nodeb=1,value=20
>    -numa distances,nodea=0,nodeb=2,value=20
>    -numa distances,nodea=0,nodeb=3,value=20
>    -numa distances,nodea=1,nodeb=2,value=20
>    -numa distances,nodea=1,nodeb=3,value=20
>    -numa distances,nodea=2,nodeb=3,value=20

The ACPI spec (I'm looking at revision 5.0) explicitly mentions
that A->B distance may be different from B->A distance:

"The entry value is a one-byte unsigned integer. The relative
distance from System Locality i to System Locality j is the
i*N + j entry in the matrix, where N is the number of System
Localities.  Except for the relative distance from a System
Locality to itself, each relative distance is stored twice in the
matrix. This provides the capability to describe the scenario
where the relative distances for the two directions between
System Localities is different."
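
To make the indexing concrete, here is a minimal Python sketch of the
flat layout the quoted text describes. It is only an illustration, not
QEMU code: the helper names (make_slit, set_distance, get_distance) are
made up, and the default remote distance of 20 is an assumption chosen
for the example.

```python
# Illustrative sketch only: a flat, row-major table where the distance
# from locality i to locality j lives at index i*N + j.
# Helper names and the default remote distance (20) are assumptions.

def make_slit(n, default=20, local=10):
    """Return a flat list of n*n entries with `local` on the diagonal."""
    return [local if i == j else default
            for i in range(n) for j in range(n)]

def set_distance(slit, n, i, j, value):
    """Store the i -> j distance; entries are one-byte unsigned integers."""
    if not 0 <= value <= 255:
        raise ValueError("distance must fit in one unsigned byte")
    slit[i * n + j] = value

def get_distance(slit, n, i, j):
    """Read the i -> j distance from the flat table."""
    return slit[i * n + j]

N = 4
slit = make_slit(N)
# The two directions are stored separately, so they may differ.
set_distance(slit, N, 0, 3, 41)
set_distance(slit, N, 3, 0, 31)
```

With N = 4, the 0 -> 3 entry sits at index 3 and the 3 -> 0 entry at
index 12, which is why a syntax that only takes one value per node pair
cannot express every table the spec allows.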

But I agree we could figure out a more compact syntax for more
common cases where self-node distance is 10 and distance is the
same both ways.
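
As a rough sketch of what such a compact form could expand to (again
plain Python for illustration, not a proposed implementation;
expand_distances and its (nodea, nodeb, value) tuples are hypothetical
and only mirror the nodea/nodeb/value idea quoted above):

```python
# Illustrative sketch only: expand one value per unordered node pair into
# a full N x N matrix, assuming symmetric distances and a self-distance
# of 10. The function name and tuple layout are assumptions.

def expand_distances(n, pairs, local=10):
    """Build an n x n matrix from (nodea, nodeb, value) tuples."""
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = local          # self-distance is implied
    for a, b, value in pairs:
        if a == b:
            raise ValueError("self-distance is implied, do not specify it")
        matrix[a][b] = value          # same value in both directions
        matrix[b][a] = value
    missing = [(i, j) for i in range(n) for j in range(n)
               if matrix[i][j] is None]
    if missing:
        raise ValueError(f"no distance given for node pairs {missing}")
    return matrix

pairs = [(0, 1, 21), (0, 2, 31), (0, 3, 41),
         (1, 2, 21), (1, 3, 31), (2, 3, 21)]
matrix = expand_distances(4, pairs)
```

Six entries here produce the same 4x4 table as the sixteen
distances[i][j] values in the example quoted earlier. An asymmetric
table would still need some way to give each direction explicitly.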

-- 
Eduardo


