Re: [Qemu-devel] [PATCH V9 06/12] NUMA: Add Linux libnuma detection


From: Andrew Jones
Subject: Re: [Qemu-devel] [PATCH V9 06/12] NUMA: Add Linux libnuma detection
Date: Thu, 29 Aug 2013 04:31:53 -0400 (EDT)


----- Original Message -----
> 
> 
> ----- Original Message -----
> > On 08/28/2013 09:44 PM, Paolo Bonzini wrote:
> > > Il 26/08/2013 10:43, Andrew Jones ha scritto:
> > >>
> > >> ----- Original Message -----
> > >>>> On 08/26/2013 03:46 PM, Andrew Jones wrote:
> > >>>>>>>>>> Is this patch still necessary? I thought that dropping the
> > >>>>>>>>>> numa_num_configured_nodes() calls from patch 8/12 got rid
> > >>>>>>>>>> of the need for this library. Maybe I missed other uses?
> > >>>>>>>>>>
> > >>>>>>>>>> Yes, in 08/12 we also use mbind(),
> > >>>>>> You don't need a whole library for mbind(), it's a syscall. See
> > >>>>>> syscall(2).
> > >>>>>>
> > >>>>>>>>>> and in 09/12 we use max_numa_node().
> > >>>>>> Really? I didn't see it there. And anyway, that goes back to our
> > >>>>>> discussion
> > >>>>>> about setting qemu's MAX_NODES to whatever we think qemu should
> > >>>>>> support,
> > >>>>>> and then just checking that we don't blow that limit whenever
> > >>>>>> reading
> > >>>>>> host node info, i.e.
> > >>>>>>
> > >>>>>> maxnode = 0;
> > >>>>>> while (maxnode < MAX_NODES && host_nodes[maxnode])
> > >>>>>>   node_read(&info[maxnode++]);
> > >>>>>>
> > >>>>>> type of a thing.
> > >>>>>>
> > >>>>>> And, if there's a place you really need to know the current online
> > >>>>>> number
> > >>>>>> of host nodes, then, like I said earlier, you should just go to
> > >>>>>> sysfs
> > >>>>>> yourself. libnuma:numa_max_node() returns an int that it only
> > >>>>>> initializes
> > >>>>>> at library load time, so it's not going to adapt to
> > >>>>>> onlining/offlining.
> > >>>>
> > >>>> OK, thank you.
> > >>>> Then I should define MPOL_* macros in QEMU and use mbind(2) syscall
> > >>>> directly,
> > >>>> right?
> > >> Hmm, yeah, that's too bad that numaif.h is part of libnuma, and not a
> > >> more
> > >> general lib. Whether or not we want to redefine those symbols within
> > >> qemu, in order to avoid the dependency on installing numactl-devel,
> > >> isn't
> > >> something I can answer. That's a better question for Anthony. Anthony?
> > >> Paolo,
> > >> any opinions? Maybe we should pick up uapi/linux/mempolicy.h with the
> > >> linux-header synch script?
> > >>
> > > 
> > > I think using libnuma is fine.  In principle this could be used on other
> > > OSes than Linux, I think?
> > 
> > But seems that mbind(2) is Linux-specific syscall, right?
> > 
> 
> You would need to avoid directly calling mbind, i.e. use libnuma for all
> numa related calls. Then, if libnuma were to support more OSes, qemu would
> automatically (wrt numa) support them as well. Your mbind() with libnuma
> would look like this
> 
> numa_set_bind_policy(strict)
> numa_tonodemask_memory(addr, size, nodemask)
> 
> The problem is that set_bind_policy only takes a bool, and thus only
> allows two of the four possible policies
> 
> MPOL_BIND        strict == 1
> MPOL_PREFERRED   strict == 0
> 

Ah, there is a way to get interleave policy:

if (policy == interleave) {
    numa_interleave_memory(addr, size, nodemask);
} else {
    numa_set_bind_policy(strict);
    numa_tonodemask_memory(addr, size, nodemask);
}

It's a bit clunky, and I still don't see a way to select MPOL_DEFAULT, nor a way
to use any additional flags, such as MPOL_F_RELATIVE_NODES.
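
For comparison, here is a minimal sketch of going straight to mbind(2) through
syscall(2), where MPOL_DEFAULT and the mode flags can be expressed directly.
The MPOL_* values are copied from the kernel's uapi/linux/mempolicy.h so that
numaif.h is not needed. host_mem_mode() and host_mem_bind() are hypothetical
names used only for illustration; they are not taken from the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the kernel's uapi/linux/mempolicy.h, redefined locally so
 * that numaif.h (numactl-devel) is not required at build time. */
#ifndef MPOL_DEFAULT
#define MPOL_DEFAULT    0
#define MPOL_PREFERRED  1
#define MPOL_BIND       2
#define MPOL_INTERLEAVE 3
#endif
#ifndef MPOL_F_RELATIVE_NODES
#define MPOL_F_STATIC_NODES   (1 << 15)
#define MPOL_F_RELATIVE_NODES (1 << 14)
#endif

/* Hypothetical helper: combine one of the four policies with optional
 * MPOL_F_* mode flags into the 'mode' argument that mbind(2) expects. */
static int host_mem_mode(int policy, int mode_flags)
{
    assert(policy >= MPOL_DEFAULT && policy <= MPOL_INTERLEAVE);
    return policy | mode_flags;
}

/* Hypothetical wrapper: invoke mbind(2) without libnuma.
 * Returns 0 on success, or -1 with errno set. The last syscall
 * argument (the MPOL_MF_* move flags) is left at 0 here. */
static long host_mem_bind(void *addr, unsigned long len, int mode,
                          const unsigned long *nodemask,
                          unsigned long maxnode)
{
    return syscall(SYS_mbind, addr, len, (unsigned long)mode,
                   nodemask, maxnode, 0UL);
}
```

With this, resetting a range to the default policy would be
host_mem_bind(addr, size, host_mem_mode(MPOL_DEFAULT, 0), NULL, 0), and a
relative-nodes bind would pass host_mem_mode(MPOL_BIND, MPOL_F_RELATIVE_NODES)
together with a node bitmask. The cost is that the call is Linux-only.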


> So, due to libnuma's policy setting limitations, and the fact that it
> doesn't currently support any OS other than Linux, I prefer your current
> series version that drops libnuma. If qemu needs to support NUMA on
> another OS, then we can cross that bridge when we get there.
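
With libnuma dropped, the host node count discussed earlier in the thread
would have to come from sysfs. Below is a rough sketch, assuming the usual
/sys/devices/system/node/online file, which holds a range list such as "0-3"
or "0,2-3". The MAX_NODES value and both function names are placeholders for
illustration, not qemu's actual definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_NODES 64  /* assumption: whatever limit qemu decides to support */

/* Parse a sysfs node list ("0-3", "0,2-3\n", ...) and return the highest
 * node id seen, clamped to MAX_NODES - 1. Returns -1 if nothing parses. */
static int max_node_from_list(const char *list)
{
    const char *p = list;
    int max = -1;

    assert(list != NULL);
    while (*p) {
        char *end;
        long lo = strtol(p, &end, 10);
        long hi;

        if (end == p || lo < 0) {
            break;                      /* no number here: stop parsing */
        }
        hi = lo;
        if (*end == '-') {              /* a "lo-hi" range */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo) {
                break;
            }
        }
        if (hi > max) {
            max = hi >= MAX_NODES ? MAX_NODES - 1 : (int)hi;
        }
        if (*end != ',') {
            break;                      /* newline or end of string */
        }
        p = end + 1;
    }
    return max;
}

/* Read the list each time it is needed, so that node onlining/offlining
 * is picked up. Returns -1 if the file is missing or unreadable. */
static int host_max_node(void)
{
    char buf[256];
    int max = -1;
    FILE *f = fopen("/sys/devices/system/node/online", "r");

    if (!f) {
        return -1;
    }
    if (fgets(buf, sizeof(buf), f)) {
        max = max_node_from_list(buf);
    }
    fclose(f);
    return max;
}
```

Reading the file on demand, rather than caching the value once at startup,
is the point of the sketch: it avoids the stale value that numa_max_node()
would give after a node is onlined or offlined.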


