qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3.1 25/31] hostmem: add properties for NUMA mem


From: Hu Tao
Subject: Re: [Qemu-devel] [PATCH v3.1 25/31] hostmem: add properties for NUMA memory policy
Date: Mon, 9 Jun 2014 10:12:07 +0800
User-agent: Mutt/1.5.21 (2010-09-15)

On Fri, Jun 06, 2014 at 01:15:28PM -0300, Eduardo Habkost wrote:
> On Fri, Jun 06, 2014 at 11:37:26AM +0800, Hu Tao wrote:
> > On Mon, May 19, 2014 at 08:34:54PM -0300, Eduardo Habkost wrote:
> > > On Tue, May 06, 2014 at 05:27:46PM +0800, Hu Tao wrote:
> > > [...]
> > > > @@ -203,6 +296,20 @@ host_memory_backend_memory_init(UserCreatable *uc, 
> > > > Error **errp)
> > > >      if (backend->prealloc) {
> > > >          os_mem_prealloc(memory_region_get_fd(&backend->mr), ptr, sz);
> > > >      }
> > > > +
> > > > +#ifdef CONFIG_NUMA
> > > > +    unsigned long maxnode = find_last_bit(backend->host_nodes, 
> > > > MAX_NODES);
> > > > +
> > > > +    /* This is a workaround for a long standing bug in Linux'
> > > > +     * mbind implementation, which cuts off the last specified
> > > > +     * node.
> > > > +     */
> > > 
> > > What if the bug is fixed? mbind() documentation says "nodemask points to
> > 
> > No it won't, otherwise softwares depend on mbind() will break.
> > 
> > > a bit mask of nodes containing up to maxnode bits", so we must ensure
> > > backend->host_nodes has the one extra bit.
> > 
> > Yes.
> > 
> > > 
> > > Also, if no bit is set, we can pass nodemask=NULL or maxnode=0 as
> > > argument.
> > > 
> > > We could address both issues, and do this:
> > > 
> > >     struct HostMemoryBackend { [...]
> > >         DECLARE_BITMAP(host_nodes, MAX_NODES + 1);
> > >     [...]
> > >     lastbit = find_last_bit(backend->host_nodes, MAX_NODES);
> > >     /* lastbit == MAX_NODES means maxnode=0 */
> > >     maxnode = (lastbit + 1) % (MAX_NODES + 1);
> > >     /* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
> > >      * as argument to mbind() due to an old Linux bug (feature?) which
> > >      * cuts off the last specified node. This means backend->host_nodes
> > >      * must have MAX_NODES+1 bits available.
> > >      */
> > >     assert(sizeof(backend->host_nodes) >= BITS_TO_LONGS(MAX_NODES + 1) * 
> > > sizeof(unsigned long));
> > >     assert(maxnode <= MAX_NODES);
> > 
> > I think we can just omit these two asserts since they are guaranteed to
> > be true.
> 
> asserts() must be always guaranteed to be true, that's the whole point
> of using them. They can detect subtle off-by-one bugs if somebody
> introduces them in the future.
> 
> > 
> > >     mbind(ptr, sz, policy, maxnode ? backend->host_nodes : NULL, maxnode 
> > > + 1, flags);
> > > 
> > > 
> > > (I am starting to wonder if it was worth dropping the libnuma
> > > requirement and implementing our own mbind()-calling code.)
> > > 
> > > > +    if (mbind(ptr, sz, backend->policy, backend->host_nodes, maxnode + 
> > > > 2, 0)) {
> > > > +        error_setg_errno(errp, errno,
> > > > +                         "cannot bind memory to host NUMA nodes");
> > > 
> > > Don't we want to set flags to MPOL_MF_STRICT here? I believe we
> > > shouldn't have any pages preallocated at this point, but in case we do,
> > > I would expect them to be moved instead of ignoring the policy set by
> > > the user.
> > 
> > MPOL_MF_STRICT | MPOL_MF_MOVE to move. Actually in this version the
> > preallocation happens before mbind, which is fixed in v3.2.
> 
> If memory was already allocated in a different node and has to be moved
> that early, that's a bug we want to detect and fix (instead of
> triggering useles memory moves). So I would use only MPOL_MF_STRICT.

Fair enough. But what about huge pages? As man page says, MPOL_MF_STRICT
is ignored on huge page mappings. Is leaving a comment at the place of
memory preallocation to warn people against alocating memory before mbind
(like it's done in v3.2) the only thing we can do?

Regards,
Hu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]