qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [RFC 23/29] vub+postcopy: madvises
Date: Thu, 10 Aug 2017 09:55:55 +0100
User-agent: Mutt/1.8.3 (2017-05-23)

* Alexey Perevalov (address@hidden) wrote:
> On 08/08/2017 08:06 PM, Dr. David Alan Gilbert wrote:
> > * Alexey Perevalov (address@hidden) wrote:
> > > On 06/28/2017 10:00 PM, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <address@hidden>
> > > > 
> > > > Clear the area and turn off THP.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> > > > ---
> > > >    contrib/libvhost-user/libvhost-user.c | 32 
> > > > ++++++++++++++++++++++++++++++--
> > > >    1 file changed, 30 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/contrib/libvhost-user/libvhost-user.c 
> > > > b/contrib/libvhost-user/libvhost-user.c
> > > > index 0658b6e847..ceddeac74f 100644
> > > > --- a/contrib/libvhost-user/libvhost-user.c
> > > > +++ b/contrib/libvhost-user/libvhost-user.c
> > > > @@ -451,11 +451,39 @@ vu_set_mem_table_exec(VuDev *dev, VhostUserMsg 
> > > > *vmsg)
> > > >            }
> > > >            if (dev->postcopy_listening) {
> > > > +            int ret;
> > > >                /* We should already have an open ufd need to mark each 
> > > > memory
> > > >                 * range as ufd.
> > > > -             * Note: Do we need any madvises? Well it's not been 
> > > > accessed
> > > > -             * yet, still probably need no THP to be safe, discard to 
> > > > be safe?
> > > >                 */
> > > > +
> > > > +            /* Discard any mapping we have here; note I can't use 
> > > > MADV_REMOVE
> > > > +             * or fallocate to make the hole since I don't want to lose
> > > > +             * data that's already arrived in the shared process.
> > > > +             * TODO: How to do hugepage
> > > > +             */
> > > Hi, David, frankly saying, I stuck with my solution, and I have also 
> > > another
> > > issues,
> > > but here I could suggest solution for hugepages. I think we could 
> > > transmit a
> > > received pages
> > > bitmap in VHOST_USER_SET_MEM_TABLE (VhostUserMemoryRegion), but it will
> > > raise a compatibility issue,
> > > or introduce special message type for that and send it before
> > > VHOST_USER_SET_MEM_TABLE.
> > > So it will be  possible to do fallocate on received bitmap basis, just 
> > > skip
> > > already copied pages.
> > > If you wish, I could send patches, rebased on yours, for doing it.
> > What we found works is that actually we don't need to do a discard -
> > since we've only just done the mmap of the arena, nothing will be
> > occupying it on the shared client, so we don't need to discard.
> Looks like yes, I checked on kernel from Andrea's git,
> there is any more EEXIST error in case when client doesn't
> fallocate.
> 
> > 
> > We've had a postcopy migrate work now, with a few hacks we're still
> > cleaning up, both on vhost-user-bridge and dpdk; so I'll get this
> > updated and reposted.
> In you patch series vring is disabling in case of VHOST_USER_GET_VRING_BASE.
> It's being called when vhost-user server want's to stop vring.
> QEMU is enabling vring as soon as virtual machine is started, so I didn't
> see
> explicit vring disabling for migrating VRING.
> So migrating VRING is protected just by uffd_register, isn't it? And PMD
> thread (any
> vhost-user thread which accessing migrating VRING) will wait page copying in
> this case,
> right?

Yes I believe that's the case; although I don't know the structure of
dpdk to know the effect of that.

Dave

> 
> > 
> > Dave
> > 
> > > > +            ret = madvise((void *)dev_region->mmap_addr,
> > > > +                          dev_region->size + dev_region->mmap_offset,
> > > > +                          MADV_DONTNEED);
> > > > +            if (ret) {
> > > > +                fprintf(stderr,
> > > > +                        "%s: Failed to madvise(DONTNEED) region %d: 
> > > > %s\n",
> > > > +                        __func__, i, strerror(errno));
> > > > +            }
> > > > +            /* Turn off transparent hugepages so we dont get lose 
> > > > wakeups
> > > > +             * in neighbouring pages.
> > > > +             * TODO: Turn this backon later.
> > > > +             */
> > > > +            ret = madvise((void *)dev_region->mmap_addr,
> > > > +                          dev_region->size + dev_region->mmap_offset,
> > > > +                          MADV_NOHUGEPAGE);
> > > > +            if (ret) {
> > > > +                /* Note: This can happen legally on kernels that are 
> > > > configured
> > > > +                 * without madvise'able hugepages
> > > > +                 */
> > > > +                fprintf(stderr,
> > > > +                        "%s: Failed to madvise(NOHUGEPAGE) region %d: 
> > > > %s\n",
> > > > +                        __func__, i, strerror(errno));
> > > > +            }
> > > >                struct uffdio_register reg_struct;
> > > >                /* Note: We might need to go back to using mmap_addr and
> > > >                 * len + mmap_offset for * huge pages, but then we do 
> > > > hope not to
> > > 
> > > -- 
> > > Best regards,
> > > Alexey Perevalov
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
> > 
> > 
> > 
> 
> -- 
> Best regards,
> Alexey Perevalov
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]