[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests
From: |
David Gibson |
Subject: |
Re: [Qemu-devel] [PATCH v4 44/47] Postcopy; Handle userfault requests |
Date: |
Tue, 27 Jan 2015 15:33:12 +1100 |
User-agent: |
Mutt/1.5.23 (2014-03-12) |
On Mon, Jan 05, 2015 at 05:13:50PM +0000, Dr. David Alan Gilbert wrote:
> * David Gibson (address@hidden) wrote:
> > On Fri, Oct 03, 2014 at 06:47:50PM +0100, Dr. David Alan Gilbert (git)
> > wrote:
> > > From: "Dr. David Alan Gilbert" <address@hidden>
> > >
> > > userfaultfd is a Linux syscall that gives an fd that receives a stream
> > > of notifications of accesses to pages marked as MADV_USERFAULT, and
> > > allows the program to acknowledge those stalls and tell the accessing
> > > thread to carry on.
> > >
> > > Signed-off-by: Dr. David Alan Gilbert <address@hidden>
> >
> > [snip]
> > > /*
> > > + * Tell the kernel that we've now got some memory it previously asked
> > > for.
> > > + * Note: We're not allowed to ack a page which wasn't requested.
> > > + */
> > > +static int ack_userfault(MigrationIncomingState *mis, void *start,
> > > size_t len)
> > > +{
> > > + uint64_t tmp[2];
> > > +
> > > + /*
> > > + * Kernel wants the range that's now safe to access
> > > + * Note it always takes 64bit values, even on a 32bit host.
> > > + */
> > > + tmp[0] = (uint64_t)(uintptr_t)start;
> > > + tmp[1] = (uint64_t)(uintptr_t)start + (uint64_t)len;
> > > +
> > > + if (write(mis->userfault_fd, tmp, 16) != 16) {
> > > + int e = errno;
> >
> > Is an EOF (i.e. write() returns 0) ever possible here? If so errno
> > may not have a meaningful value.
>
> I don't think so; I think any !=16 case is an error; however if I understand
> correctly the safe thing to do is for me to do an
>
> errno = 0
>
> before the call.
Either that, or handle unexpected EOF / short write as a different case.
>
> >
> > > + if (e == ENOENT) {
> > > + /* Kernel said it wasn't waiting - one case where this can
> > > + * happen is where two threads triggered the userfault
> > > + * and we receive the page and ack it just after we received
> > > + * the 2nd request and that ends up deciding it should ack it
> > > + * We could optimise it out, but it's rare.
> > > + */
> > > + /*fprintf(stderr, "ack_userfault: %p/%zx ENOENT\n", start,
> > > len); */
> > > + return 0;
> > > + }
> > > + error_report("postcopy_ram: Failed to notify kernel for %p/%zx
> > > (%d)",
> > > + start, len, e);
> > > + return -errno;
>
> Hmm, and made that return -e
Ah, yes, otherwise it's very likely that error_report() will clobber
the value.
> > > +/*
> > > * Handle faults detected by the USERFAULT markings
> > > */
> > > static void *postcopy_ram_fault_thread(void *opaque)
> > > {
> > > MigrationIncomingState *mis = (MigrationIncomingState *)opaque;
> > > + void *hostaddr;
> > > + int ret;
> > > + size_t hostpagesize = getpagesize();
> > > + RAMBlock *rb = NULL;
> > > + RAMBlock *last_rb = NULL; /* last RAMBlock we sent part of */
> > >
> > > - fprintf(stderr, "postcopy_ram_fault_thread\n");
> > > - /* TODO: In later patch */
> > > + DPRINTF("%s", __func__);
> > > qemu_sem_post(&mis->fault_thread_sem);
> > > - while (1) {
> > > - /* TODO: In later patch */
> > > - }
> > > + while (true) {
> > > + PostcopyPMIState old_state, tmp_state;
> > > + ram_addr_t rb_offset;
> > > + ram_addr_t in_raspace;
> > > + unsigned long bitmap_index;
> > > + struct pollfd pfd[2];
> > > +
> > > + /*
> > > + * We're mainly waiting for the kernel to give us a faulting HVA,
> > > + * however we can be told to quit via userfault_quit_fd which is
> > > + * an eventfd
> > > + */
> > > + pfd[0].fd = mis->userfault_fd;
> > > + pfd[0].events = POLLIN;
> > > + pfd[0].revents = 0;
> > > + pfd[1].fd = mis->userfault_quit_fd;
> > > + pfd[1].events = POLLIN; /* Waiting for eventfd to go positive */
> > > + pfd[1].revents = 0;
> > > +
> > > + if (poll(pfd, 2, -1 /* Wait forever */) == -1) {
> > > + perror("userfault poll");
> > > + break;
> > > + }
> > >
> > > + if (pfd[1].revents) {
> > > + DPRINTF("%s got quit event", __func__);
> > > + break;
> >
> > I don't see any cleanup path in the userfault thread. So wouldn't it
> > be simpler to just pthread_cancel() it rather than using an extra fd
> > for quit notifications.
>
> But it does call functions that take locks (both the pmi and the
> return path qemu-file), so I don't feel comfortable just cancelling the
> thread.
Ah, good point. Use of an event restrict the points at which the
thread can exit, which is significant.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
pgp0fDmMArFp6.pgp
Description: PGP signature