From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH 07/46] Return path: Open a return path on QEMUFile for sockets
Date: Wed, 16 Jul 2014 18:10:39 +0100
User-agent: Mutt/1.5.23 (2014-03-12)

* Paolo Bonzini (address@hidden) wrote:
> On 16/07/2014 13:52, Dr. David Alan Gilbert wrote:
> >* Paolo Bonzini (address@hidden) wrote:
> >>On 16/07/2014 11:37, Dr. David Alan Gilbert wrote:
> >>>>>
> >>>>>+
> >>>>>+    /* If it's already open, return it */
> >>>>>+    if (qfs->file->return_path) {
> >>>>>+        return qfs->file->return_path;
> >>>>
> >>>>Wouldn't this leave a dangling file descriptor if you call
> >>>>socket_dup_return_path twice, and then close the original QEMUFile?
> >>>
> >>>Hmm - how?
> >>
> >>The problem is that there is no reference count on QEMUFile, so if you do
> >>
> >>  f1 = open_return_path(f0);
> >>  f2 = open_return_path(f0);
> >>  /* now f1 == f2 */
> >>  qemu_fclose(f1);
> >>  /* now f2 is dangling */
> >
> >I think from the way I'm using it, I can remove the optimisation, but I do
> >need to check.
> >
> >I'm not too sure what your worry is about 'f2' in this case; I guess the
> >caller needs to know that it should only close the return path once - is
> >that your worry?
> 
> Yes.  The API is not well encapsulated; a "random" caller of
> open_return_path does not know (and cannot know) whether it should close the
> returned file or not.

OK, then yes I'll give that a go taking out those optimisations.
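(For the record, the alternative to removing the optimisation would be to
refcount the shared return path; a standalone sketch of the idea in plain C,
not code from this series - each open takes a reference and each close drops
one, so the second opener's pointer can never dangle:)

    #include <stdlib.h>

    typedef struct RetPath {
        int refs;
        /* fd, buffers, ... */
    } RetPath;

    static RetPath *ret_path_open(RetPath **slot)
    {
        if (*slot) {
            (*slot)->refs++;          /* already open: share it */
            return *slot;
        }
        *slot = calloc(1, sizeof(**slot));
        (*slot)->refs = 1;
        return *slot;
    }

    static void ret_path_close(RetPath **slot)
    {
        if (*slot && --(*slot)->refs == 0) {
            free(*slot);
            *slot = NULL;             /* nothing left dangling */
        }
    }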

> >I'm more nervous about dropping that one, because the current scheme
> >does provide a clean way of finding the forward path if you've got the
> >reverse (although I don't think I make use of it).
> 
> If it isn't used, why keep it?

It just felt pleasantly symmetric being able to get the forward path
by asking for the return path on the return path; but I can remove it.

> >>> Source side
> >>>    Forward path - written by migration thread
> >>>           : It's OK for this to be blocking, but we end up with it being
> >>>             non-blocking, and modify the socket code to emulate blocking.
> >>
> >>This likely has a performance impact though.  The first migration thread
> >>code drop from Juan already improved throughput a lot, even if it kept the
> >>iothread all the time and only converted from nonblocking writes to
> >>blocking.
> >
> >Can you give some more reasoning as to why you think this will hit the
> >performance so much; I thought the output buffers were quite big anyway.
> 
> I don't really know, it's
> >>>    Return path  - opened by main thread, read by fd_handler on main thread
> >>>           : Must be non-blocking so as not to block the main thread while
> >>>             waiting for a partially sent command.
> >>
> >>Why can't you handle this in the migration thread (or a new postcopy thread
> >>on the source side)?  Then it can stay blocking.
> >
> >Handling it within the migration thread would make it much more complicated
> >(which would be bad since it's already complex enough);
> 
> Ok.  I'm not sure why it is more complicated since migration is essentially
> two-phase, one where the source drives it and one where the source just
> waits for requests, but I'll trust you on this. :)

It's not as clean a split as that; during the postcopy phase we still do the
linear page scan to send pages before they're requested, so the main migration
thread code keeps going.
(There's an 'interesting' balance here: send too many linear pages and they get
in the way of the postcopy requests and increase their latency, but sending
them means you get a lot of the pages without having to request them, which is
zero latency.)
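
(On the source side the return-path read itself I'm planning to keep as an
ordinary main-loop fd handler on the non-blocking fd; a heavily simplified
sketch, with the MigrationState field name 'rp_file' invented here just for
illustration:)

    static void source_return_path_read(void *opaque)
    {
        MigrationState *s = opaque;
        uint8_t buf[256];
        ssize_t len;

        /* The fd is non-blocking, so a failed read just means "come back
         * when the main loop says it's readable again". */
        len = read(qemu_get_fd(s->rp_file), buf, sizeof(buf));
        if (len <= 0) {
            return;
        }
        /* accumulate into a command buffer; act on complete commands */
    }

    /* registered once from the main thread: */
    qemu_set_fd_handler(qemu_get_fd(s->rp_file), source_return_path_read,
                        NULL, s);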

> >>> Destination side
> >>>    Forward path - read by main thread
> >>
> >>This must be nonblocking so that the monitor keeps responding.
> >
> >Interesting, I suspect none of the code in there is set up for that at the
> >moment, so how does that work during migration today?
> 
> It sure is. :)

Oh so it is; I missed the 'qemu_set_nonblock(fd);' in process_incoming_migration

> On the destination side, migration is done in a coroutine (see
> process_incoming_migration) so it's all transparent.  Only socket_get_buffer
> has to know about this:
> 
>         len = qemu_recv(s->fd, buf, size, 0);
>         if (len != -1) {
>             break;
>         }
>         if (socket_error() == EAGAIN) {
>             yield_until_fd_readable(s->fd);
>         } else if (socket_error() != EINTR) {
>             break;
>         }
> 
> If the socket is put in blocking mode recv will never return EAGAIN, so this
> code will only run if the socket is nonblocking.

OK, yes.
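
(For reference, the setup I'd missed is roughly this - quoting from memory,
so the details may not be exact:)

    void process_incoming_migration(QEMUFile *f)
    {
        Coroutine *co = qemu_coroutine_create(process_incoming_migration_co);
        int fd = qemu_get_fd(f);

        assert(fd != -1);
        qemu_set_nonblock(fd);           /* keeps the monitor responsive */
        qemu_coroutine_enter(co, f);
    }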

> >Actually, I see I missed something here; this should be:
> >
> >   Destination side
> >         Forward path - read by main thread, and listener thread (see the
> >             separate mail that described that listener thread)
> >
> >and that means that once postcopy is going (and the listener thread started)
> >it can't block the monitor.
> 
> Ok, so the listener thread can do socket_set_block(qemu_get_fd(file)) once
> it gets its hands on the QEMUFile.
> 
> >>>    Return path  - opened by main thread, written by main thread AND
> >>>                   postcopy thread (protected by rp_mutex)
> >>
> >>When does the main thread needs to write?
> >
> >Not much; the only things the main thread currently responds to are the
> >ReqAck (ping-like) requests; those are turning out to be very useful during
> >debug. I've also got the ability for the destination to send a migration
> >result back to the source, which seems useful to be able to 'fail' early.
> 
> Why can't this be done in the listener thread?  (Thus transforming it into a
> more general postcopy migration thread; later we could even change incoming
> migration from a coroutine to a thread).

It depends when the ReqAck is sent; i.e. if it's received when the listener
thread is running and processing the stream then it's the listener thread that
sends the reply.

However, that's not necessarily a big issue now that you've pointed out that
the destination fd is already running non-blocking. If the worst comes to the
worst I could just disable the ReqAcks in non-debug builds.
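
(To make the rp_mutex point concrete, the send side of a return-path command
in my tree looks roughly like this - the struct and the wire format here are
illustrative, not final:)

    static void migrate_send_rp(MigrationIncomingState *mis, uint16_t cmd,
                                uint16_t len, uint8_t *data)
    {
        /* Both the main thread and the postcopy/listener thread can send
         * replies, so writes to the return path are serialised. */
        qemu_mutex_lock(&mis->rp_mutex);
        qemu_put_be16(mis->return_path, cmd);
        qemu_put_be16(mis->return_path, len);
        qemu_put_buffer(mis->return_path, data, len);
        qemu_fflush(mis->return_path);
        qemu_mutex_unlock(&mis->rp_mutex);
    }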

> >>If it doesn't need that, you can just switch to blocking when you process
> >>the listen command (i.e. when the postcopy thread starts).
> >
> >Why don't I just do it anyway? Prior to postcopy starting we're in the same
> >situation as we're in with precopy today, so we can already get main-thread
> >blocking.
> 
> See above for the explanation.
> 
> Paolo
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


