

From: Peter Xu
Subject: Re: [Qemu-devel] [PATCH RFC 1/6] io: only allow return path for socket typed
Date: Fri, 19 May 2017 17:51:43 +0800
User-agent: Mutt/1.5.24 (2015-08-30)

On Fri, May 19, 2017 at 09:25:38AM +0100, Daniel P. Berrange wrote:
> On Fri, May 19, 2017 at 02:43:27PM +0800, Peter Xu wrote:
> > We don't really have a return path for the other types yet. Let's check
> > this when .get_return_path() is called.
> > 
> > For this, we introduce a new feature bit, and set it up only for socket
> > typed IO channels.
> > 
> > This will help detect earlier failure for postcopy, e.g., logically
> > speaking postcopy cannot work with "exec:". Before this patch, when we
> > try to migrate with "migrate -d exec:cat>out", we'll hang the system.
> > With this patch, we'll get:
> > 
> > (qemu) migrate -d exec:cat>out
> > Unable to open return-path for postcopy
> 
> This is wrong - post-copy migration *can* work with exec: - it just entirely
> depends on what command you are running. Your example ran a command which is
> unidirectional, but if you ran 'exec:socat ...' you would have a fully
> bidirectional channel. Actually the channel is always bi-directional, but
> 'cat' simply won't ever send data back to QEMU.

Indeed. I should not block postcopy when the user sets up a
bidirectional tunnel (e.g. over TCP) between source and destination
via exec:. Thanks for pointing that out.

However, I still think the idea is needed here: we'd better know
whether the transport is able to respond (though the current approach
of "assuming sockets are the only ones that can reply" is not a good
solution...). Please see below.

> 
> If QEMU hangs when the other end doesn't send data back, that actually seems
> like a potentially serious bug in migration code. Even if using the normal
> 'tcp' migration protocol, if the target QEMU server hangs and fails to
> send data to QEMU on the return path, the source QEMU must never hang.

First, I should not have called it a hang. It is actually by design,
imho: the migration thread is in its last phase, waiting for a SHUT
message from the destination (which I think is wise). But in terms of
behavior, the source VM is indeed unusable during that time, just as
in most postcopy cases on the source side. So we can see that
postcopy currently "assumes" that the destination side can reply.

Meanwhile, I think it is reasonable for postcopy to make such an
assumption. After all, postcopy means "start the VM on the destination
before all pages have been moved over", so there must be someone to
reply to the source, whether or not that reply comes via some kind of
io channel.

That's why I think we still need the general idea here: we need to
know whether the destination end is able to reply.

But I still have no good idea (now that I know this patch won't work)
on how we can do this... Any further suggestions would be greatly
appreciated.

Thanks,

-- 
Peter Xu


