From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH] io/channel-command: Delay the killing of the child after closing the pipe
Date: Tue, 13 Feb 2018 16:03:39 +0000
User-agent: Mutt/1.9.2 (2017-12-15)
* Daniel P. Berrangé (address@hidden) wrote:
> On Tue, Feb 13, 2018 at 03:49:42PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (address@hidden) wrote:
> > > On Tue, Feb 13, 2018 at 03:41:45PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Daniel P. Berrangé (address@hidden) wrote:
> > > > > > On Tue, Feb 13, 2018 at 03:25:30PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > * Daniel P. Berrangé (address@hidden) wrote:
> > > > > > > > On Tue, Feb 13, 2018 at 03:09:12PM +0000, Dr. David Alan Gilbert wrote:
> > > > > > > > * Thomas Huth (address@hidden) wrote:
> > > > > > > > > > We are currently facing some migration failure on s390x when running
> > > > > > > > > > certain avocado tests, e.g. when running the test
> > > > > > > > > > type_specific.io-github-autotest-qemu.migrate.with_reboot.exec.gzip_exec.
> > > > > > > > > > This test is using 'migrate -d "exec:nc localhost 5200"' for the migration.
> > > > > > > > > > The problem is detected at the receiving side, where the migration stream
> > > > > > > > > > apparently ends too early. However, the cause for the problem is the
> > > > > > > > > > sending side: After writing the migration stream into the pipe to netcat,
> > > > > > > > > > the source QEMU calls qio_channel_command_close() which closes the pipe
> > > > > > > > > > and immediately (!) kills the child process afterwards. So if the
> > > > > > > > > > sending netcat did not read the final bytes from the pipe yet, or
> > > > > > > > > > if it did not manage to send out all its buffers yet, it is killed
> > > > > > > > > > before the whole migration stream is passed to the destination side.
> > > > > > > >
> > > > > > > > Thanks for tracking that down!
> > > > > > > >
> > > > > > > > > > To ease the situation at least a little bit, we should give the child
> > > > > > > > > > process at least some few more time slices before we kill it with
> > > > > > > > > > SIGTERM and then with SIGKILL. With this change, the avocado test now
> > > > > > > > > > succeeds here in 10 out of 10 runs.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Thomas Huth <address@hidden>
> > > > > > > > > ---
> > > > > > > > > io/channel-command.c | 6 +++---
> > > > > > > > > 1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/io/channel-command.c b/io/channel-command.c
> > > > > > > > > index 319c5ed..f64db3e 100644
> > > > > > > > > --- a/io/channel-command.c
> > > > > > > > > +++ b/io/channel-command.c
> > > > > > > > > @@ -177,11 +177,11 @@ static int
> > > > > > > > > qio_channel_command_abort(QIOChannelCommand *ioc,
> > > > > > > > > return -1;
> > > > > > > > > }
> > > > > > > > > } else if (ret == 0) {
> > > > > > > > > - if (step == 0) {
> > > > > > > > > + if (step == 4) {
> > > > > > > > > kill(ioc->pid, SIGTERM);
> > > > > > > > > - } else if (step == 1) {
> > > > > > > > > + } else if (step == 8) {
> > > > > > > > > kill(ioc->pid, SIGKILL);
> > > > > > > > > - } else {
> > > > > > > > > + } else if (step >= 9) {
> > > > > > > >
> > > > > > > > > Hmm. This seems pretty arbitrary; if I understand correctly you're
> > > > > > > > > saying it'll get a SIGTERM after 4 (arbitrary) * 10ms (arbitrary).
> > > > > > > >
> > > > > > > > Who is to say that's enough for a scp or gzip or the like?
> > > > > > >
> > > > > > > > We could conceivably implement the qio_channel_shutdown() operation
> > > > > > > > for the QIOChannelCommand class. It would merely close the FD to the
> > > > > > > > child process, but leave it running. That would give it time to read
> > > > > > > > any data still in the pipe from QEMU IIUC.
> > > > > >
> > > > > > > Yeh that's better; although when would we call shutdown or close on it?
> > > > >
> > > > > > Doesn't QEMU already use shutdown() during the right part of migration,
> > > > > > or is that only wrt post-copy ?
> > > >
> > > > We only use it for cancel and errors, not during the normal behaviour.
> > >
> > > So we could do with shutdown() for sake of post-copy anyway, but for
> > > normal behaviour maybe the right answer is for close() to just wait a
> > > real long time for the child app to exit ? If we close the pipes, and
> > > then wait 5 seconds or more before giving up ?
> >
> > Yes, I'm happier with a much longer arbitrary value than a short
> > arbitrary value; but I do wonder if there's any real need to kill it.
>
> If we don't kill it, then if it gets stuck for some reason it will live
> forever. If we don't kill it but just close the FD, then we still need
> to waitpid at some point otherwise we get a zombie - unless we decide
> to daemonize the child instead ?
Well we used to rely on popen/pclose which I think just waited for it.
You could rely on migration_cancel calling shutdown and causing that to
do the kill, but always waiting in the normal course of things; which is
probably OK unless it decides to hang right at the end.
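
To make the shape of that concrete, here is a rough sketch in plain POSIX C
of "close the pipe, wait a long time, only then kill". It is not the QEMU
code and not a proposed patch: wait_for_exit(), close_and_reap(), the 10ms
poll interval and the 1 second gap between SIGTERM and SIGKILL are all made
up for illustration, and the grace period is whatever the caller passes in.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU function): poll for the child to exit
 * for up to grace_ms milliseconds.
 * Returns 1 if the child was reaped, 0 if it is still running, -1 on error.
 */
static int wait_for_exit(pid_t pid, int grace_ms, int *status)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10ms */
    int waited;

    for (waited = 0; ; waited += 10) {
        pid_t ret = waitpid(pid, status, WNOHANG);
        if (ret == pid) {
            return 1;           /* child exited and has been reaped */
        }
        if (ret < 0 && errno != EINTR) {
            return -1;          /* e.g. ECHILD */
        }
        if (waited >= grace_ms) {
            return 0;           /* grace period used up */
        }
        nanosleep(&tick, NULL);
    }
}

/*
 * Hypothetical helper (not a QEMU function): close the write end first so
 * the child sees EOF and can drain the pipe, give it grace_ms to exit on
 * its own, and only then escalate to SIGTERM and finally SIGKILL.
 * Returns 0 once the child has been reaped, -1 on error.
 */
static int close_and_reap(int wfd, pid_t pid, int grace_ms, int *status)
{
    int r;

    close(wfd);

    r = wait_for_exit(pid, grace_ms, status);
    if (r != 0) {
        return r < 0 ? -1 : 0;
    }

    kill(pid, SIGTERM);
    r = wait_for_exit(pid, 1000, status);   /* assumed 1s before SIGKILL */
    if (r != 0) {
        return r < 0 ? -1 : 0;
    }

    kill(pid, SIGKILL);
    for (;;) {
        pid_t ret = waitpid(pid, status, 0); /* blocking reap, no zombie */
        if (ret == pid) {
            return 0;
        }
        if (ret < 0 && errno != EINTR) {
            return -1;
        }
    }
}
```

With a grace period of 5000 this is the "wait 5 seconds or more before giving
up" variant; dropping the two kill() steps and doing only the final blocking
waitpid() would be roughly the old pclose-style behaviour, with the hang risk
mentioned above.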
Dave
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK