Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
From: Claudio Imbrenda
Subject: Re: [PATCH v3 1/1] os-posix: asynchronous teardown for shutdown on Linux
Date: Fri, 12 Aug 2022 13:45:27 +0200

On Fri, 12 Aug 2022 08:38:59 -0300
Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:
> On 8/12/22 04:26, Claudio Imbrenda wrote:
> > On Thu, 11 Aug 2022 23:05:52 -0300
> > Murilo Opsfelder Araújo <muriloo@linux.ibm.com> wrote:
> >
> >> On 8/11/22 11:02, Daniel P. Berrangé wrote:
> >> [...]
> >>>>> Hmm, I was hoping you could just use SIGKILL to guarantee that this
> >>>>> gets killed off. Is SIGKILL delivered too soon to allow for the
> >>>>> main QEMU process to have exited quickly ?
> >>>>
> >>>> yes, I tried. qemu has not finished exiting when the signal is
> >>>> delivered, the cleanup process dies before qemu, which defeats the
> >>>> purpose
> >>>
> >>> Ok, too bad.
> >>>
> >>>>> If so I wonder what happens when systemd just delivers SIGKILL to
> >>>>> all processes in the cgroup - I'm not sure there's a guarantee it
> >>>>> will SIGKILL the main qemu before it SIGKILLs this helper
> >>>>
> >>>> I'm afraid in that case there is no guarantee.
> >>>>
> >>>> for what it's worth, both virsh shutdown and destroy seem to do things
> >>>> properly.
> >>>
> >>> Hmm, probably because libvirt tells QEMU to exit before systemd comes
> >>> along and tells everything in the cgroup to die with SIGKILL.
> >>
> >> It seems Libvirt sends SIGKILL if the qemu process doesn't terminate
> >> within 10 seconds after Libvirt sent SIGTERM:
> >>
> >> https://gitlab.com/libvirt/libvirt/-/blob/0615df084ec9996b5df88d6a1b59c557e22f3a12/src/util/virprocess.c#L375
> >>
> >
> > but this is fine.
> >
> > with asynchronous teardown, qemu will exit almost immediately when
> > receiving SIGTERM, and the cleanup process will start cleaning up.
>
> Under normal and orderly conditions, yes.
>
> >> So I guess this patch happened to work with Libvirt because the main qemu
> >> process terminated before the timeout and before SIGKILL was delivered.
> >
> > it seems so
> >
> >>
> >> The cleanup process is trying to solve the problem where the main qemu
> >> process takes too long to terminate. However, if the cleanup process
> >> itself takes too long, SIGKILL will be sent by Libvirt anyway.
> >
> > but that is not a problem, the sole purpose of the cleanup process is
> > to terminate _after_ qemu. it doesn't matter what happens after qemu
> > has terminated. if you look at the patch, after going to great lengths
> > to assure that qemu has terminated, all the child process does is
> > _exit(0).
> >
> >>
> >> Perhaps we can describe this situation in the parameter help, e.g.: If
> >> the management layer decides to send SIGKILL (e.g.: due to timeout or
> >> deliberate decision), the cleanup process can exit before the main
> >> process, defeating its purpose.
> >
> > if the management layer (or the user) decides to send SIGKILL
> > immediately to the whole cgroup without sending SIGTERM first, then
> > this whole asynchronous teardown mechanism is defeated, yes.
>
> This situation is what we likely want to describe in the parameter help. I
> don't want to give users the false impression that this option will *always*
> behave the way we expect it to behave *most* of the time.
fair enough, I'll improve the documentation
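
for illustration, the SIGTERM-then-SIGKILL escalation discussed above could look roughly like the Python sketch below. this is not Libvirt's or QEMU's code: `stop_process` and `grace_seconds` are made-up names, and the 10 second default only mirrors the timeout mentioned earlier in this thread.

```python
import signal
import subprocess


def stop_process(proc, grace_seconds=10.0):
    """Ask a process to stop with SIGTERM, escalating to SIGKILL on timeout.

    `proc` is a subprocess.Popen object. Returns the last signal that was
    sent: signal.SIGTERM if the process exited within the grace period,
    signal.SIGKILL otherwise. (Hypothetical helper, for illustration only.)
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # Orderly path: the process exits on its own within the grace period.
        proc.wait(timeout=grace_seconds)
        return signal.SIGTERM
    except subprocess.TimeoutExpired:
        # The process is still alive after the grace period: force it.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()
        return signal.SIGKILL
```

as long as the main process exits within the grace period, SIGKILL is never sent. a caller that skips the SIGTERM step and kills everything at once gives up that window, which is the case the documentation should call out.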
>
> --
> Murilo
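
The ordering requirement from this thread (the cleanup process must terminate only after the main process has terminated) can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses fork() and polls getppid(), which is not necessarily how the patch does it, and `wait_for_parent_exit` and `run_main_with_helper` are hypothetical names.

```python
import os
import time


def wait_for_parent_exit(original_ppid, poll_seconds=0.05):
    """Block until this process has been reparented.

    Reparenting means getppid() no longer returns the original parent's
    PID, i.e. the original parent has terminated.
    """
    while os.getppid() == original_ppid:
        time.sleep(poll_seconds)


def run_main_with_helper(report_fd):
    """Hypothetical stand-in for the main process; never returns.

    Forks a helper and exits immediately. The helper writes b"1" to
    report_fd only after it has observed that the main process is gone.
    """
    # Record the PID before forking, so the helper still notices the exit
    # even if the main process is already gone when the helper first runs.
    main_pid = os.getpid()
    if os.fork() == 0:
        # Helper: do nothing until the main process has terminated.
        wait_for_parent_exit(main_pid)
        os.write(report_fd, b"1")
        os._exit(0)
    # Main process: exit right away, leaving the helper behind.
    os._exit(0)
```

If the helper is killed before the main process has exited (for example by a SIGKILL sent to every process at once), it never reaches the point after `wait_for_parent_exit`, which is the failure case discussed above.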