Re: Thoughts on VM fence infrastructure
From: Felipe Franciosi
Subject: Re: Thoughts on VM fence infrastructure
Date: Mon, 30 Sep 2019 17:33:02 +0000

> On Sep 30, 2019, at 6:11 PM, Dr. David Alan Gilbert <address@hidden> wrote:
>
> * Felipe Franciosi (address@hidden) wrote:
>>
>>
>>> On Sep 30, 2019, at 5:03 PM, Dr. David Alan Gilbert <address@hidden> wrote:
>>>
>>> * Felipe Franciosi (address@hidden) wrote:
>>>> Hi David,
>>>>
>>>>> On Sep 30, 2019, at 3:29 PM, Dr. David Alan Gilbert <address@hidden>
>>>>> wrote:
>>>>>
>>>>> * Felipe Franciosi (address@hidden) wrote:
>>>>>> Heyall,
>>>>>>
>>>>>> We have a use case where a host should self-fence (and all VMs should
>>>>>> die) if it doesn't hear back from a heartbeat within a certain time
>>>>>> period. Lots of ideas were floated around where libvirt could take
>>>>>> care of killing VMs or a separate service could do it. The concern
>>>>>> with those is that various failures could lead to _those_ services
>>>>>> being unavailable and the fencing wouldn't be enforced as it should.
>>>>>>
>>>>>> Ultimately, it feels like Qemu should be responsible for this
>>>>>> heartbeat and exit (or execute a custom callback) on timeout.
>>>>>
>>>>> It doesn't feel like doing it inside qemu would be any safer; something
>>>>> outside QEMU can forcibly emit a kill -9 and qemu *will* stop.
>>>>
>>>> The argument above is that we would have to rely on this external
>>>> service being functional. Consider the case where the host is
>>>> dysfunctional, with this service perhaps crashed and a corrupt
>>>> filesystem preventing it from restarting. The VMs would never die.
>>>
>>> Yeh that could fail.
>>>
>>>> It feels like a Qemu timer-driven heartbeat check that calls abort() /
>>>> exit() would be more reliable. Thoughts?
>>>
>>> OK, yes; perhaps using a timer_create and telling it to send a fatal
>>> signal is pretty solid; it would take the kernel to do that once it's
>>> set.
>>
>> I'm confused about why the kernel needs to be involved. If this is a
>> timer off the Qemu main loop, it can just check on the heartbeat
>> condition (which should be customisable) and call abort() if that's
>> not satisfied. If you agree on that I'd like to talk about how that
>> check could be made customisable.
>
> There are times when the main loop can get blocked even though the CPU
> threads can be running and can in some configurations perform IO
> even without the main loop (I think!).
Ah, that's a very good point. Indeed, you can perform IO in those
cases, especially when using vhost devices.
> By setting a timer in the kernel that sends a signal to qemu, the kernel
> will send that signal however broken qemu is.
Got you now. That's probably better. Do you reckon a signal is
preferable over SIGEV_THREAD?
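
For concreteness, here is a minimal sketch of the kernel-timer idea using
POSIX timer_create() with SIGEV_SIGNAL. The helper names fence_timer_arm()
and fence_timer_kick() and the choice of signal are hypothetical, not
existing Qemu code, and the heartbeat check itself is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: create a one-shot kernel timer that delivers
 * `signo` to this process after `timeout_sec` seconds unless it is
 * re-armed first. Returns 0 on success, -1 on failure (errno set).
 */
int fence_timer_arm(timer_t *timer, int signo, time_t timeout_sec)
{
    struct sigevent sev;
    struct itimerspec its;

    assert(timeout_sec > 0);

    memset(&sev, 0, sizeof(sev));
    sev.sigev_notify = SIGEV_SIGNAL;   /* kernel raises a signal on expiry */
    sev.sigev_signo = signo;

    /* CLOCK_MONOTONIC is unaffected by wall-clock adjustments */
    if (timer_create(CLOCK_MONOTONIC, &sev, timer) != 0) {
        return -1;
    }

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = timeout_sec; /* it_interval stays 0: one-shot */
    if (timer_settime(*timer, 0, &its, NULL) != 0) {
        timer_delete(*timer);
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: push the expiry out by another `timeout_sec`
 * seconds. Intended to be called after each successful heartbeat.
 */
int fence_timer_kick(timer_t timer, time_t timeout_sec)
{
    struct itimerspec its;

    memset(&its, 0, sizeof(its));
    its.it_value.tv_sec = timeout_sec;
    return timer_settime(timer, 0, &its, NULL);
}
```

The caller would arm the timer once with a fatal signal and kick it after
every good heartbeat; if the kicks stop, the kernel delivers the signal
whatever state the main loop is in. As far as I understand, SIGEV_THREAD
notifications run in a thread inside the process, so they depend on more of
the process still working. Older glibc versions need -lrt to link this.
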
I'm still wondering how to make this customisable so that different
types of heartbeat could be implemented (preferably without creating
external dependencies per discussion above). Thoughts welcome.

F.
>
>>
>>>
>>> IMHO the safer way is to kick the host off the network by reprogramming
>>> switches; so even if the qemu is actually alive it can't get anywhere.
>>>
>>> Dave
>>
>> Naturally some off-host STONITH is preferable, but that's not always
>> available. A self-fencing mechanism right at the heart of the emulator
>> can do the job without external hardware dependencies.
>
> Dave
>
>> Cheers,
>> Felipe
>>
>>>
>>>
>>>> Felipe
>>>>
>>>>>
>>>>>> Does something already exist for this purpose which could be used?
>>>>>> Would a generic Qemu-fencing infrastructure be something of interest?
>>>>> Dave
>>>>>
>>>>>
>>>>>> Cheers,
>>>>>> F.
>>>>>>
>>>>> --
>>>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>>>>
>>> --
>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK
>>
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK