qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35


From: Michael S. Tsirkin
Subject: Re: [PATCH 0/4] hw: make TCO watchdog actually work by default for Q35
Date: Thu, 10 Nov 2022 11:29:21 -0500

On Tue, Nov 01, 2022 at 01:03:17PM +0000, Daniel P. Berrangé wrote:
> On Tue, Nov 01, 2022 at 01:57:24PM +0100, Igor Mammedov wrote:
> > On Mon, 31 Oct 2022 11:48:58 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Mon, Oct 31, 2022 at 01:50:24PM +0000, Daniel P. Berrangé wrote:
> > > > On Mon, Oct 31, 2022 at 01:19:30PM +0000, Daniel P. Berrangé wrote:  
> > > > > The TCO watchdog is unconditionally integrated into the Q35 machine
> > > > > type by default, but at the same time is unconditionally disabled
> > > > > from firing by a host config option that overrides guest OS attempts
> > > > > to enable it. People have to know to set a magic -global to make
> > > > > it non-broken  
> > > > 
> > > > Incidentally I found that originally the TCO watchdog was not
> > > > unconditionally enabled. Its exposure to the guest could be
> > > > turned on/off using
> > > > 
> > > >   -global ICH9-LPC.enable_tco=bool
> > > > 
> > > > This was implemented for machine type compat, but it also gave
> > > > apps a way to disable the watchdog functionality. Unfortunately
> > > > that ability was discarded in this series:
> > > > 
> > > >   
> > > > https://lore.kernel.org/all/1453564933-29638-1-git-send-email-ehabkost@redhat.com/
> > > > 
> > > > but the 'enable_tco' property still exists in QOM, but silently
> > > > ignored.
> > > > 
> > > > Seems we should either fix the impl of 'enable_tco', or remove the
> > > > QOM property entirely, so we don't pretend it can be toggled anymore.
> > > > 
> > > > With regards,
> > > > Daniel  
> > > 
> > > i am inclined to say you are right and the fix is to fix the impl.
> > 
> > Is there need for users to disable whatchdog at all?
> > It was always present since then and no one complained, 
> > so perhaps we should ditch property instead fixing it
> > to keep it simple.
> 
> Thinking about it more, I think we should NOT fix the 'enable_tco' property,
> because there will be no way for a mgmt appp to tell if they're using a
> fixed or broken QEMU.

This is always the case for any bug. We don't as a rule add properties
for this, it is distro's responsibility to pick bugfixes for
features users care about.

What makes this bug different?


> So if they use 'enable_tco' on a broken QEMU and then
> live migrate, they'll get an guest ABI change. If we did want to support
> disabling it, then we should have a brand new property that apps can probe
> for.
> 
> In the absence of a request to disable watchdog, I'd say we just delete
> 'enable_tco' right now. If someone wants it in future, we can add it with
> a new name.
> 
> With regards,
> Daniel

I am really stressed about watchdog firing and resetting VMs that previously
were working fine. Adding this without a safety mechanism to quickly
disable in case of problems in the field is not wise imho.

> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




reply via email to

[Prev in Thread] Current Thread [Next in Thread]