
Re: [Qemu-devel] [PATCH RFC 0/5] disk deadlines


From: Kevin Wolf
Subject: Re: [Qemu-devel] [PATCH RFC 0/5] disk deadlines
Date: Tue, 8 Sep 2015 12:49:26 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

Am 08.09.2015 um 12:20 hat Fam Zheng geschrieben:
> On Tue, 09/08 12:11, Kevin Wolf wrote:
> > Am 08.09.2015 um 11:20 hat Fam Zheng geschrieben:
> > > [Cc'ing address@hidden
> > > 
> > > On Tue, 09/08 11:00, Denis V. Lunev wrote:
> > > > To avoid such a situation, this patchset introduces a per-drive option
> > > > "disk-deadlines=on|off" which is unset by default.
> > > 
> > > The general idea sounds very nice. Thanks!
> > > 
> > > Should we allow user configuration on the timeout?  If so, the option
> > > should be something like "timeout-seconds=0,1,2...".  Also I think we
> > > could use werror and rerror to control the handling policy (whether to
> > > ignore/report/stop on timeout).
> > 
> > Yes, I think the timeout needs to be configurable. However, the only
> > action that makes sense is stop. Everything else would be unsafe because
> > the running request could still complete at a later point.
> 
> What if the timeout happens on a quorum child?  The management can replace it
> transparently without stopping the VM.

This is getting tricky...

I'll try this: We need to attribute timed out requests to a specific BDS.
A user of a BlockBackend can run only if none of its (recursive) children
has timed out requests. So if the only thing that is blocked is a BDS
used for an NBD server, but it isn't used by the guest, the guest can
keep running. In the same way, after removing a bad quorum child, the
guest can be continued again.
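
A rough sketch of that rule, for illustration only. The Node struct and
node_can_run() are made-up names for this example and are not QEMU's
BlockDriverState API; the sketch also assumes the graph has no cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BDS node; not QEMU's BlockDriverState. */
typedef struct Node {
    int timed_out_reqs;        /* requests on this node past their deadline */
    struct Node **children;    /* array of child nodes, may be NULL */
    size_t nb_children;        /* number of entries in children */
} Node;

/*
 * Returns true if neither this node nor any of its (recursive) children
 * has a timed out request, i.e. a user of this node may keep running.
 */
static bool node_can_run(const Node *n)
{
    assert(n != NULL);

    if (n->timed_out_reqs > 0) {
        return false;
    }
    for (size_t i = 0; i < n->nb_children; i++) {
        if (!node_can_run(n->children[i])) {
            return false;
        }
    }
    return true;
}
```

With this shape, an NBD export and the guest device each ask the question
only about their own root node, so a stuck node that is reachable from
only one of them does not affect the other.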

Somehow we must make sure that timeouts are propagated through the BDS
tree (do we need parent notifiers?), and that at the same time the
quorum BDS's timeout status is updated when the bad child is removed.
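
One possible shape for such propagation, sketched under heavy
assumptions: every node has at most one parent, and each node caches how
many of its direct children are currently blocked. TNode and the tnode_*
functions are hypothetical names, not existing QEMU code, and the sketch
ignores locking and request completion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node with a parent link; single parent is an assumption. */
typedef struct TNode {
    struct TNode *parent;
    int own_timeouts;        /* timed out requests on this node itself */
    int blocked_children;    /* direct children that are currently blocked */
} TNode;

static bool tnode_blocked(const TNode *n)
{
    return n->own_timeouts > 0 || n->blocked_children > 0;
}

/* Walk upwards for as long as a node's blocked state actually flipped. */
static void tnode_notify_parent(TNode *n, bool was_blocked)
{
    while (n->parent && tnode_blocked(n) != was_blocked) {
        TNode *p = n->parent;
        bool p_was_blocked = tnode_blocked(p);

        p->blocked_children += tnode_blocked(n) ? 1 : -1;
        n = p;
        was_blocked = p_was_blocked;
    }
}

/* Called when a request on n exceeds its deadline. */
static void tnode_request_timed_out(TNode *n)
{
    bool was_blocked = tnode_blocked(n);

    n->own_timeouts++;
    tnode_notify_parent(n, was_blocked);
}

/* Detach a child (e.g. a bad quorum child) and refresh the ancestors. */
static void tnode_detach_child(TNode *child)
{
    TNode *p = child->parent;

    assert(p != NULL);
    child->parent = NULL;
    if (tnode_blocked(child)) {
        bool p_was_blocked = tnode_blocked(p);

        p->blocked_children--;
        tnode_notify_parent(p, p_was_blocked);
    }
}
```

The cached counter means a check at the top of the tree does not have to
walk all children, at the cost of keeping the counters consistent on
every graph change.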

The trickier part might actually be to remove a BDS from quorum while a
request is still in flight. The traditional approach is bdrv_drain(),
but that won't work here. We want to remove the child while quorum still
has a request pending on it.

I don't think this will result automatically from doing the timeout
work. It will instead need some serious design work.

Kevin


