qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-block] question:about introduce a new feature named “I/O hang


From: Kevin Wolf
Subject: Re: [Qemu-block] question:about introduce a new feature named “I/O hang”
Date: Fri, 5 Jul 2019 09:50:53 +0200
User-agent: Mutt/1.11.3 (2019-02-01)

Am 04.07.2019 um 17:16 hat wangjie (P) geschrieben:
> Hi, everybody:
> 
> I developed a feature named "I/O hang",my intention is to solve the problem
> like that:
> If the backend storage media of VM disk is far-end storage like IPSAN or
> FCSAN, storage net link will always disconnection and
> make I/O requests return EIO to Guest, and the status of filesystem in Guest
> will be read-only, even the link recovered
> after a while, the status of filesystem in Guest will not recover.

The standard solution for this is configuring the guest device with
werror=stop,rerror=stop so that the error is not delivered to the guest,
but the VM is stopped. When you run 'cont', the request is then retried.

> So I developed a feature named "I/O hang" to solve this problem, the
> solution like that:
> when some I/O requests return EIO in backend, "I/O hang" will catch the
> requests in qemu block layer and
> insert the requests to a rehandle queue but not return EIO to Guest, the I/O
> requests in Guest will hang but it does not lead
> Guest filesystem to be read-only, then "I/O hang" will loop to rehandle the
> requests for a period time(ex. 5 second) until the requests
> not return EIO(when backend storage link recovered).

Letting requests hang without stopping the VM risks the guest running
into timeouts and deciding that its disk is broken.

As you say your "hang" and retry logic sits in the block layer, what do
you do when you encounter a bdrv_drain() request?

> In addition to the function as above, "I/O hang" also can sent event to
> libvirt after backend storage status changed.
> 
> configure methods:
> 1. "I/O hang" ability can be configured for each disk as a disk attribute.
> 2. "I/O hang" timeout value also can be configured for each disk, when
> storage link not recover in timeout value,
>    "I/O hang" will disable rehandle I/O requests and return EIO to Guest.
> 
> Are you interested in the feature?  I intend to push this feature to qemu
> org, what's your opinion?

Were you aware of werror/rerror? Before we add another mechanism, we
need to be sure how the features compare, that the new mechanism
provides a significant advantage and that we keep code duplication as
low as possible.

Kevin



reply via email to

[Prev in Thread] Current Thread [Next in Thread]