From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v4 for-4.0 0/7] vhost-user-blk: Add support for backend reconnecting
Date: Mon, 14 Jan 2019 10:22:31 +0000
User-agent: Mutt/1.10.1 (2018-07-13)

On Sat, Jan 12, 2019 at 12:50:12PM +0800, Yongji Xie wrote:
> On Fri, 11 Jan 2019 at 23:53, Stefan Hajnoczi <address@hidden> wrote:
> >
> > On Thu, Jan 10, 2019 at 06:59:27PM +0800, Yongji Xie wrote:
> > > On Thu, 10 Jan 2019 at 18:25, Stefan Hajnoczi <address@hidden> wrote:
> > > >
> > > > On Wed, Jan 09, 2019 at 07:27:21PM +0800, address@hidden wrote:
> > > > > From: Xie Yongji <address@hidden>
> > > > >
> > > > > This patchset is aimed at supporting qemu to reconnect
> > > > > vhost-user-blk backend after vhost-user-blk backend crash or
> > > > > restart.
> > > > >
> > > > > Patch 1 uses the existing wait/nowait options to prevent QEMU
> > > > > from connecting to client sockets during chardev initialization.
> > > > >
> > > > > The patch 2 introduces two new messages VHOST_USER_GET_INFLIGHT_FD
> > > > > and VHOST_USER_SET_INFLIGHT_FD to support providing shared
> > > > > memory to backend.
> > > >
> > > > Can you describe the problem that the inflight I/O shared memory region
> > > > solves?
> > > >
> > >
> > > The backend needs to know which I/O requests were inflight so that
> > > it can replay them after a restart. Currently we can only get
> > > used_idx from the used ring, and that is not enough: descriptors
> > > are not always processed in the same order in which they were made
> > > available. A simple example:
> > > https://patchwork.kernel.org/cover/10715305/#22375607. So we need
> > > shared memory to track inflight I/O.
> >
> > The inflight shared memory region is something that needs to be
> > discussed in detail to make sure it is correct not just for vhost-blk
> > but also for other vhost-user devices in the future.
> >
> > Please expand the protocol documentation so that someone can implement
> > this feature without looking at the code.  Explain the reason for the
> > inflight shared memory region and how exactly it is used.
> >
> 
> OK, will do it in v5.
> 
> > After a quick look at the shared memory region documentation I have a
> > few questions:
> >
> > 1. The avail ring only contains "head" indices, each of which may chain
> >    non-head descriptors.  Does the inflight memory region track only the
> >    heads?
> >
> 
> Yes, we only track the head in inflight region.

Okay, thanks.  That is useful information to include in the
specification.
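
A minimal sketch of head-only inflight tracking, assuming a hypothetical
one-byte-per-slot layout (the struct and function names here are
illustrative, not the region format proposed by the patch):

```c
#include <stdint.h>
#include <string.h>

#define QUEUE_SIZE 8

/* Hypothetical per-queue inflight region: one byte per descriptor
 * slot, nonzero while the head descriptor with that index has been
 * fetched from the avail ring but not yet pushed to the used ring.
 * Non-head descriptors in a chain are not tracked; the backend can
 * rebuild the chain from the descriptor table. */
typedef struct {
    uint8_t inflight[QUEUE_SIZE];
} InflightRegion;

/* Backend fetches a head index from the avail ring. */
static void fetch_avail(InflightRegion *r, uint16_t head)
{
    r->inflight[head] = 1;
}

/* Backend completes the request and pushes the head to the used ring. */
static void push_used(InflightRegion *r, uint16_t head)
{
    r->inflight[head] = 0;
}

/* After a backend restart: collect the heads that must be resubmitted. */
static int find_inflight(const InflightRegion *r, uint16_t *heads, int max)
{
    int n = 0;
    for (uint16_t i = 0; i < QUEUE_SIZE && n < max; i++) {
        if (r->inflight[i]) {
            heads[n++] = i;
        }
    }
    return n;
}
```

Note that find_inflight() recovers the outstanding heads, but because it
scans in index order it loses the original submission order, which is
exactly the concern raised in question 2.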

> > 2. Does the inflight shared memory region preserve avail ring order?
> >    For example, if the guest submitted 5 2 4, will the new vhost-user
> >    backend keep that order or does it see 2 4 5?
> >
> 
> Currently we don't support resubmitting I/O in order. But I think we
> could add a timestamp for each inflight descriptor in shared memory
> to achieve that.

Great, the reason I think that feature is interesting is that other
device types may need to preserve order.  It depends on the exact
meaning of the device's requests...
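
One way the timestamp idea could be sketched, using a monotonic counter
rather than wall-clock time (all names and the layout are hypothetical;
the counter itself must also live in the shared region so it survives a
backend restart):

```c
#include <stdint.h>
#include <stdlib.h>

#define QUEUE_SIZE 8

typedef struct {
    uint8_t inflight;   /* nonzero while the request is outstanding */
    uint64_t counter;   /* global counter value captured at fetch time */
} InflightEntry;

typedef struct {
    uint64_t next_counter;          /* persists across backend restarts */
    InflightEntry ring[QUEUE_SIZE];
} InflightRegion;

static void fetch_avail(InflightRegion *r, uint16_t head)
{
    r->ring[head].counter = r->next_counter++;
    r->ring[head].inflight = 1;
}

static void push_used(InflightRegion *r, uint16_t head)
{
    r->ring[head].inflight = 0;
}

typedef struct {
    uint16_t head;
    uint64_t counter;
} Resubmit;

static int cmp_resubmit(const void *a, const void *b)
{
    const Resubmit *x = a, *y = b;
    return (x->counter > y->counter) - (x->counter < y->counter);
}

/* After restart: collect outstanding heads and sort them back into
 * the order in which they were originally made available. */
static int resubmit_in_order(const InflightRegion *r, Resubmit *out, int max)
{
    int n = 0;
    for (uint16_t i = 0; i < QUEUE_SIZE && n < max; i++) {
        if (r->ring[i].inflight) {
            out[n].head = i;
            out[n].counter = r->ring[i].counter;
            n++;
        }
    }
    qsort(out, n, sizeof(*out), cmp_resubmit);
    return n;
}
```

With this, the "5 2 4" example from the question resubmits as 5 2 4
rather than 2 4 5.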

> > I'm concerned that this approach to device recovery is invasive and hard
> > to test.  Instead I would use VIRTIO's Device Status Field
> > DEVICE_NEEDS_RESET bit to tell the guest driver that a reset is
> > necessary.  This is more disruptive - drivers either have to resubmit or
> > fail I/O with EIO - but it's also simple and more likely to work
> > correctly (it only needs to be implemented correctly in the guest
> > driver, not in the many available vhost-user backend implementations).
> >
> 
> So you mean adding a way to notify the guest to resubmit inflight
> I/O. I think it's a good idea. But wouldn't it be more flexible to
> implement this in the backend? That way we can support old guests,
> and it would be easier to fix bugs or add features.

There are trade-offs with either approach.  In the long run I think it's
beneficial to minimize non-trivial logic in vhost-user backends.  There
will be more vhost-user backend implementations and therefore more bugs
if we put the logic into the backend.  This is why I think a simple
mechanism for marking the device as needing a reset will be more
reliable and less trouble.
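
For reference, the guest-visible half of that mechanism is just a bit in
the Device Status field; DEVICE_NEEDS_RESET is defined as 0x40 in the
VIRTIO specification. A trivial sketch of the driver-side check (the
function name is illustrative):

```c
#include <stdint.h>

/* Device Status field bits as defined by the VIRTIO specification. */
#define VIRTIO_STATUS_ACKNOWLEDGE  0x01
#define VIRTIO_STATUS_DRIVER       0x02
#define VIRTIO_STATUS_DRIVER_OK    0x04
#define VIRTIO_STATUS_NEEDS_RESET  0x40

/* Driver-side check: if the device has set DEVICE_NEEDS_RESET, the
 * driver must reset the device and then either resubmit its
 * outstanding requests or fail them (e.g. with EIO). */
static int device_needs_reset(uint8_t status)
{
    return (status & VIRTIO_STATUS_NEEDS_RESET) != 0;
}
```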

Stefan

