Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?

From: Eric Blake
Subject: Re: [Qemu-devel] [Nbd] Is NBD_CMD_FLAG_FUA valid during NBD_CMD_FLUSH?
Date: Thu, 31 Mar 2016 13:54:17 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1

On 03/31/2016 01:41 PM, Alex Bligh wrote:
> On 31 Mar 2016, at 20:33, Eric Blake <address@hidden> wrote:
>> Qemu's nbd-client sets NBD_CMD_FLAG_FUA on flush commands, but the
>> official NBD protocol documentation doesn't describe this as valid
>> (it merely states that flush must not reply until all acknowledged
>> writes have hit permanent storage).  Does this flag make sense (if
>> so, what semantics does it add? we would need to fix the NBD docs
>> and relax the reference implementation to allow the flag), or is it
>> a bug in qemu (in which case the recent tightening of NBD to return
>> EINVAL on unsupported flags will trip qemu up)?
> As the original author of that particular mess, the intent was that
> they should reflect exactly the Linux kernel's semantics for FLUSH
> and FUA, not only in terms of whether they can be used together,
> but also exactly what they mean.

Oh, and I also just found that qemu's nbd-server tries to honor FUA on
read, even though the protocol doesn't document that as valid either.

> This turned out to be an easier way of describing the operations
> than describing them semantically (in particular FLUSH, where I
> couldn't get an entirely consistent answer on what it required of
> in-flight requests: specifically, whether all requests in flight at
> the time the flush was issued must be written to disk before
> answering, or all requests in flight at the time of replying must
> be; I believe the former).
> FUA just requires that particular request to be persisted to
> disk; it does not require other requests to be persisted.

As written, NBD says that FUA requires the current write operation to
land on disk (but says nothing about any other writes, whether or not
they have already been replied to).  And for flush, NBD only requires
that writes which have _sent_ their reply to the client land on disk;
that can be a strictly smaller set than _all_ writes issued prior to
that point in time.  So maybe flush+FUA is a valid thing to support,
meaning that ALL in-flight writes must land, whether or not a reply
has been sent to the client, as an even stronger barrier?

> So in answer to your question, my understanding is that FLUSH requires
> (some subset of) otherwise potentially non-persisted requests to
> be persisted to disk. In that sense it implies FUA. It is permitted
> to set FUA (as it is, I believe, in the Linux block layer),
> but it makes no difference.
> I once thought FUA on read should bypass any local read cache, though
> that is not part of the spec currently.

In qemu, read+FUA just triggers blk_co_flush() prior to reading; but
that's the same function it calls for write+FUA.  And for flush (whether
or not FUA was specified), qemu still calls blk_co_flush().  So from
qemu's perspective, FUA is synonymous with "finish ALL pending
transactions", which is stronger than what the NBD protocol requires.
(Nothing wrong with an implementation doing more work than required,
although it may be less efficient).  Alas, that means I can't use qemu's
behavior as a good reference for how to improve the NBD spec.

Meanwhile, it sounds like FUA is valid on read, write, AND flush
(because the kernel supports all three), even if we aren't quite sure
what semantics to document for those flags.  And that means qemu is
correct, and the NBD protocol has a bug.  Since you contributed the
FUA flag, is that something you can try to improve?

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
