[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extensi

From: Eric Blake
Subject: Re: [Qemu-devel] [Nbd] [PATCH 2/2] NBD proto: add GET_LBA_STATUS extension
Date: Mon, 28 Mar 2016 09:58:52 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.7.1

On 03/25/2016 02:49 AM, Wouter Verhelst wrote:

>> You may also want to add a rule that for all future extensions, any
>> command that requires data in the server response (other than the server
>> response to NBD_CMD_READ) must include a 32-bit length as the first
>> field of its data payload, so that the server reply is always structured.
> Right.
>> Hmm, I still think it would be hard to write a wireshark dissector for
>> server replies, since there is no indication whether a given reply will
>> include data or not.
> Well, a wireshark dissector already exists. However, it is very old; it
> doesn't support the NBD_CMD_TRIM or NBD_CMD_FLUSH commands, it doesn't
> support the NBD_CMD_FLAG_FUA option to NBD_CMD_WRITE, and it doesn't
> deal with negotiation at all. It was written when newstyle negotiation
> didn't exist yet, and scanning for INIT_PASSWD at the time was too hard,
> according to the guy who wrote it (don't remember who that was).

And I've never written a wireshark dissector myself, to know how easy or
hard it would be to extend that work.

>> The _client_ knows (well, any well-written client
>> that uses a different value for 'handle' for every command sent to the
>> server), because it can read the returned 'handle' field to know what
>> command it matches to and therefore what data to expect; but a
>> third-party observer seeing _just_ the server messages has no idea which
>> server responses should have payload.
> It can if it knows enough about the protocol, but granted, that makes it
> harder for us to extend the protocol without breaking the dissector.
>> Scanning the stream and assuming that a new reply starts each time
>> NBD_REPLY_MAGIC is encountered is a close approximation, but hits
>> false positives if the data being returned for NBD_CMD_READ contains
>> the same magic number as part of its contents.
> Indeed.

One benefit of TCP: we can rely on packet boundaries (whether or not
fragmentation is also happening); I'm not sure if UDP shares the same
benefits (then again, I'm not even sure if UDP is usable for the NBD
protocol, since we definitely rely on in-order delivery of packets: a
read command can definitely return more bytes than even a jumbo frame
can contain, even though we do allow out-of-order delivery of replies).
 So if the server always sends each reply as the start of its own packet
(rather than coalescing multiple replies into a single network packet),
and the dissector only looks for magic numbers at the start of a packet,
then any server packet not starting with the magic number must be data
payload, and you could potentially even avoid the false positives by
choosing to break data replies into packets by adjusting packet lengths
by one byte any time the next data chunk would otherwise happen to start
with the same two bytes as the magic number.  But I haven't actually
tested any of this, to know if packet coalescing goes on, and I
certainly don't think it is worth complicating the server to avoid false
positive magic number detection by splitting read payloads across packet
boundaries, just for the benefit of wire-sniffing.

> I like this. However, before committing to it, I would like to see
> Markus' feedback on that (explicit Cc added, even though he's on the
> list, too).
> We'd also need a way to communicate the ability to speak this protocol
> from the kernel to userspace before telling the server that the client
> understands something which maybe its kernel doesn't.

proto.md already documents that ioctl()s exist for the user space to
inform the kernel about options sent by the server prior to kicking off
transmission phase, and that the NBD protocol itself does not go into
full detail about available ioctl()s (the kernel/userspace coordination
does not affect the protocol).  It seems fairly straightforward for the
kernel to supply another ioctl() that userspace can use to learn whether
it is allowed or forbidden from advertising structured reply status
during the handshake phase (including the case where the ioctl() is not
present being treated as the client must not enable structured replies).

> Markus?

Markus is on vacation for a bit, so we'll just have to wait for a reply.

>> I could write up a negotiation of global flags for structured reply
>> lengths as an extension proposal, if you think it is worth it.
> I think it is worth it...

Okay, I'll give it a shot, in a separate thread.

Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Attachment: signature.asc
Description: OpenPGP digital signature

reply via email to

[Prev in Thread] Current Thread [Next in Thread]