From: Alex Bligh
Subject: Re: [Qemu-devel] [Nbd] [PATCH 3/1] doc: Propose Structured Replies extension
Date: Tue, 29 Mar 2016 20:39:29 +0100

On 29 Mar 2016, at 19:51, Wouter Verhelst <address@hidden> wrote:

>> 
>> But I was envisioning the opposite: the server must NOT send X bytes
>> unless it knows they are valid; if it encounters a read error at Z,
>> then it sends a structured read of Z-1 bytes before the final normal
>> message that reports overall failure.  The client then assumes that
>> all X bytes received are valid.
> 
> The problem with that approach is that it makes it impossible for a
> server to use a sendfile()-like system call, where you don't know that
> there's a read error until you start sending out data to the client (which
> implies that you must've already sent out the header).

I don't think sendfile semantics are ever compatible with reporting
read errors *unless* you pad after the read.

IIRC the way sendfile works is that you specify a pointer to an offset,
and sendfile sends as much as it can read (up to the length specified)
and updates the offset for the length read.

Naturally, when you start sending a chunk you don't know whether (or
where) an error is going to occur, so the header must declare the full
length of the chunk as the length of the data read. sendfile then
does its stuff, and fills up either the whole thing, or part of it.
If only part of the data is available, you can't
report the error there and then, because the client is expecting
chunk data, so you must either close the connection (potentially
disruptive) or pad the data, and report the error at the end.
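
As a rough illustration of the pad-and-report-at-the-end option, here is a
minimal Python sketch. It does not call sendfile itself; `read_at` is a
hypothetical reader standing in for the backing store, and the function
names and the block size are assumptions made for the example. The point is
only that once the length has been announced, a short read is filled out
with zeroes and the position of the failure is kept for the final report.

```python
import errno
import io


def stream_chunk_padded(read_at, out, offset, length, block=4096):
    """Stream `length` bytes whose length has already been announced.

    `read_at(pos, n)` is a hypothetical reader that returns up to n bytes
    or raises OSError. Returns (error_code, error_offset), or (0, None)
    if every byte written was real data.
    """
    sent = 0
    while sent < length:
        want = min(block, length - sent)
        try:
            data = read_at(offset + sent, want)
            err = 0 if data else errno.EIO  # unexpected EOF counts as an error
        except OSError as exc:
            data, err = b"", exc.errno or errno.EIO
        out.write(data)
        sent += len(data)
        if err:
            # The client still expects `length` bytes, so pad with zeroes
            # and remember where the real data stopped.
            out.write(bytes(length - sent))
            return err, offset + sent
    return 0, None


def make_faulty_reader(backing, bad_pos):
    """Reader over `backing` that fails for any read starting at or past `bad_pos`."""
    def read_at(pos, n):
        if pos >= bad_pos:
            raise OSError(errno.EIO, "simulated medium error")
        return backing[pos:min(pos + n, bad_pos)]
    return read_at


backing = bytes(range(1, 101))  # 100 non-zero bytes, so padding is visible
out = io.BytesIO()
err, where = stream_chunk_padded(make_faulty_reader(backing, 10), out, 0, 32, block=8)
```

In this run the client receives all 32 announced bytes, of which only the
first 10 are real; `where` is the value a server could later report as the
error position.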

Using Eric's current scheme, you have no way of knowing where the error
occurred. Remember the chunks could be out of order, e.g. you get
chunks 1,3,5,7,9 in, and then an error, so you have no idea where
the error was. It could be in chunks 1,3,5,7,9 (and the server might
have padded the rest of the chunk) or in an unread chunk (2,4,6,8,10).
This seems undesirable.

I think we are paying too much attention to trying to keep NBD_RESPONSE
intact. The justification for this was (I think) that it made it easier
for existing protocol analysers. It doesn't, really, as all the data
is going to come BEFORE the NBD_RESPONSE (unlike in NBD_CMD_READ in
other situations).

I think we should therefore look at this the other way around. Here's
a straw man proposal as an alternative for the reply bits. For
a structured reply ALL we get is the chunks. The final chunk
(possibly the only chunk) is marked specially. Each chunk looks something
like:

offset+
0000    32 bit   NBD_STRUCTURED_REPLY_MAGIC
0004    64 bit   handle
000C    32 bit   Flags
0010    32 bit   Payload length
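
For concreteness, the 20-byte header above could be packed and parsed as in
the following Python sketch. The byte order is big-endian, as elsewhere in
NBD. The numeric value used for NBD_STRUCTURED_REPLY_MAGIC is a placeholder
assumed for the example (this message does not give one), and the helper
names are hypothetical.

```python
import struct

# Placeholder value, assumed for illustration only.
NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF

# magic (32 bit), handle (64 bit), flags (32 bit), payload length (32 bit)
CHUNK_HEADER = struct.Struct(">IQII")


def pack_chunk_header(handle, flags, payload_length):
    """Build the fixed-size header that precedes every chunk payload."""
    return CHUNK_HEADER.pack(NBD_STRUCTURED_REPLY_MAGIC, handle, flags, payload_length)


def unpack_chunk_header(buf):
    """Parse a header and return (handle, flags, payload_length)."""
    magic, handle, flags, length = CHUNK_HEADER.unpack(buf[:CHUNK_HEADER.size])
    if magic != NBD_STRUCTURED_REPLY_MAGIC:
        raise ValueError("not a structured reply chunk")
    return handle, flags, length


header = pack_chunk_header(handle=0x1122334455667788, flags=0, payload_length=4096)
```

The field positions in `header` line up with the offsets in the table:
the handle starts at 0x04, the flags at 0x0C and the payload length at 0x10.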


We have a couple of flags defined:

NBD_CHUNK_IS_DATA: the chunk is data, and the payload is a 64 bit offset
plus the data read

NBD_CHUNK_IS_HOLE: the chunk is zeroes, and the payload is a 64 bit offset
(only)

NBD_CHUNK_IS_END: (must be the final chunk). The payload is a 64 bit offset
plus a 32 bit error code (zero if there was no error). If no error, the
offset must be set to the total amount read. If there is an error, the
offset MAY indicate the position of the error. If an error occurs, no more
chunks should be sent.


The advantages of this scheme are:

1. Only one packet type in the reply (chunks)

2. It's no more difficult to implement wireshark decoding of this (in
   addition to the normal NBD protocol) than the current proposal. I'd
   suggest in fact it could be easier.

3. Chunks that error part way through (sendfile type) must still be
   padded but now can indicate error location.

4. It would be possible to allow EVERY server reply to be a structured
   reply that simply set NBD_CHUNK_IS_END. That gives us a convenient
   route to servers which only implement structured replies. With DF,
   this would be little harder than implementing the current
   protocol.
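
To illustrate point 4, a reply to a command that returns no data could be a
single chunk carrying only NBD_CHUNK_IS_END. The Python sketch below makes
several assumptions that are not in the proposal: the magic value and the
flag bit are placeholders, the helper name is hypothetical, and the offset
field is simply set to zero for commands that read nothing.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF  # placeholder, assumed for illustration
NBD_CHUNK_IS_END = 1 << 2                # hypothetical bit value


def simple_reply_as_chunk(handle, error=0):
    """Encode a data-less reply as one final chunk.

    Assumption: commands that read nothing report an offset of zero.
    """
    payload = struct.pack(">QI", 0, error)
    header = struct.pack(
        ">IQII", NBD_STRUCTURED_REPLY_MAGIC, handle, NBD_CHUNK_IS_END, len(payload)
    )
    return header + payload


reply = simple_reply_as_chunk(handle=42)
```

Under these assumptions every such reply is a fixed 32 bytes (20 bytes of
header plus 12 bytes of payload), which is what makes a
structured-replies-only server look no harder to write than one speaking
the current protocol.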

-- 
Alex Bligh






