qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional


From: Eric Blake
Subject: Re: [Qemu-devel] [PATCH 3/4] nbd/client: Support requests of additional block sizing info
Date: Thu, 3 May 2018 09:49:48 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0

On 05/03/2018 04:17 AM, Vladimir Sementsov-Ogievskiy wrote:
02.05.2018 00:13, Eric Blake wrote:
The NBD spec is clarifying [1] that a server may want to advertise
different limits for READ/WRITE (in our case, 32M) than for
TRIM/ZERO (in our case, nearly 4G).  Implement the client
side support for these alternate limits, by always requesting
the new information (a compliant server must ignore the
request if it is unknown), and by validating anything reported
by the server before populating the block layer limits.

[1] https://lists.debian.org/nbd/2018/03/msg00048.html

Signed-off-by: Eric Blake <address@hidden>


+++ b/nbd/client.c
@@ -337,16 +337,18 @@ static int nbd_opt_go(QIOChannel *ioc, const char *wantname,
      info->flags = 0;

      trace_nbd_opt_go_start(wantname);
-    buf = g_malloc(4 + len + 2 + 2 * info->request_sizes + 1);
+    buf = g_malloc(4 + len + 2 + 3 * 2 * info->request_sizes + 1);

Hmm, what "+1" means?

Doesn't appear to be necessary, but a little slop never hurts. Better might be struct-ifying things rather than open-coding offsets, or even using a vectored approach rather than requiring a single buffer. But if useful, that's refactoring that is independent of this patch, while this is just demonstrating the bare minimum implementation of the new feature.

      stl_be_p(buf, len);
      memcpy(buf + 4, wantname, len);
-    /* At most one request, everything else up to server */
-    stw_be_p(buf + 4 + len, info->request_sizes);
+    /* Either 0 or 3 requests, everything else up to server */
+    stw_be_p(buf + 4 + len, 3 * info->request_sizes);
      if (info->request_sizes) {
          stw_be_p(buf + 4 + len + 2, NBD_INFO_BLOCK_SIZE);
+        stw_be_p(buf + 4 + len + 2 + 2, NBD_INFO_TRIM_SIZE);
+        stw_be_p(buf + 4 + len + 2 + 2 + 2, NBD_INFO_ZERO_SIZE);
      }
      error = nbd_send_option_request(ioc, NBD_OPT_GO,
-                                    4 + len + 2 + 2 * info->request_sizes, +                                    4 + len + 2 + 3 * 2 * info->request_sizes,
                                      buf, errp);

magic chunk) Change looks correct. Is it worth introducing a buf_len variable, to
not retype it?

And increment as we go?  Yeah, it might make things more legible.


+            be32_to_cpus(&info->max_trim);
+            if (!info->min_trim || !info->max_trim ||
+                (info->max_trim != UINT32_MAX &&
+                 !QEMU_IS_ALIGNED(info->max_trim, info->min_trim))) {
+                error_setg(errp, "server trim sizes %" PRIu32 "/%" PRIu32 +                           " are not valid", info->min_trim, info->max_trim);
+                nbd_send_opt_abort(ioc);
+                return -1;
+            }

Didn't you forget to check that min_trim is a power of two?

Nope. The NBD spec wording specifically uses "The minimum trim size SHOULD be a power of 2, and MUST be at least as large as the preferred block size advertised in `NBD_INFO_BLOCK_SIZE`", in part because there ARE existing hardware iscsi devices that have an advertised preferred/maximum trim size of exactly 15 megabytes (you can request smaller trims 1M at a time, and the device then caches things to eventually report unmapped slices at a 15M boundary). For more information on this odd hardware, look in the qemu logs for the Dell Equallogic device, such as commit 3482b9bc.

Since the NBD 'min trim' is advisory, and merely telling the client the ideal alignments to use for a trim most likely to have an effect (on Equallogic, a 1M trim by itself might not trim anything unless neighboring areas are also trimmed, while an aligned 15M trim is immediately observable), an NBD server wrapping such an iscsi device may prefer to mirror the hardware's preferred alignment of 15M, rather than inventing a minimum alignment of 1M, in what it advertises in NBD_INFO_TRIM_SIZE.

And maybe I should tweak the NBD spec addition to call things "preferred trim/zero" rather than "min trim/zero", if that makes things any easier to grasp on first read, and better matches the iscsi concept.


hmm, so, now nbd_opt_go() is full of very-very similar code parts, which may worth some refactoring now or in future.

Are we going to add similar limits for BLOCK_STATUS? and for CACHE? I have an idea: why not to make an universal option for it in
protocol? Something like

option INFO_CMD_SIZE:
   uint16_t command
   uint32_t min_block
   uint32_t max_block

and server can send several such options. Most of semantics for these additional limits are common, so it will simplify both documentation
and realization..

I'll copy this to nbd thread and it is better to discuss it in it.

Yep, I've followed up on that thread, but will post a v2 of this proposal along those lines.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



reply via email to

[Prev in Thread] Current Thread [Next in Thread]