[PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

qemu-block

From:	Eric Blake
Subject:	[PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date:	Wed, 1 Apr 2020 17:38:41 -0500

I was trying to test qemu's reconnect-delay parameter by using nbdkit
as a server that I could easily make disappear and resume.  A bit of
experimenting shows that when nbdkit is abruptly killed (SIGKILL),
qemu detects EOF on the socket and manages to reconnect just fine.  But
when nbdkit is gracefully killed (SIGTERM), it merely fails all
further guest requests with NBD_ESHUTDOWN until the client disconnects
first, and qemu was blindly passing that ESHUTDOWN on as the result of
the I/O request instead of attempting to reconnect.

Since most NBD server failures are unlikely to change by merely
retrying the same transaction, our decision not to start a retry loop
in the common case is correct.  But NBD_ESHUTDOWN is rare enough, and
so clearly indicative of a transient situation, that it is worth
special-casing.
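
As a rough illustration of that policy (not the code in block/nbd.c),
the decision can be sketched as a small predicate.  The helper name
sketch_should_reconnect() and its parameters are hypothetical, and the
sketch assumes a platform whose <errno.h> defines ESHUTDOWN (Linux
does):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * reconnect_delay: the user's reconnect-delay setting in seconds;
 *                  0 means reconnect was not requested.
 * request_ret:     negative errno that the server reported for one
 *                  request, or 0 on success.
 *
 * Returns true only when a server-reported error should be treated
 * like a lost connection rather than handed back to the guest.
 */
static bool sketch_should_reconnect(uint32_t reconnect_delay, int request_ret)
{
    if (reconnect_delay == 0) {
        /* Reconnect was not requested: always report the error as-is. */
        return false;
    }

    /*
     * Most server errors would simply repeat on retry, so only the
     * transient "server is shutting down" error qualifies.
     */
    return request_ret == -ESHUTDOWN;
}
```

With reconnect-delay=60, a server reply of ESHUTDOWN would select the
reconnect path, while EIO or ENOSPC would still be returned to the
caller unchanged.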

Here's the test setup I used: in one terminal, kick off a sequence of
nbdkit commands that has a temporary window where the server is
offline; in another terminal (and within the first 5 seconds) kick off
a qemu-img convert with reconnect enabled.  If the qemu-img process
completes successfully, the reconnect worked.

$ #term1
$ MYSIG=    # or MYSIG='-s KILL'
$ timeout $MYSIG 5s ~/nbdkit/nbdkit -fv --filter=delay --filter=noextents \
  null 200M delay-read=1s; sleep 5; ~/nbdkit/nbdkit -fv --filter=exitlast \
  --filter=delay --filter=noextents null 200M delay-read=1s

$ #term2
$ MYCONN=server.type=inet,server.host=localhost,server.port=10809
$ qemu-img convert -p -O raw --image-opts \
  driver=nbd,$MYCONN,reconnect-delay=60 out.img

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1819240#c8

Signed-off-by: Eric Blake <address@hidden>
---

This is not a regression, per se, as reconnect-delay has been unchanged
since 4.2; but I'd like to consider this as an interoperability bugfix
worth including in the next rc.

 block/nbd.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 2906484390f9..576b95fb8753 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -863,6 +863,15 @@ static coroutine_fn int nbd_co_receive_one_chunk(
     if (ret < 0) {
         memset(reply, 0, sizeof(*reply));
         nbd_channel_error(s, ret);
+    } else if (s->reconnect_delay && *request_ret == -ESHUTDOWN) {
+        /*
+         * Special case: if we support reconnect and server is warning
+         * us that it wants to shut down, then treat this like an
+         * abrupt connection loss.
+         */
+        memset(reply, 0, sizeof(*reply));
+        *request_ret = 0;
+        nbd_channel_error(s, -EIO);
     } else {
         /* For assert at loop start in nbd_connection_entry */
         *reply = s->reply;
-- 
2.26.0.rc2
