Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

qemu-block

From:	Eric Blake
Subject:	Re: [PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN
Date:	Thu, 2 Apr 2020 08:55:48 -0500
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.6.0

On 4/2/20 8:33 AM, Eric Blake wrote:

Then, what about "SHOULD wait until no inflight requests"? We don't do it either.. Should we?
qemu as server doesn't send NBD_ESHUTDOWN. It probably should (the way nbdkit does), but that's orthogonal to qemu as client responding to NBD_ESHUTDOWN.


Other things I want to document here based on an IRC conversation with Dave:

Our notion of reconnect-delay has a baked-in notion of timeout, but selecting the right timeout can be difficult (how do you know it is long enough to catch all the cases you care about where recovery will work, but not so long that waiting for an actual timeout is not going to be painful when recovery is not possible). And the qemu block layer already has the notion of pausing the guest on certain errors (whether that be just on ENOSPC, or on all errors), to give the management all the time it needs to resolve the problem and then resume the guest.
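To make the two knobs concrete, here is a rough sketch (not taken from the patch) that builds a QMP blockdev-add command carrying reconnect-delay, plus a -device argument that asks the block layer to pause the guest on I/O errors. The node name, host, port, export name and the 30-second delay are all made-up values for illustration.

```python
import json


def nbd_blockdev_add(node_name, host, port, export, reconnect_delay):
    """Build a QMP blockdev-add command for an NBD client node.

    reconnect_delay is in seconds; 0 disables reconnect attempts.
    """
    return {
        "execute": "blockdev-add",
        "arguments": {
            "driver": "nbd",
            "node-name": node_name,
            # QMP inet socket addresses carry the port as a string.
            "server": {"type": "inet", "host": host, "port": str(port)},
            "export": export,
            "reconnect-delay": reconnect_delay,
        },
    }


def pausing_device_arg(node_name):
    """Build a -device argument that stops the guest on read/write errors."""
    return f"virtio-blk-pci,drive={node_name},werror=stop,rerror=stop"


# All values below are hypothetical placeholders.
cmd = nbd_blockdev_add("nbd0", "192.0.2.10", 10809, "disk0", 30)
device_arg = pausing_device_arg("nbd0")

print(json.dumps(cmd, indent=2))
print("-device", device_arg)
```

The first knob forces a guess at a timeout up front; the second one has no timeout at all and leaves the decision to whoever resumes the guest.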

There's also the issue of TCP timeouts - if the server manages to send shutdown(SHUT_WR) before the connection dies, the client gets an instant EOF and can be pretty responsive to the need to start the retry cycle. But if the connection dies without a clean shutdown, the client may be stuck waiting several seconds for a TCP timeout to occur before realizing that things are down (use of TCP keep-alive may or may not help here) - management apps may be able to figure out from other means when an NBD server is having issues long before qemu itself sees the TCP connection go down. In that case, having a way for the client to trigger shutdown(SHUT_RD) in order to speed up disconnection, rather than waiting for a TCP timeout, can come in handy.
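As a small illustration of the shutdown(SHUT_RD) idea, the sketch below uses a local socketpair as a stand-in for the TCP connection to a silent server (that substitution is an assumption made to keep the example self-contained). A reader thread sits in recv(); calling shutdown(SHUT_RD) on the same socket from another thread makes that recv() return EOF without waiting for any timeout. Waking an already-blocked reader this way is how Linux behaves; other platforms may differ.

```python
import socket
import threading


def blocked_reader(sock, result):
    # recv() blocks until data arrives or the read side is shut down;
    # an empty bytes object means EOF.
    result.append(sock.recv(4096))


# socketpair() stands in for a connection whose peer has gone silent.
client, server = socket.socketpair()
result = []
reader = threading.Thread(target=blocked_reader, args=(client, result))
reader.start()

# The peer never sends anything. Rather than wait for a timeout, the
# client side shuts down its own read half to force an immediate EOF.
client.shutdown(socket.SHUT_RD)
reader.join(timeout=5)

got_eof = (result == [b""])
print("reader saw EOF:", got_eof)

client.close()
server.close()
```

A client that sees this EOF can enter its retry cycle right away, which is the responsiveness a clean server-side shutdown(SHUT_WR) would otherwise have provided.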

Or, if we have a multipath scenario, where we know that several IP addresses will serve the same underlying storage, we may just need a way to reopen an existing NBD blockdev but with a different IP address to the server.
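The multipath idea can be sketched from the client's point of view as "try each known address until one accepts". This is only an analogy for what a reopen with a different address would achieve, not how qemu does or would do it; the helper name connect_first_available and the loopback addresses are hypothetical.

```python
import socket


def connect_first_available(addresses, timeout=2.0):
    """Return (socket, address) for the first address that accepts."""
    last_error = None
    for host, port in addresses:
        try:
            sock = socket.create_connection((host, port), timeout=timeout)
            return sock, (host, port)
        except OSError as exc:
            # This path is down; remember why and try the next one.
            last_error = exc
    raise ConnectionError(f"no server reachable: {last_error}")


# Demo on loopback: one path refuses connections, one is live.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
live_port = listener.getsockname()[1]

# Bound but never listening, so connection attempts are refused.
dead = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
dead.bind(("127.0.0.1", 0))
dead_port = dead.getsockname()[1]

paths = [("127.0.0.1", dead_port), ("127.0.0.1", live_port)]
conn, chosen = connect_first_available(paths)
print("connected via", chosen)

conn.close()
dead.close()
listener.close()
```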

All of this implies we may want to add a QMP command to force a given NBD blockdev to attempt a reconnect now, rather than having to wait for a TCP connection death to let us know that a reconnect is the only way forward, or even as a way to make sure that we can resume the guest after it was paused due to I/O error. I don't know if the existing 'x-blockdev-reopen' can be extended to cover our needs, or if we need something completely new.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
