Re: [PATCH 0/7] block/nbd: decouple reconnect from drain


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 0/7] block/nbd: decouple reconnect from drain
Date: Wed, 17 Mar 2021 11:35:31 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

15.03.2021 09:06, Roman Kagan wrote:
> The reconnection logic doesn't need to stop while in a drained section.
> Moreover it has to be active during the drained section, as the requests
> that were caught in-flight with the connection to the server broken can
> only usefully get drained if the connection is restored.  Otherwise such
> requests can only either stall resulting in a deadlock (before
> 8c517de24a), or be aborted defeating the purpose of the reconnection
> machinery (after 8c517de24a).
> 
> This series aims to just stop messing with the drained section in the
> reconnection code.
> 
> While doing so it undoes the effect of 5ad81b4946 ("nbd: Restrict
> connection_co reentrance"); as I've missed the point of that commit I'd
> appreciate more scrutiny in this area.
> 
> Roman Kagan (7):
>    block/nbd: avoid touching freed connect_thread
>    block/nbd: use uniformly nbd_client_connecting_wait
>    block/nbd: assert attach/detach runs in the proper context
>    block/nbd: transfer reconnection stuff across aio_context switch
>    block/nbd: better document a case in nbd_co_establish_connection
>    block/nbd: decouple reconnect from drain
>    block/nbd: stop manipulating in_flight counter
> 
>   block/nbd.c  | 191 +++++++++++++++++++++++----------------------------
>   nbd/client.c |   2 -
>   2 files changed, 86 insertions(+), 107 deletions(-)



Hmm. The big source of problems for this series is the weird logic around
drain and aio context switching in the NBD driver.

Why do we have all this overly complicated logic abusing the in_flight counter
in NBD? The answer is connection_co. NBD differs from other drivers: it has a
coroutine independent of the request coroutines, and we have to move this
coroutine carefully to the new aio context. We can't just enter it from the new
context; we have to be sure that connection_co is at one of the yield points
that supports reentering.
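
To illustrate the constraint, a rough sketch (hypothetical names, not the
actual block/nbd.c code): the long-lived coroutine may only be re-entered
while it is parked at a yield point that expects reentrance, so an attach
handler has to check for that first.

#include "qemu/osdep.h"
#include "block/aio.h"
#include "qemu/coroutine.h"

typedef struct NBDClientState {
    Coroutine *connection_co;   /* long-lived, like in block/nbd.c */
    bool co_reenterable;        /* true only at reentrance-safe yields */
    bool quit;
} NBDClientState;

static coroutine_fn void connection_co_loop(void *opaque)
{
    NBDClientState *s = opaque;

    while (!s->quit) {
        s->co_reenterable = true;   /* safe to re-enter us from now on */
        qemu_coroutine_yield();     /* e.g. waiting for socket data */
        s->co_reenterable = false;
        /* ... handle one event: read a reply, reconnect, ... */
    }
}

/* On aio context switch, enter the coroutine only at a safe point. */
static void nbd_attach_aio_context(NBDClientState *s, AioContext *new_ctx)
{
    if (s->co_reenterable) {
        aio_co_schedule(new_ctx, s->connection_co);
    }
}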

I have an idea of how to avoid this thing: drop connection_co altogether.

1. NBD negotiation moves into the connection thread and becomes independent of
any aio context.

2. Waiting for the server's reply moves into the request code. So, instead of
always reading the reply from the socket in connection_co, we read it in the
request coroutine, after sending the request. We'll need a CoMutex for that
(as only one request coroutine should read from the socket at a time), and we
must be prepared for the incoming reply not being for _this_ request (in which
case we should wake the other request's coroutine and continue reading from
the socket); a sketch follows below.

But this may be too much for soft freeze.
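
Roughly, a minimal sketch of idea 2 (hypothetical types and helper names, not
actual block/nbd.c code); the explicit wake-up described above is played here
by the mutex handoff itself:

#include "qemu/osdep.h"
#include "qemu/coroutine.h"

#define MAX_NBD_REQUESTS 16

typedef struct NBDRequestSlot {
    bool reply_received;    /* the reply for this slot has been stored */
} NBDRequestSlot;

typedef struct NBDClientState {
    CoMutex receive_mutex;  /* serializes reads from the socket */
    NBDRequestSlot requests[MAX_NBD_REQUESTS];
} NBDClientState;

/* Hypothetical: read one complete reply, store it, return its slot. */
coroutine_fn int nbd_read_next_reply_slot(NBDClientState *s);

static coroutine_fn int nbd_co_wait_for_reply(NBDClientState *s, int slot)
{
    int ret = 0;

    qemu_co_mutex_lock(&s->receive_mutex);
    while (!s->requests[slot].reply_received) {
        int other = nbd_read_next_reply_slot(s);

        if (other < 0) {
            ret = other;    /* socket error; the caller may reconnect */
            break;
        }
        /*
         * Mark the owning slot done.  If it is not ours, keep reading:
         * the owner is blocked on receive_mutex (or will take it soon)
         * and will see the flag once we release the lock.
         */
        s->requests[other].reply_received = true;
    }
    qemu_co_mutex_unlock(&s->receive_mutex);
    return ret;
}

Holding the mutex across the whole read keeps the sketch simple; reply
payloads that must be consumed by the owning request coroutine would need
more care.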


Another idea:

You want all the requests to be completed on drain_begin(), not cancelled.
Actually, you don't need reconnect running during the drained section for
that: it should be enough to just wait for all current requests to complete
before disabling reconnect in the drain_begin handler.
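
A rough sketch of such a handler (hypothetical field and helper names, not
actual block/nbd.c code):

#include "qemu/osdep.h"
#include "block/block_int.h"

/* Hypothetical helper: yield until all in-flight requests complete. */
static coroutine_fn void nbd_co_wait_for_all_requests(BDRVNBDState *s);

static coroutine_fn void nbd_client_co_drain_begin(BlockDriverState *bs)
{
    BDRVNBDState *s = bs->opaque;

    /*
     * Wait for in-flight requests first, with reconnect still enabled,
     * so that requests caught on a broken connection can still finish.
     */
    nbd_co_wait_for_all_requests(s);

    /* Only now disable reconnect for the duration of the section. */
    s->reconnect_allowed = false;   /* hypothetical field */
}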

--
Best regards,
Vladimir


