qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parame


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v4 08/10] block/nbd: add cmdline and qapi parameter reconnect-delay
Date: Thu, 11 Apr 2019 15:47:19 +0000

05.01.2019 1:25, Eric Blake wrote:
> On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Reconnect will be implemented in the following commit, so for now,
>> in semantics below, disconnect itself is a "serious error".
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>> ---
>>   qapi/block-core.json | 12 +++++++++++-
>>   block/nbd-client.h   |  1 +
>>   block/nbd-client.c   |  1 +
>>   block/nbd.c          | 16 +++++++++++++++-
>>   4 files changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 5b9084a394..cf03402ec5 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3511,13 +3511,23 @@
>>   #                  traditional "base:allocation" block status (see
>>   #                  NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 
>> 3.0)
>>   #
>> +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to 
>> connect
> 
> Maybe 'On unexpected disconnect', since intentional disconnect is not
> unexpected.
> 
>> +#                   again, until success or serious error. During first
>> +#                   @reconnect-delay seconds of reconnecting loop all 
>> requests
>> +#                   are paused and have a chance to rerun, if successful
>> +#                   connect occures during this time. After @reconnect-delay
> 
> occurs
> 
>> +#                   seconds all delayed requests are failed and all 
>> following
>> +#                   requests will be failed to (until successfull 
>> reconnect).
> 
> successful
> 
>> +#                   Default 300 seconds (Since 3.1)
> 
> My delay in reviewing means this now has to be 4.0.
> 
> I'm guessing that a delay of 0 means disable auto-reconnect.

I doubt, as reconnect-delay=0 is a valid case, when all request are failed
(pre-patch behavior), but we still try to reconnect in background. Any reason
not to try? If any, I suggest one of the following:

1. treat absence of the option as no-reconnect-at-all. So absence and 0 would be
    not the same

2. use -1 value as no-reconnect-at-all

3. additional bool option "reconnect"

>  From a
> backwards-compatibility standpoint, no auto-reconnect is more in line
> with what we previously had - but from a usability standpoint, trying to
> reconnect can avoid turning transient network hiccups into permanent
> loss of a device to EIO errors, especially if the retry timeout is long
> enough to allow an administrator to reroute the network to an
> alternative server.  So I'm probably okay with the default being
> non-zero - but it DOES mean that where you used to get instant EIO
> failures when a network connection was severed, you now have to wait for
> the reconnect delay to expire, and 5 minutes can be a long wait.  Since
> the long delay is guest-observable, can we run into issues where a guest
> that is currently used to instant EIO and total loss of the device could
> instead get confused by not getting any response for up to 5 minutes,
> whether or not that response eventually turns out to be EIO or a
> successful recovery?
> 
>> +++ b/block/nbd.c
>> @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = {
>>               .help = "experimental: expose named dirty bitmap in place of "
>>                       "block status",
>>           },
>> +        {
>> +            .name = "reconnect-delay",
>> +            .type = QEMU_OPT_NUMBER,
>> +            .help = "Reconnect delay. On disconnect, nbd client tries to"
>> +                    "connect again, until success or serious error. During"
>> +                    "first @reconnect-delay seconds of reconnecting loop 
>> all"
>> +                    "requests are paused and have a chance to rerun, if"
>> +                    "successful connect occures during this time. After"
>> +                    "@reconnect-delay seconds all delayed requests are 
>> failed"
>> +                    "and all following requests will be failed to (until"
>> +                    "successfull reconnect). Default 300 seconds",
> 
> Same typos as in qapi.
> 
> The UI aspects look fine, now I need to review the patch series for code
> issues :)
> 
> 


-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]