qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] block/iscsi: handle BUSY condition


From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCH] block/iscsi: handle BUSY condition
Date: Thu, 29 May 2014 20:41:05 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

Am 29.05.2014 19:47, schrieb Paolo Bonzini:
> Il 29/05/2014 19:31, Peter Lieven ha scritto:
>> +static const unsigned iscsi_retry_times[] = {4, 16, 64, 256, 1024, 4096};
>
> This is a pretty fast growth.

I know, I wanted to end up with a total retry period of approx. 10 seconds 
before failing.
As we allow only for  5 retries smaller growth would result in less total time.
>
>> +static void iscsi_retry_timer_expired(void *opaque)
>> +{
>> +    struct IscsiTask *iTask = opaque;
>> +    timer_del(iTask->retry_timer);
>
> This timer_del is not necessary.  Also, since you are introducing a new usage 
> I would prefer if you used aio_timer_init instead.

Okay, I didn't know that aio_timer_new was deprecated.

timer_del is not necessary because we delete the timer in the callback and can 
be sure that it is not armed?

>
>> +    timer_free(iTask->retry_timer);
>> +    if (iTask->co) {
>> +        qemu_coroutine_enter(iTask->co, NULL);
>> +    }
>> +}
>> +
>> +#define RANDRANGE(M, N) (M + rand() / (RAND_MAX / (N - M + 1) + 1))
>
> A minor nit: you should probably use "((double)rand() / RAND_MAX) and then 
> cast it back to integer at the end.  You should also make it a static inline 
> function for readability, or surround the arguments with parentheses.

>
> If you care, it probably would be better to have the random numbers be 
> exponentially distributed, even more so since the ranges form a geometric 
> progression.  So for example the first retry would be on average 8 ms rather 
> than 10 ms.  The difference for the last retry is half a second!

I hope we never get in the situation that we need more than one retry. I just 
wanted to give the storage a chance to calm down and in this case a static 
value would be either
to high for the first retry or to low for the higher retries if ever reached.

If you want I can include this one here which seems to work:

static inline unsigned geo_rand_range(double m, double n) {
    return exp((log(m) + (double)rand() /
               (RAND_MAX / (log(n) - log(m) + log(1)) + log(1))));
}

>
> You can do that by taking the (natural) logarithm of N and M, applying 
> randrange on the logarithms, and then applying exp on the result.
>
> Otherwise looks good, but how did you test the patch?  I'd rather have the 
> growth in iscsi_retry_times backed by actual data, but I understand that's 
> hard to gather.

I don't know how actually. In my use case a see a single BUSY once every few 
hours on a group runnning new storage firmware with a few hundred targets.
For testing this I randomly created BUSY conditions and overwrote the status.

Peter

>
> Paolo
>
>>  static void
>>  iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>>                          void *command_data, void *opaque)
>> @@ -156,20 +172,40 @@ iscsi_co_generic_cb(struct iscsi_context *iscsi, int 
>> status,
>>      iTask->do_retry = 0;
>>      iTask->task = task;
>>
>> -    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
>> -        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>> -        error_report("iSCSI CheckCondition: %s", iscsi_get_error(iscsi));
>> -        iTask->do_retry = 1;
>> -        goto out;
>> -    }
>> -
>>      if (status != SCSI_STATUS_GOOD) {
>> +        if (iTask->retries++ < ISCSI_CMD_RETRIES) {
>> +            if (status == SCSI_STATUS_CHECK_CONDITION
>> +                && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>> +                error_report("iSCSI CheckCondition: %s",
>> +                             iscsi_get_error(iscsi));
>> +                iTask->do_retry = 1;
>> +                goto out;
>> +            }
>> +            if (status == SCSI_STATUS_BUSY) {
>> +                unsigned retry_time =
>> +                    RANDRANGE(iscsi_retry_times[iTask->retries - 1],
>> +                              iscsi_retry_times[iTask->retries]);
>> +                error_report("iSCSI Busy (retry #%u in %u ms): %s",
>> +                             iTask->retries, retry_time,
>> +                             iscsi_get_error(iscsi));
>> +                iTask->retry_timer = 
>> aio_timer_new(iTask->iscsilun->aio_context,
>> +                                                   QEMU_CLOCK_REALTIME,
>> +                                                   SCALE_MS,
>> +                                                   
>> iscsi_retry_timer_expired,
>> +                                                   iTask);
>> +                timer_mod(iTask->retry_timer,
>> +                          qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + 
>> retry_time);
>




reply via email to

[Prev in Thread] Current Thread [Next in Thread]