qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] find_next_bit optimizations


From: Peter Lieven
Subject: Re: [Qemu-devel] [RFC] find_next_bit optimizations
Date: Mon, 11 Mar 2013 15:29:53 +0100

Am 11.03.2013 um 15:22 schrieb Peter Lieven <address@hidden>:

> 
> Am 11.03.2013 um 15:14 schrieb Paolo Bonzini <address@hidden>:
> 
>> Il 11/03/2013 14:44, Peter Lieven ha scritto:
>>> Hi,
>>> 
>>> I ever since had a few VMs which are very hard to migrate because of a
>>> lot of memory I/O. I found that finding the next dirty bit
>>> seemed to be one of the culprits (apart from removing locking which
>>> Paolo is working on).
>>> 
>>> I have to following proposal which seems to help a lot in my case. Just
>>> wanted to have some feedback.
>>> I applied the same unrolling idea like in buffer_is_zero().
>>> 
>>> Peter
>>> 
>>> --- a/util/bitops.c
>>> +++ b/util/bitops.c
>>> @@ -24,12 +24,13 @@ unsigned long find_next_bit(const unsigned long
>>> *addr, unsigned long size,
>>>    const unsigned long *p = addr + BITOP_WORD(offset);
>>>    unsigned long result = offset & ~(BITS_PER_LONG-1);
>>>    unsigned long tmp;
>>> +    unsigned long d0,d1,d2,d3;
>>> 
>>>    if (offset >= size) {
>>>        return size;
>>>    }
>>>    size -= result;
>>> -    offset %= BITS_PER_LONG;
>>> +    offset &= (BITS_PER_LONG-1);
>>>    if (offset) {
>>>        tmp = *(p++);
>>>        tmp &= (~0UL << offset);
>>> @@ -43,6 +44,18 @@ unsigned long find_next_bit(const unsigned long
>>> *addr, unsigned long size,
>>>        result += BITS_PER_LONG;
>>>    }
>>>    while (size & ~(BITS_PER_LONG-1)) {
>>> +        while (!(size & (4*BITS_PER_LONG-1))) {
>> 
>> This really means
>> 
>>      if (!(size & (4*BITS_PER_LONG-1))) {
>>          while (1) {
>>              ...
>>          }
>>      }
>> 
>> because the subtraction will not change the result of the "while" loop
>> condition.
> 
> Are you sure? The above is working nicely for me (wondering why ;-))
> I think !(size & (4*BITS_PER_LONG-1)) is the same as what you
> propose. If size & (4*BITS_PER_LONG-1) is not zero its not dividable
> by 4*BITS_PER_LONG. But it see it might be a problem for size == 0.
> 
>> 
>> What you want is probably "while (size & ~(4*BITS_PER_LONG-1))", which
>> in turn means "while (size >= 4*BITS_PER_LONG).
>> 
>> Please change both while loops to use a ">=" condition, it's easier to read.

Thinking again, in case a bit is found, this might lead to unnecessary 
iterations
in the while loop if the bit is in d1, d2 or d3.


> 
> Good idea, its easier to understand.
> 
> Has anyone evidence if unrolling 4 longs is optimal on today processors?
> I just chooses 4 longs because it was the same in buffer_is_zero.
> 
> Peter
> 
>> 
>> Paolo
>> 
>>> +            d0 = *p;
>>> +            d1 = *(p+1);
>>> +            d2 = *(p+2);
>>> +            d3 = *(p+3);
>>> +            if (d0 || d1 || d2 || d3) {
>>> +                break;
>>> +            }
>>> +            p+=4;
>>> +            result += 4*BITS_PER_LONG;
>>> +            size -= 4*BITS_PER_LONG;
>>> +        }
>>>        if ((tmp = *(p++))) {
>>>            goto found_middle;
>>>        }
>> 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]