qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [RFC] find_next_bit optimizations


From: Peter Lieven
Subject: Re: [Qemu-devel] [RFC] find_next_bit optimizations
Date: Mon, 11 Mar 2013 19:20:18 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120827 Thunderbird/15.0

Am 11.03.2013 18:07, schrieb Paolo Bonzini:
> Il 11/03/2013 18:06, ronnie sahlberg ha scritto:
>> Even more efficient might be to do bitwise instead of logical or
>>
>>>>        if (tmp | d1 | d2 | d3) {
>> that should remove 3 of the 4 conditional jumps
>> and should become 3 bitwise ors and one conditional jump
> 
> Without any serious profiling, please let the compiler do that.

Paolo is right, i ran some tests with gcc 4.6.3 on x86_64 (with -O3) and tried 
the
various ideas. They all made no significant difference. Even unrolling to 8 
unsigned
longs didn't change anything.

What I tried is running 1^20 interations of find_next_bit(bitfield,4194304,0);
I choosed the bitfield to be 4MByte which equals a 16GB VM. The bitfield was
complete zeroed so find_next_bit had to run completely through the bitfield.

The original version took 1 minute and 10 seconds whereas all other took
approx. 37-38 seconds which is almost a 100% boost ;-)

So I think this here is the final version:

    while (size >= 4*BITS_PER_LONG) {
        unsigned long d1, d2, d3;
        tmp = *p;
        d1 = *(p+1);
        d2 = *(p+2);
        d3 = *(p+3);
        if (tmp) {
            goto found_middle;
        }        
        if (d1 || d2 || d3) {
            break;
        }
        p += 4;
        result += 4*BITS_PER_LONG;
        size -= 4*BITS_PER_LONG;
    }

Peter




reply via email to

[Prev in Thread] Current Thread [Next in Thread]