Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
From: Peter Lieven
Subject: Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
Date: Mon, 25 Mar 2013 14:32:13 +0100
On 25.03.2013 at 14:23, Peter Lieven <address@hidden> wrote:
>
> On 25.03.2013 at 14:02, Paolo Bonzini <address@hidden> wrote:
>
>>> Maybe I should have explained the output in more detail. The percentages
>>> are cumulative. 35.8% in the second-to-last column means that
>>> 35.8% of pages have a return value that is less than TARGET_PAGE_SIZE.
>>> This was meant to illustrate how many 64-bit chunks you have
>>> to look at to catch a certain percentage of non-zero pages.
>>
>> Ok, I wrongly understood that many pages had 4088 zero bytes but
>> the last 8 were not zero. Now it's clearer, and more logical too. :)
>>
>>> Looking e.g. at the third value it means that looking at the first
>>> three 64-bit chunks it will catch 34.0% of all pages.
>>> It turns out that the non-zeroness of a page can be detected looking
>>> at the first 256 or so bits and only a low
>>> percentage turns out to be non-zero at a later position. So after
>>> having checked the first chunks one by one
>>> there is no big penalty looking at the remaining chunks with the
>>> vectorized loop.
>>
>> I think it makes most sense to unroll the first four non-vectorized
>> iterations, i.e. not use SSE and use three or four ifs. Either:
>>
>> if (foo[0]) return 0;
>> if (foo[1]) return 8;
>> if (foo[2]) return 16;
>> if (foo[3]) return 24;
>>
>> or
>>
>> if (foo[0]) return 0;
>> if (foo[1] | foo[2] | foo[3]) return 8;
>>
>> and then proceed on the remaining 4096-4*sizeof(long) bytes with
>> the vectorized loop. foo+4 is aligned for SIMD operations on both
>> 32- and 64-bit machines, which makes this a nice choice.
>
> I can't start at foo+4 since the remaining X-4*sizeof(long) bytes
> are not divisible by 8*sizeof(VECTYPE).
>
> I could just do something like the following:
>
>     const unsigned long *tmp = buf;
>
>     for (i = 0;
>          i < sizeof(VECTYPE) * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
>              / sizeof(unsigned long);
>          i += 4) {
>         if (tmp[i + 0]) return i * sizeof(unsigned long);
>         if (tmp[i + 1]) return (i+1) * sizeof(unsigned long);
>         if (tmp[i + 2]) return (i+2) * sizeof(unsigned long);
>         if (tmp[i + 3]) return (i+3) * sizeof(unsigned long);
>     }
>
>     for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>          i < len / sizeof(VECTYPE);
>          i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>         …
>     }
The performance of the above is poor compared to:

    for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
        if (!ALL_EQ(p[i], zero)) {
            return i * sizeof(VECTYPE);
        }
    }
…
The above is basically what the old is_dup_page does, but after the first
8 iterations the optimized version kicks in.
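
For readers following along, here is a minimal, portable sketch of the
two-phase scheme described above: check the first few vectors one by one,
then fall through to an unrolled loop. It is only an illustration under
stated assumptions, not the code from the patch series. VECTYPE and ALL_EQ
are stand-ins built on uint64_t rather than a SIMD type, UNROLL_FACTOR is
a shortened name for BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR, the function
name is hypothetical, and the OR-based body of the second loop is an
assumption, since the thread elides that part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions): a real build would use a SIMD vector type. */
typedef uint64_t VECTYPE;
#define ALL_EQ(a, b) ((a) == (b))
#define UNROLL_FACTOR 8

/*
 * Hypothetical helper: returns a byte offset at or before the first
 * non-zero byte of buf, or len if the whole buffer is zero.
 * buf must be aligned for VECTYPE; len must be a non-zero multiple of
 * UNROLL_FACTOR * sizeof(VECTYPE).
 */
static size_t find_nonzero_offset_sketch(const void *buf, size_t len)
{
    const VECTYPE *p = buf;
    const VECTYPE zero = 0;
    size_t i;

    assert(len > 0 && len % (UNROLL_FACTOR * sizeof(VECTYPE)) == 0);

    /* Phase 1: most non-zero pages differ from zero early, so test the
     * first UNROLL_FACTOR vectors individually and return right away. */
    for (i = 0; i < UNROLL_FACTOR; i++) {
        if (!ALL_EQ(p[i], zero)) {
            return i * sizeof(VECTYPE);
        }
    }

    /* Phase 2: OR together UNROLL_FACTOR vectors per iteration and test
     * the combined value once (assumed shape of the optimized loop). */
    for (i = UNROLL_FACTOR; i < len / sizeof(VECTYPE); i += UNROLL_FACTOR) {
        VECTYPE acc = p[i + 0] | p[i + 1] | p[i + 2] | p[i + 3]
                    | p[i + 4] | p[i + 5] | p[i + 6] | p[i + 7];
        if (!ALL_EQ(acc, zero)) {
            break;
        }
    }

    /* Either the start of the first non-zero group, or len if none. */
    return i * sizeof(VECTYPE);
}
```

Because phase 1 consumes exactly one full group of UNROLL_FACTOR vectors,
the remainder handed to phase 2 is still a whole number of groups, which
sidesteps the divisibility problem with starting at foo+4. Note that in
this sketch phase 2 only reports the start of the group containing the
first non-zero vector, not the exact vector.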
Peter