


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] copy, dd: simplify and optimize NUL bytes detection
Date: Thu, 22 Oct 2015 17:31:30 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.3.0


On 22/10/2015 17:17, Pádraig Brady wrote:
>> > Nice trick indeed.  On the other hand, the first 16 bytes are enough to
>> > rule out 99.99% (number out of thin hair) of the non-zero blocks, so
>> > that's where you want to optimize.  Checking them an unsigned long at a
>> > time, or fetching a few unsigned longs and ORing them together would
>> > probably be the best of both worlds, because you then only use the FPU
>> > in the rare case of a zero buffer.
> Note the above does break early if non zero detected in first 16 bytes.

Yes, but it loops unnecessarily if the non-zero byte is the third or fourth.

> Also I suspect the extra conditions involved in using longs
> for just the first 16 bytes would outweigh the benefits?

Only if your machine cannot do unaligned loads.  If it can, you can
align the length instead of the buffer.  memcmp will take care of
aligning the buffer (with some luck it won't have to, e.g. if buf is
0x12340002 and length = 4094).  On x86 unaligned "unsigned long" loads
are basically free as long as they don't cross a cache line.
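A minimal sketch of the head check described above, assuming a hypothetical helper named buf_is_zero (it is not taken from the patch or from the benchmark below). It fetches the first 16 bytes as two 64-bit words, ORs them together, and only falls through to the overlapping memcmp when they are zero. The loads go through memcpy so that unaligned access stays well-defined C; on x86 compilers typically turn each one into a single plain load. Fixed-width uint64_t stands in for "unsigned long" so that two loads cover 16 bytes regardless of ABI.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only. */
static bool buf_is_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t a, b;

    if (len < 16) {
        /* Short buffers: OR all bytes together, no early exit needed. */
        unsigned char acc = 0;
        for (size_t i = 0; i < len; i++)
            acc |= p[i];
        return acc == 0;
    }

    /* Two possibly unaligned 8-byte loads covering bytes 0..15. */
    memcpy(&a, p, 8);
    memcpy(&b, p + 8, 8);
    if ((a | b) != 0)
        return false;   /* the common case for non-zero blocks */

    /*
     * Bytes 0..15 are zero.  Comparing the buffer with itself shifted
     * by 16 checks that every later byte equals the byte 16 positions
     * before it, hence that the whole buffer is zero.
     */
    return memcmp(p, p + 16, len - 16) == 0;
}
```

With this shape, a non-zero byte anywhere in the first 16 bytes costs two loads, one OR and one branch, and the heavier memcmp path only runs when the head is already zero.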

> BTW Rusty has a benchmark framework for this as referenced from:
> http://rusty.ozlabs.org/?p=560

I missed his benchmark framework, so I wrote another one. Here it is:
https://gist.githubusercontent.com/bonzini/9a95b0e02d1ceb60af9e/raw/7bc42ddccdb6c42fea3db58e0539d0443d0e6dc6/memeqzero.c

Paolo


