2011/9/6 Pádraig Brady
<address@hidden>
A few general points.
You essentially used Linus' code (albeit by
very helpfully isolating the significant differences).
It might be easier/required to just include it in gnulib?
There are a few files in gnulib that are not copyright of the FSF,
so would Nicolas and Linus need to assign copyright?
Yes, this is what I did. I don't think that including Linus' code is easier, as the functions have a different prototype. Also, sha1, sha256 and sha512 share the same structure in gnulib, so changing one without changing the others would be weird. But if you think it is required, I have no problem with that.
By the way, I have done a test on sha512 and improved the time on the same 1 GB zero file from 4.5 s to 3.9 s. Please find the patch attached. So I think that, using the same techniques, we could improve the speed of all the SHA implementations.
For performance testing I've found gcc generates
much more deterministic results with a -march
as close to native as possible or otherwise
the code is very susceptible to alignment issues etc.
Your compiler supports -march=native.
Note also gcc 4.6 has much better support for your sandy bridge CPU,
either with -march=native or -march=corei7-avx
I tried using gcc-4.6.1 (I recompiled it under my Ubuntu 10.10) but I couldn't see any difference. For me, no combination of gcc 4.4.5 or 4.6.1, with or without -march=native, makes a difference: all the times are within the measurement margin.
As for the SSE version, I would also like to see that included,
given the proportion of hardware supporting that these days.
I previously noticed a coreutils SSE2 patch here:
http://www.arctic.org/~dean/crypto/sha1.html
Though we'd probably need some runtime SSE detection to include that.
OK, I could try to work on this. The real problem is testing that compilation and SSE detection are done correctly on several platforms. I only have access to a few x86 machines; what is the usual way to test on more platforms?
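For the runtime detection itself, here is a minimal sketch of what I have in mind, assuming a GCC-compatible compiler that provides <cpuid.h>. The helper name cpu_has_sse2 is hypothetical and not an existing gnulib function; on other compilers or non-x86 targets it simply reports no SSE2, so the plain C code path would be used.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
# include <cpuid.h>   /* GCC-provided wrapper around the CPUID instruction */
#endif

/* Hypothetical helper (not part of gnulib): return true if the CPU
   reports SSE2 support, false otherwise or when we cannot tell.  */
static bool
cpu_has_sse2 (void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  unsigned int eax, ebx, ecx, edx;

  /* __get_cpuid returns 0 when the requested leaf is unsupported.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return false;

  /* CPUID leaf 1, EDX bit 26 is the SSE2 feature flag.  */
  return (edx & (1u << 26)) != 0;
#else
  /* Unknown compiler or architecture: assume no SSE2.  */
  return false;
#endif
}
```

The result could be computed once and used to choose between the SSE2 and the generic block function, so the check stays out of the hashing loop.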
Best regards
--
Loïc