bug-coreutils

Re: Linus' sha1 is much faster!


From: Steven Noonan
Subject: Re: Linus' sha1 is much faster!
Date: Mon, 17 Aug 2009 14:43:55 -0700

On Mon, Aug 17, 2009 at 9:22 AM, Linus Torvalds <address@hidden> wrote:
>
>
> On Mon, 17 Aug 2009, Steven Noonan wrote:
>>
>> Interesting. I compared Linus' implementation to the public domain one
>> by Steve Reid[1]
>
> You _really_ need to talk about what kind of environment you have.
>
> There are three major issues:
>  - Netburst vs non-netburst
>  - 32-bit vs 64-bit
>  - compiler version

Right. I'm running a Core 2 "Merom" 2.33GHz. The code was compiled for
x86_64 with GCC 4.2.1. I didn't _expect_ it to compile for x86_64, but
apparently the version of GCC that ships with Xcode 3.2 defaults to
compiling 64-bit code on machines that are capable of running it.
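
For anyone repeating the comparison, it is worth confirming which target the compiler actually picked before quoting numbers. The following is only a minimal sketch: the helper names are made up for illustration, and the `__x86_64__` / `__i386__` checks assume GCC-style predefined macros.

```c
#include <limits.h>

/* Pointer width in bits for the target this file was compiled for. */
static unsigned target_pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Name of the x86 variant, based on GCC-style predefined macros
 * (an assumption; other compilers may define different macros). */
static const char *target_arch_name(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}
```

Printing both values from a test program built with the same flags as the benchmark (with and without -m32) shows whether the build is really 32-bit or 64-bit.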

>
> Steve Reid's code looks great, but the way it is coded, gcc makes a mess
> of it, which is exactly what my SHA1 tries to avoid.
>
> [ In contrast, gcc does very well on just about _any_ straightforward
>  unrolled SHA1 C code if the target architecture is something like PPC or
>  ia64 that has enough registers to keep it all in registers.
>
>  I haven't really tested other compilers - a less aggressive compiler
>  would actually do _better_ on SHA1, because the problem with gcc is that
>  it turns the whole temporary 16-entry word array into register accesses,
>  and tries to do register allocation on that _array_.
>
>  That is wonderful for the above-mentioned PPC and IA64, but it makes gcc
>  create totally crazy code when there aren't enough registers, and then
>  gcc starts spilling randomly (ie it starts spilling a-e etc). This is
>  why the compiler and version matters so much. ]
>
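
To make the point about the 16-entry word array concrete, here is a rough sketch of one way to stop a compiler from register-allocating that array: write each new schedule word through a volatile pointer, so the store has to go to memory. This is an illustration under my own assumptions, not a copy of either implementation; the function names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Store into the circular 16-word buffer through a volatile pointer,
 * so the compiler must emit a real memory write instead of keeping
 * the array element in a register. */
static void set_w(uint32_t *w, unsigned t, uint32_t val)
{
    *(volatile uint32_t *)&w[t & 15] = val;
}

/* SHA-1 message schedule for t >= 16:
 *   W[t] = rol(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1)
 * with all indices taken modulo 16. */
static uint32_t sha1_schedule(uint32_t *w, unsigned t)
{
    uint32_t v = rol32(w[(t + 13) & 15] ^ w[(t + 8) & 15] ^
                       w[(t + 2) & 15] ^ w[t & 15], 1);
    set_w(w, t, v);
    return v;
}
```

Whether this helps or hurts depends on the target: on a register-starved target it keeps a-e in registers, while on PPC or ia64 it may only add stores.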
>> (average of 5 runs)
>> Linus' sha1: 283MB/s
>> Steve Reid's sha1: 305MB/s
>
> So I get very different results:
>
>        #             TIME[s] SPEED[MB/s]
>        Reid            2.742       222.6
>        linus           1.464         417

Rebuilt with -m32 (32-bit):

Steve Reid: 156MB/s
Linus: 209MB/s

So on 32-bit x86, your code really kicks butt.
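
For comparing figures like these across machines, the arithmetic is just bytes over seconds, plus a ratio between the two implementations. A small sketch follows; the helper names are hypothetical, and it assumes 1 MB = 10^6 bytes, which the numbers in this thread do not specify.

```c
#include <stddef.h>

/* Throughput in MB/s, assuming 1 MB = 1,000,000 bytes (an assumption;
 * the thread does not say which definition of MB was used). */
static double throughput_mb_s(size_t bytes, double seconds)
{
    return ((double)bytes / 1e6) / seconds;
}

/* How many times faster `fast` is than `slow`, both in MB/s. */
static double speedup(double fast_mb_s, double slow_mb_s)
{
    return fast_mb_s / slow_mb_s;
}
```

With the -m32 figures above, speedup(209.0, 156.0) comes out at roughly 1.34, and with the Nehalem figures, speedup(417.0, 222.6) is roughly 1.87.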

> this is Intel Nehalem, but compiled for 32-bit mode (which is the more
> challenging one because x86-32 only has 7 general-purpose registers), and
> with gcc-4.4.0.
>
>                        Linus
>



