bug-coreutils

From: dean gaudet
Subject: Re: 1.2x to 1.9x speedup for sha1sum using SSE2
Date: Fri, 5 Dec 2003 15:01:19 -0800 (PST)

On Fri, 5 Dec 2003, Marco Gerards wrote:

> dean gaudet <address@hidden> writes:
>
> > at <http://arctic.org/~dean/crypto/sha1.html> you'll find a coreutils
> > patch which includes a new implementation of SHA1 using SSE2 hardware for
> > a speedup ranging from 1.2x to 1.9x depending on which SSE2-capable CPU is
> > used.
> >
> > there's a complication to compiling this code -- it requires the intel
> > compiler for one file (gcc should compile it, but it trips a bug which has
> > been reported).  i've included icc-generated assembly... and a makefile
> > hack to use the assembly.  it's all enabled with --enable-sse2.
>
> Is this intel compiler Free Software?  It would be really bad if
> coreutils has a feature for which non-free software would be required
> IMHO.

the intel compiler is downloadable for free (as in beer) for evaluation
purposes, but is not free (as in speech).  my patch includes the assembly,
which can be assembled with gas.  icc is only required if you want to
modify the SHA1 code.


> Isn't there any way to avoid the gcc bug?  Very often this is possible
> by using autoconf to check if the gcc version has the bug and activate
> some code to avoid the bug.

if you disable optimisation then the bug is avoided, but this code is
pointless without optimisation, and with optimisation enabled gcc hits the bug.

gcc was able to compile a very early version of this code, but the result
performed significantly worse than the icc-compiled code.  if the gcc bug is
fixed i'll re-evaluate the performance and let the gcc developers know if
it's still markedly worse.  (the code is challenging for the compiler
because it requires very detailed knowledge of the processor issue ports.)
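For readers who haven't looked inside SHA-1, here is a minimal portable scalar version in C, sketched from the FIPS 180 description under the assumption that a single-shot function over an in-memory buffer is enough. It is *not* the SSE2 code from the patch, and the names `sha1` and `sha1_block` are hypothetical. It shows the two parts any optimised version has to deal with: the 80-word message-schedule expansion, which is the most data-parallel piece and so a plausible SSE2 target (an assumption; the patch's internals aren't described here), and the strictly serial 80-round loop, where instruction scheduling matters most.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar reference sketch, not the SSE2 patch code. */

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* Process one 64-byte block, updating the five-word state h[]. */
static void sha1_block(uint32_t h[5], const unsigned char block[64])
{
    uint32_t w[80];
    int t;

    /* Message schedule: 16 big-endian words taken from the block ... */
    for (t = 0; t < 16; t++)
        w[t] = ((uint32_t)block[4*t] << 24) | ((uint32_t)block[4*t+1] << 16)
             | ((uint32_t)block[4*t+2] << 8) | (uint32_t)block[4*t+3];
    /* ... expanded to 80 words by XOR and rotate-by-1. */
    for (t = 16; t < 80; t++)
        w[t] = rol32(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16], 1);

    /* 80 serial rounds; each depends on the previous one. */
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
    for (t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
        uint32_t tmp = rol32(a, 5) + f + e + k + w[t];
        e = d; d = c; c = rol32(b, 30); b = a; a = tmp;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
}

/* Hash len bytes of msg into a 20-byte digest (single-shot, no streaming). */
void sha1(const unsigned char *msg, size_t len, unsigned char digest[20])
{
    uint32_t h[5] = { 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0 };
    unsigned char block[64];
    uint64_t bits = (uint64_t)len * 8;
    size_t off = 0;
    int i;

    /* Full blocks straight from the input. */
    for (; len - off >= 64; off += 64)
        sha1_block(h, msg + off);

    /* Final block(s): leftover bytes, 0x80, zero padding, 64-bit bit length. */
    size_t rem = len - off;
    memset(block, 0, sizeof block);
    memcpy(block, msg + off, rem);
    block[rem] = 0x80;
    if (rem >= 56) {            /* no room left for the length field */
        sha1_block(h, block);
        memset(block, 0, sizeof block);
    }
    for (i = 0; i < 8; i++)
        block[63 - i] = (unsigned char)(bits >> (8 * i));
    sha1_block(h, block);

    /* Serialize the state big-endian. */
    for (i = 0; i < 5; i++) {
        digest[4*i]     = (unsigned char)(h[i] >> 24);
        digest[4*i + 1] = (unsigned char)(h[i] >> 16);
        digest[4*i + 2] = (unsigned char)(h[i] >> 8);
        digest[4*i + 3] = (unsigned char)h[i];
    }
}
```

For example, `sha1((const unsigned char *)"abc", 3, d)` fills `d` with the standard test digest a9993e364706816aba3e25717850c26c9cd0d89d, the same value `sha1sum` prints for that input.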

i've been reconsidering the use of C for the code anyhow, because i'm
running up against compiler limitations in trying to get a bi-endian
version of the core code for inclusion in openssl.  i may end up reverting
to a perl program which generates very stylized assembly.  this would
certainly be free as in speech and beer.

a more interesting question for coreutils folks might be whether they want to
consider processor-specific optimisation like this in the coreutils tree,
or whether they'd rather i work on getting it into, say, the libgcrypt tree,
where it could be shared.

anyhow, this is a side project for me... i just wanted to share my results
so far :)

-dean



