help-gplusplus


Re: Help with Hand-Optimized Assembly


From: sfuerst
Subject: Re: Help with Hand-Optimized Assembly
Date: Wed, 28 Mar 2012 18:29:56 -0000
User-agent: G2/1.0

On Jan 13, 9:45 am, Bill Woessner <woess...@nospicedham.gmail.com>
wrote:
> On Jan 13, 4:59 am, Terje Mathisen <"terje.mathisen at tmsw.no"@giganews.com> wrote:
> > I'll second James' suggestion about SSE2!
>
> I'm open to using SSE2.  The only reason I used x87 is that I started
> with the assembly code that g++ generated.  By default, it generates
> x87 instructions.  But I'm certainly willing to try with SSE2
> instructions.
>
> > Anyway, it seems that what you are trying to do is to take the
> > difference between two angles and then make sure that said
> > difference will be in the [-pi, pi) range, right?
>
> That's it exactly.  It's such a simple thing, but I can't come up with
> a really elegant way to do it.  The code generated by g++ involves a
> jump.  But I think this should be possible without a jump by using a
> conditional move.
>
> > Anyway, trying your original algorithm:
>
> I'll give your implementation a try.  A big part of the challenge (for
> me, at least) is figuring out how to get this into a form that g++
> will understand.  I don't really care where or how this is
> implemented, but I need to be able to call it from C++.  And it really
> should be inline, as well.  Otherwise, all the efficiency gained by
> tweaking the assembly will be lost.  :-p
>
> Thanks,
> Bill
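For reference, the operation Terje describes (take the difference of two angles and bring it into [-pi, pi)) can be written in plain C++ as below. This is a minimal sketch that assumes both inputs already lie in [-pi, pi); the name `angle_diff` is hypothetical. Because both corrections are computed from the same difference with conditional expressions rather than `if` statements, g++ may emit it without a jump, but that depends on the compiler version, optimization level and `-mfpmath` setting and is not guaranteed.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: difference of two angles, wrapped into [-pi, pi).
// Assumes a and b are already in [-pi, pi), so a - b lies in (-2*pi, 2*pi)
// and at most one correction of 2*pi is ever needed.
inline double angle_diff(double a, double b) {
    const double pi = 3.14159265358979323846;
    double d = a - b;
    // Each correction is either 0 or +/-2*pi; both are derived from the same
    // d, and at most one of them can be nonzero.
    double hi = (d >= pi) ? -2.0 * pi : 0.0;
    double lo = (d < -pi) ? 2.0 * pi : 0.0;
    return d + hi + lo;
}
```

For example, `angle_diff(3.0, -3.0)` returns `6.0 - 2*pi` (about -0.283), while `angle_diff(1.0, 0.5)` returns 0.5 unchanged.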

There is a straightforward algorithm that uses the fact that only one of
the bounds can be crossed...
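Why only one bound can be crossed, assuming (as in the question) that both input angles already lie in [-pi, pi):

```latex
a, b \in [-\pi, \pi) \;\Rightarrow\; d = a - b \in (-2\pi, 2\pi)

\operatorname{wrap}(d) =
\begin{cases}
  d - 2\pi & \text{if } d \ge \pi   \quad (\text{then } d - 2\pi \in [-\pi, 0)) \\
  d + 2\pi & \text{if } d < -\pi    \quad (\text{then } d + 2\pi \in (0, \pi)) \\
  d        & \text{otherwise}
\end{cases}
```

The two conditions are mutually exclusive, so both can be evaluated from the same d and their corrections simply added.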

Something like this:
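(Inputs in %xmm0 and %xmm1, output in %xmm0, wrapped into [-pi, pi).)

subsd    %xmm1, %xmm0              # xmm0 = d = a - b
movsd    plusM_PI(%rip), %xmm1     # xmm1 = +pi
movsd    minusM_PI(%rip), %xmm2    # xmm2 = -pi

cmplesd  %xmm0, %xmm1              # xmm1 = (pi <= d)   ? all-ones : 0
cmpnlesd %xmm0, %xmm2              # xmm2 = !(-pi <= d) ? all-ones : 0

andpd    minus2M_PI(%rip), %xmm1   # xmm1 = -2*pi or 0
andpd    plus2M_PI(%rip), %xmm2    # xmm2 = +2*pi or 0

addsd    %xmm1, %xmm0              # at most one correction is nonzero
addsd    %xmm2, %xmm0

SSE2's cmpsd only provides the eq, lt, le and unord predicates (plus
their negations), so a "greater than" test has to be written as cmplesd
or cmpltsd with the operands swapped, or as a negated predicate.  The
andpd memory operands must be 16-byte aligned.  This is only a sketch,
but you get the idea.  The two comparisons are independent, so they can
execute in parallel.  Using sign tricks doesn't seem to be profitable,
as that lengthens the critical path.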
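Since the goal is to call this inline from C++, one option is to express the same mask-and-add idea with SSE2 intrinsics from `<emmintrin.h>` instead of hand-written assembly. The sketch below assumes inputs in [-pi, pi); the function name `angle_diff_sse2` is hypothetical, and the non-SSE2 fallback is an addition so the code also builds on other targets.

```cpp
#include <cassert>
#include <cmath>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical helper: mask-and-add wrap of a - b into [-pi, pi),
// written with SSE2 intrinsics so g++ can inline it and pick registers.
// Assumes a and b are already in [-pi, pi).
inline double angle_diff_sse2(double a, double b) {
    const double pi = 3.14159265358979323846;
#if defined(__SSE2__)
    // Low lane holds d = a - b; the upper lanes are zero throughout.
    __m128d d = _mm_sub_sd(_mm_set_sd(a), _mm_set_sd(b));
    // Low lane becomes all-ones where the condition holds, zero otherwise.
    // The two compares do not depend on each other.
    __m128d hi = _mm_cmpge_sd(d, _mm_set_sd(pi));   // d >= pi
    __m128d lo = _mm_cmplt_sd(d, _mm_set_sd(-pi));  // d < -pi
    // The masks select -2*pi / +2*pi, or 0.0 when the condition is false.
    hi = _mm_and_pd(hi, _mm_set_sd(-2.0 * pi));
    lo = _mm_and_pd(lo, _mm_set_sd(2.0 * pi));
    d = _mm_add_sd(_mm_add_sd(d, hi), lo);
    return _mm_cvtsd_f64(d);
#else
    // Portable fallback for targets without SSE2.
    double d = a - b;
    return d + ((d >= pi) ? -2.0 * pi : 0.0) + ((d < -pi) ? 2.0 * pi : 0.0);
#endif
}
```

Because the compiler sees ordinary intrinsics rather than an opaque asm statement, it can inline the function and schedule it with the surrounding code. Whether the result matches the hand-written sequence instruction for instruction depends on the g++ version and flags.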

Steven

