Re: Help with Hand-Optimized Assembly

help-gplusplus


From:	Terje Mathisen
Subject:	Re: Help with Hand-Optimized Assembly
Date:	Wed, 28 Mar 2012 18:29:55 -0000
User-agent:	Mozilla/5.0 (Windows NT 6.1; WOW64; rv:9.0.1) Gecko/20111221 Firefox/9.0.1 SeaMonkey/2.6.1

James Van Buskirk wrote:

"Bill Woessner"<woessner@nospicedham.gmail.com>  wrote in message
news:67ddafac-ae03-4ef1-b156-5488e8b8086a@i26g2000vbt.googlegroups.com...

This compiles, runs and produces the correct answers.  But I have a
few issues with it:

1) If I declare this function inline, it gives me garbage (like
10^-304)
2) If I compile with -Wall, I get a warning that the function doesn't
return a value, which is absolutely true, but I don't know how to fix
it.
3) I don't like how TWO_PI and NEG_TWO_PI are defined.  I had to steal
them from some generated assembly.  It would be nice to use M_PI,
4*atan(1) or something like that.


I can't help you with your questions because I would always write
something like this in assembly rather than C, but is there some
reason that you can't use SSE2 rather than x87 here?  SSE2 should
be much faster if available in the context of your problem.
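Following James' suggestion from the C side, here is a minimal sketch. It assumes the task is the angle-difference wrap described below and that both inputs already lie in [-pi, pi]. Written in plain C, GCC generates SSE2 scalar code when built with -msse2 -mfpmath=sse (the default on x86-64). There is no inline assembly to misbehave when the function is inlined, the value is returned normally so -Wall has nothing to complain about, and the constants come from a decimal pi literal instead of being copied out of generated assembly. The function name angle_diff_c is hypothetical.

```c
#include <math.h>

/* M_PI is POSIX, not ISO C, so spell pi out; the compiler rounds the
   literal to the nearest double. Doubling that double is exact. */
static const double PI_D   = 3.14159265358979323846;
static const double TWO_PI = 2.0 * 3.14159265358979323846;

/* Hypothetical helper: rotation from theta2 to theta1, wrapped into
   [-pi, pi), assuming both inputs already lie in [-pi, pi]. */
double angle_diff_c(double theta1, double theta2)
{
    double diff = theta1 - theta2;      /* in [-2pi, 2pi] */
    if (diff >= PI_D)
        diff -= TWO_PI;                 /* [pi, 2pi]   -> [-pi, 0]  */
    else if (diff < -PI_D)
        diff += TWO_PI;                 /* [-2pi, -pi) -> [0, pi)   */
    return diff;                        /* ordinary return value */
}
```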


I'll second James' suggestion about SSE2!

Anyway, it seems that what you are trying to do is take the difference between two angles and then make sure that said difference will be in the [-pi .. pi> range, right?


I.e. what is the rotation angle to get from theta2 to theta1?

Let's start by looking at the various alternatives:

If the signs of th1 and th2 are the same (and both lie in [-pi .. pi]), then the difference _must_ be in range:


 0 - pi = -pi
 pi - 0 = pi

 -0 - -pi = pi
 -pi - 0  = -pi
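The same-sign claim can be spot-checked numerically. Here is a small sketch, assuming inputs in [-pi, pi] sampled on a uniform grid (the helper name and the grid are hypothetical). It returns true if every same-sign pair gives a difference inside [-pi, pi]. A grid check is not a proof; the table above is the actual argument.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical check: sample angles on a grid over [-pi, pi] and verify
   that every pair with the same sign bit has a difference in [-pi, pi]. */
bool same_sign_diffs_in_range(int steps)
{
    const double pi = 3.14159265358979323846;
    for (int i = 0; i <= steps; i++) {
        double th1 = -pi + 2.0 * pi * i / steps;
        for (int j = 0; j <= steps; j++) {
            double th2 = -pi + 2.0 * pi * j / steps;
            /* signbit() returns "nonzero", so normalize with ! first */
            if (!signbit(th1) != !signbit(th2))
                continue;               /* skip opposite-sign pairs */
            double d = th1 - th2;
            if (d < -pi || d > pi)
                return false;
        }
    }
    return true;
}
```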

It is only when the signs differ that you might need to add or subtract 2pi to bring it into range:


  pi - -pi = 2pi
 -pi -  pi = -2pi
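Terje's observation can be turned into a variant that only tests the range when the signs differ. This is a sketch under the same [-pi, pi] input assumption, and the function name is hypothetical. It is not necessarily faster, because the sign test replaces the range comparison rather than removing work. It also follows the closed range of the tables above, so a difference of exactly pi is returned unchanged.

```c
#include <math.h>

/* Hypothetical variant: only fold when theta1 and theta2 differ in sign.
   signbit(-0.0) is nonzero, so -0 counts as negative, as in the table. */
double angle_diff_signs(double theta1, double theta2)
{
    const double pi     = 3.14159265358979323846;
    const double two_pi = 2.0 * 3.14159265358979323846;
    double diff = theta1 - theta2;

    /* Same sign: |diff| <= pi already, no correction needed. */
    if (!signbit(theta1) == !signbit(theta2))
        return diff;

    /* Opposite signs: diff is in [-2pi, 2pi]; fold back by one 2pi. */
    if (diff > pi)
        diff -= two_pi;
    else if (diff < -pi)
        diff += two_pi;
    return diff;
}
```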

I don't see immediately how I can use this to speed it up though...

Anyway, trying your original algorithm:

  movq xmm0,[theta1]

  subsd xmm0,[theta2]   ;; Result in [-2pi to 2pi]
  movq xmm2,[plus_mask] ;; 0x7fffff...

  andpd xmm2,xmm0       ;; ABS(diff), [0 to 2pi]
  movq xmm3,[pi]

  cmplesd xmm3,xmm2     ;; all-ones mask if pi <= ABS(diff)
  andpd xmm3,[twopi]    ;; 0 or 2pi (twopi: 16-byte aligned, as andpd m128 requires)
  subsd xmm2,xmm3       ;; [-pi to pi]
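To check the bit tricks without an assembler, the data flow of the block above can be mirrored in portable C, using memcpy to reinterpret the bits of a double. This is a sketch with hypothetical helper names. The compiler is free to turn the comparison into a branch, so it mirrors the logic rather than the exact instruction sequence. The sign of the original difference has not been restored at this point.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a double's bits without aliasing UB. */
static uint64_t bits_of(double x)   { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double   double_of(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Mirrors the first block: ABS(theta1 - theta2), folded by 0 or 2pi. */
double folded_magnitude(double theta1, double theta2)
{
    const double pi     = 3.14159265358979323846;
    const double two_pi = 2.0 * 3.14159265358979323846;

    double   diff = theta1 - theta2;                              /* subsd          */
    double   mag  = double_of(bits_of(diff) & 0x7fffffffffffffffULL); /* andpd plus_mask */
    uint64_t mask = (pi <= mag) ? ~0ULL : 0;                      /* cmplesd        */
    double   adj  = double_of(bits_of(two_pi) & mask);            /* andpd: 0 or 2pi */
    return mag - adj;                                             /* subsd          */
}
```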

If the original subtraction sign was negative, then we must invert the sign of the result:


  andpd xmm0,[signbits] ;; (-0.0 , -0.0), 16-byte aligned; keeps sign of diff
  xorpd xmm0, xmm2      ;; result in xmm0
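The whole sequence can also be written with the SSE2 intrinsics from <emmintrin.h>. This lets the compiler handle register allocation and scheduling, and it avoids the inline-assembly problems from the original post. This is a sketch for x86/x86-64 only; the function name is hypothetical, and the constants are built in registers rather than loaded from aligned memory. As with the assembly, a difference of exactly +pi or -pi comes out with the opposite sign, so the result lies in [-pi, pi] rather than strictly in [-pi .. pi>.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Hypothetical intrinsics version of the SSE2 sequence above. */
double angle_diff_sse2(double theta1, double theta2)
{
    const __m128d pi       = _mm_set_sd(3.14159265358979323846);
    const __m128d two_pi   = _mm_set_sd(2.0 * 3.14159265358979323846);
    const __m128d signbits = _mm_set1_pd(-0.0);          /* only sign bits set */

    __m128d diff   = _mm_sub_sd(_mm_set_sd(theta1), _mm_set_sd(theta2)); /* subsd */
    __m128d mag    = _mm_andnot_pd(signbits, diff);       /* ABS(diff)           */
    __m128d big    = _mm_cmple_sd(pi, mag);               /* all-ones if pi <= ABS(diff) */
    __m128d adj    = _mm_and_pd(big, two_pi);             /* 0 or 2pi            */
    __m128d folded = _mm_sub_sd(mag, adj);                /* [-pi, pi]           */
    __m128d sign   = _mm_and_pd(diff, signbits);          /* sign of original diff */
    return _mm_cvtsd_f64(_mm_xor_pd(sign, folded));       /* flip if diff < 0    */
}
```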

The code can be rescheduled a bit, and the mixture of 64-bit scalar and 128-bit packed operations must be checked to make sure it doesn't introduce forwarding problems, but it seems like it should run in 10-15 cycles, depending upon the latency of the FP operations (SUBSD, CMPLESD, SUBSD).

I tried to figure out a way to use scaling and integer math, but that is likely to be slower.


Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"


