Re: Help with Hand-Optimized Assembly

help-gplusplus

From:	Markus Wichmann
Subject:	Re: Help with Hand-Optimized Assembly
Date:	Wed, 28 Mar 2012 18:29:58 -0000
User-agent:	Mozilla/5.0 (X11; Linux i686 on x86_64; rv:9.0) Gecko/20111222 Thunderbird/9.0.1

On 12.01.2012 22:38, Bill Woessner wrote:
> I'm a 100% total newbie at writing assembly.  But I figured it would
> be a good exercise.  And besides, this tiny chunk of code is
> definitely in the critical path of something I'm working on.  Any and
> all advice would be appreciated.
> 
> I'm trying to rewrite the following function in x86 assembly:
> 
> inline double DiffAngle(double theta1, double theta2)
> {
>   double delta(theta1 - theta2);
> 
>   return std::abs(delta) <= M_PI ? delta : delta - copysign(2 * M_PI,
> delta);
> }
> 
> To my great surprise, I've actually been somewhat successful.  Here's
> what I have so far:
> 
> double DiffAngle(double theta1, double theta2)
> {
>   asm(
>       "fldl    4(%esp);"
>       "fsubl   12(%esp);"
>       "fxam;"
>       "fnstsw  %ax;"
>       "fldl    TWO_PI;"
>       "testb   $2, %ah;"
>       "fldl    NEG_TWO_PI;"
>       "fcmovne %st(1), %st;"
>       "fstp    %st(1);"
>       "fsubr   %st(1), %st;"
>       "fldpi;"
>       "fld     %st(2);"
>       "fabs;"
>       "fcomip  %st(1), %st;"
>       "fstp    %st(0);"
>       "fcmovbe %st(1), %st;"
>       "fstp    %st(1);"
>       "rep;"
>       "ret;"
>       "NEG_TWO_PI:;"
>       ".long   1413754136;"
>       ".long   1075388923;"
>       "TWO_PI:;"
>       ".long   1413754136;"
>       ".long   -1072094725;"
>       );
> }
> 
> This compiles, runs and produces the correct answers.  But I have a
> few issues with it:
> 
> 1) If I declare this function inline, it gives me garbage (like
> 10^-304)

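That is because your code only works as a real, out-of-line function: it
reads its arguments from fixed stack offsets (4(%esp) and 12(%esp)) and
ends with its own ret. Once the function is inlined there is no call, so
those offsets point at whatever the caller happens to have on its stack,
and the compiler has no idea where your inputs and output are.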

I'm rewriting your C++ first, so I can put it into assembly more easily:

double DiffAngle(double theta1, double theta2)
{
    double diff = theta1 - theta2;

    if (fabs(diff) <= M_PI)
        return diff;
    else if (diff < 0)
        return diff + 2 * M_PI;
    else
        return diff - 2 * M_PI;
}

Or, in a more SSE-like manner:

double DiffAngle(double theta1, double theta2)
{
    double diff = theta1 - theta2;
    double subtract = copysign(2 * M_PI, diff);
    if (fabs(diff) <= M_PI) subtract = 0;
    return diff - subtract;
}

Because you might want to rewrite the stuff anyway in SSE2, I'd change
it to something like:

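double DiffAngle(double theta1, double theta2)
{
    double res;
    const uint64_t no_sign_mask = 0x7fffffffffffffffULL;
    const uint64_t sign_mask = ~no_sign_mask;
    const double pi = M_PI, two_pi = 2 * M_PI;
    // One asm statement, so the compiler cannot touch the registers
    // between instructions; they are all listed as clobbers.
    asm("movsd    %1, %%xmm0\n\t"      // xmm0 = theta1
        "subsd    %2, %%xmm0\n\t"      // xmm0 = diff
        "movsd    %%xmm0, %%xmm1\n\t"  // xmm1 = diff
        "movq     %3, %%xmm2\n\t"
        "andpd    %%xmm2, %%xmm0\n\t"  // xmm0 = abs(diff)
        "cmpnlesd %4, %%xmm0\n\t"      // if abs(diff) <= M_PI
        // %xmm0 = 0, else %xmm0 = 0xffff...
        "movsd    %5, %%xmm3\n\t"      // xmm3 = 2 * M_PI
        "movq     %6, %%xmm2\n\t"      // xmm2 = sign bit mask
        "movsd    %%xmm1, %%xmm4\n\t"
        "andpd    %%xmm2, %%xmm4\n\t"  // xmm4 = sign bit of diff
        "orpd     %%xmm4, %%xmm3\n\t"  // xmm3 = copysign(2 * M_PI, diff)
        "andpd    %%xmm0, %%xmm3\n\t"  // xmm3 = 0 if abs(diff) <= M_PI
        "subsd    %%xmm3, %%xmm1\n\t"  // xmm1 = diff - subtract
        "movsd    %%xmm1, %0"
        : "=m" (res)
        : "m" (theta1), "m" (theta2), "m" (no_sign_mask), "m" (pi),
          "m" (two_pi), "m" (sign_mask)
        : "xmm0", "xmm1", "xmm2", "xmm3", "xmm4");
    return res;
}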

Does that work for you? It's untested!

> 2) If I compile with -Wall, I get a warning that the function doesn't
> return a value, which is absolutely true, but I don't know how to fix
> it.

double ret;
asm("fldl %1; fldl %2; blablabla; fstpl %0"
    : "=m" (ret)
    : "m" (theta1), "m" (theta2)
    : );
return ret;

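The outputs go after the first colon, the inputs after the second. This
should also take care of your first question, since the compiler now
knows where the inputs and the result live.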

> 3) I don't like how TWO_PI and NEG_TWO_PI are defined.  I had to steal
> it from some generated assembly.  It would be nice to use M_PI,
> 4*atan(1) or something like that.
> 

Just pass them as additional inputs and let the compiler worry. Like:

double ret;
asm("fldl %1; fldl %2; blabla; fldl %3; blabli; fldl %4; bla; fstpl %0"
    : "=m" (ret)
    : "m" (theta1), "m" (theta2), "m" (2*M_PI), "m" (-2*M_PI)
    : );
return ret;

The "m" means "memory operand" (let the compiler worry about the
addresses!), the "=" means "write-only operand". The operands are
numbered in order of appearance, outputs first, so %0 is ret here.
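
To make the placeholders above concrete, here is a minimal complete sketch that only does the subtraction on the x87 stack. The function name SubAngles is made up, and the preprocessor guard with a plain C++ fallback is my own addition so the snippet still builds on non-x86 targets or other compilers.

```cpp
// Hypothetical helper: computes theta1 - theta2, on x86 via the x87 FPU.
double SubAngles(double theta1, double theta2)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double ret;
    asm("fldl  %1\n\t"   // push theta1 onto the x87 stack
        "fsubl %2\n\t"   // st(0) = theta1 - theta2
        "fstpl %0"       // pop the result into ret
        : "=m" (ret)                       // %0: write-only memory output
        : "m" (theta1), "m" (theta2));     // %1, %2: memory inputs
    return ret;
#else
    // Fallback (an assumption for portability, not part of the asm idea).
    return theta1 - theta2;
#endif
}
```

Because the compiler fills in the addresses for %0, %1 and %2, this version can be declared inline without reading garbage, and since it ends in a normal return statement, -Wall has nothing to complain about.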

> Thanks in advance,
> Bill

HTH,
Markus
