Re: [ft-devel] FT_MulFix assembly


From: James Cloos
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Sun, 05 Sep 2010 17:44:01 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)

The final result for amd64 looks like:

  static __inline__ long
  FT_MulFix_x86_64( long  a,
                   long  b )
  {
    register long  result;

    __asm__ __volatile__ (
      "movq  %1, %%rax\n"
      "imul  %2\n"
      "addq  %%rdx, %%rax\n"
      "addq  $0x8000, %%rax\n"
      "sarq  $16, %%rax\n"
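      : "=&a"(result)        /* early-clobber: rax is written before b is read */
      : "r"(a), "r"(b)       /* one-operand imul needs a register, not an immediate */
      : "rdx", "cc" );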
    return result;
  }
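
For comparison, here is a rough C sketch of what the asm above computes,
as I read it.  It is not the C version that is in the tree;
mulfix_sketch is a made-up name, and the sketch assumes that the product
fits in 64 bits and that >> on a negative value is an arithmetic shift
(true for gcc, but not guaranteed by the C standard).

```c
#include <stdint.h>

/* Hypothetical helper, not part of FreeType: 16.16 fixed-point    */
/* multiply, rounded to nearest with ties away from zero.          */
/* Assumes a*b fits in 64 bits and that >> of a negative signed    */
/* value is an arithmetic shift (implementation-defined in C).     */
static int64_t
mulfix_sketch( int64_t  a,
               int64_t  b )
{
  int64_t  prod = a * b;       /* low 64 bits, like rax after imul  */
  int64_t  sign = prod >> 63;  /* 0 or -1, like rdx if no overflow  */

  /* Adding the sign word before the 0x8000 bias makes negative     */
  /* halves round away from zero, the same way positive ones do.    */
  return ( prod + sign + 0x8000 ) >> 16;
}
```

With 0x10000 standing for 1.0, mulfix_sketch( 0x10000, 0x10000 ) gives
0x10000, and the half cases 1 * 0x8000 and -1 * 0x8000 give 1 and -1
respectively.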


The use of long, though, requires review.  The C version uses FT_Long
(not FT_Int32 like the other asm versions), but FT_Long is not yet a
#define or a typedef at the point where the asm versions are located.

That said, using long there on amd64 prevents unnecessary 32<->64 bit
conversions in the resulting code.

The above code has a latency of 1+5+1+1+1 = 9 clocks on an amdfam10 cpu.

The assembly generated from the C code is 45 lines and 158 octets long,
contains six conditional jumps and three each of explicit compares and
tests, and still benchmarks just as fast.  Out-of-order processing
wins out over hand-coded asm. :-/

It *might* make more of a difference on an in-order processor like the
Atom.  But I do not have one to test.

I can still finish a patch, and have collected the info I need to do one
for mips64, too, where I expect it will be more important.  I also expect
that the i386 version could be tidied a bit.

Is the amd64 version desired, given how little benefit it has?

-JimC
-- 
James Cloos <address@hidden>         OpenPGP: 1024D/ED7DAEA6


