Re: [ft-devel] FT_MulFix assembly
From: James Cloos
Subject: Re: [ft-devel] FT_MulFix assembly
Date: Sat, 07 Aug 2010 12:36:27 -0400
User-agent: Gnus/5.110011 (No Gnus v0.11) Emacs/24.0.50 (gnu/linux)
My first cut at FT_MulFix_x86_64() is:
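static __inline__ FT_Int32
FT_MulFix_x86_64 (FT_Int32 a, FT_Int32 b) {
  register FT_Int32 r;
  __asm__ __volatile__ (
    "movslq %%edx, %%rdx\n"  /* sign-extend b to 64 bits            */
    "cltq\n"                 /* sign-extend a (eax) into rax        */
    "imul %%rdx\n"           /* rdx:rax = rax * rdx, signed         */
    "addq %%rdx, %%rax\n"    /* rdx is 0 or -1: bias negatives      */
    "addq $0x8000, %%rax\n"  /* add one half in 16.16               */
    "sarq $16, %%rax\n"      /* arithmetic shift back to 16.16      */
    : "=a"(r), "+d"(b)       /* rdx is overwritten, so b is in/out  */
    : "a"(a)
    : "cc");
  return r;
}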
It passes a Monte Carlo test comparing its results to the C code and to
the i386 assembly.
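A harness for that kind of check might look like the following sketch. It is not necessarily the test described above: `mulfix_ref` is a hypothetical sign-magnitude reference that is assumed to round like FreeType's generic C FT_MulFix (halfway cases away from zero), inputs come from the splitmix64 generator, and the assembly (with rdx declared as an in/out operand) is compiled only on GCC-compatible x86_64 targets.

```c
#include <stdint.h>

/* Hypothetical reference: sign-magnitude rounding, assumed to match
   the generic C FT_MulFix (ties round away from zero). */
static int32_t mulfix_ref(int32_t a, int32_t b)
{
  int64_t ua = a < 0 ? -(int64_t)a : a;  /* 64-bit magnitudes: INT32_MIN is safe */
  int64_t ub = b < 0 ? -(int64_t)b : b;
  int64_t c  = (ua * ub + 0x8000) >> 16; /* non-negative, so the shift is exact  */
  /* Out-of-range results wrap to 32 bits (GCC behaviour), like eax. */
  return (int32_t)(((a < 0) != (b < 0)) ? -c : c);
}

#if defined(__GNUC__) && defined(__x86_64__)
/* The x86_64 sequence under test. */
static int32_t mulfix_x86_64(int32_t a, int32_t b)
{
  int32_t r;
  __asm__ __volatile__ (
    "movslq %%edx, %%rdx\n"
    "cltq\n"
    "imul %%rdx\n"
    "addq %%rdx, %%rax\n"
    "addq $0x8000, %%rax\n"
    "sarq $16, %%rax\n"
    : "=a"(r), "+d"(b)
    : "a"(a)
    : "cc");
  return r;
}
#endif

/* splitmix64: a small, well-known PRNG used only to draw test inputs. */
static uint64_t splitmix64(uint64_t *state)
{
  uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

/* Number of random (a, b) pairs on which the assembly and the
   reference disagree, or -1 if the assembly is not available. */
static long monte_carlo_mismatches(unsigned long n, uint64_t seed)
{
#if defined(__GNUC__) && defined(__x86_64__)
  long bad = 0;
  for (unsigned long i = 0; i < n; i++) {
    uint64_t x = splitmix64(&seed);
    int32_t a = (int32_t)(uint32_t)x;         /* low 32 bits  */
    int32_t b = (int32_t)(uint32_t)(x >> 32); /* high 32 bits */
    if (mulfix_x86_64(a, b) != mulfix_ref(a, b))
      bad++;
  }
  return bad;
#else
  (void)n;
  (void)seed;
  return -1;
#endif
}
```

On x86_64, `monte_carlo_mismatches(1000000, 1)` should return 0; on other targets it returns -1 because there is nothing to compare.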
The logic is simple. The first two instructions sign-extend the two
values to 64 bits, and the one-operand imul puts the least significant
64 bits of the 128-bit product in rax and the most significant 64 bits
in rdx. Because both values started out as 32-bit, the product fits
comfortably in 64 bits, so rdx holds only sign bits: zero if the product
is >= 0, else -1. Adding the resulting rdx to rax serves the same
purpose as the ecx value in the i386 version: for negative products it
subtracts 1 before the 0x8000 rounding constant is added, so halfway
cases round away from zero and the rounding is symmetric around zero,
just like the C code.
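To see what the rdx term buys, here is a C sketch that mirrors the instruction sequence step by step, with a flag that drops the `addq %rdx, %rax` step. The function name is made up for this illustration, and it assumes `>>` on a negative `int64_t` is an arithmetic shift, which GCC and Clang document but ISO C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the asm; with_bias = 0 skips the
   "addq %rdx, %rax" step so the effect of the sign word is visible. */
static int32_t mulfix_steps(int32_t a, int32_t b, int with_bias)
{
  int64_t rax = (int64_t)a;     /* cltq: sign-extend a                   */
  int64_t rdx = (int64_t)b;     /* movslq: sign-extend b                 */
  rax = rax * rdx;              /* imul: low 64 bits of the product      */
  rdx = (rax < 0) ? -1 : 0;     /* imul: high 64 bits, only sign bits    */
  if (with_bias)
    rax += rdx;                 /* addq %rdx, %rax                       */
  rax += 0x8000;                /* addq $0x8000, %rax                    */
  return (int32_t)(rax >> 16);  /* sarq $16, assumed arithmetic in C too */
}
```

With b = 0x8000 every product below lands exactly halfway between two 16.16 values. With the bias, (1, 0x8000) gives 1 and (-1, 0x8000) gives -1, and (3, 0x8000) gives 2 while (-3, 0x8000) gives -2. Without it, (-1, 0x8000) gives 0 and (-3, 0x8000) gives -1, i.e. negative ties round toward +infinity instead of away from zero.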
An alternative might be to cast the source values to FT_Int64, but I
doubt that the compiler would generate any better code than the movslq
and cltq pair.
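For comparison, the cast-based alternative might look like this sketch. It is plain C99 with `int64_t` standing in for FT_Int64, the name `mulfix_cast` is hypothetical, and `ab >> 63` (again assuming an arithmetic shift) plays the role of rdx. Whether the compiler's output beats the hand-written sequence can be checked by compiling with `gcc -O2 -S` and reading the assembly.

```c
#include <stdint.h>

/* Hypothetical cast-based variant; int64_t stands in for FT_Int64. */
static inline int32_t mulfix_cast(int32_t a, int32_t b)
{
  int64_t ab = (int64_t)a * (int64_t)b;  /* the casts do the sign extension */
  ab += 0x8000 + (ab >> 63);             /* ab >> 63 is 0 or -1, like rdx   */
  return (int32_t)(ab >> 16);            /* back to 16.16                   */
}
```

For example, 0x18000 (1.5) times 0x18000 gives 0x24000 (2.25), and (-1, 0x8000) gives -1, matching the assembly.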
I have to finish the patch, but I thought I'd offer the algorithm for
review in case anyone wants to look it over.
-JimC
--
James Cloos <address@hidden> OpenPGP: 1024D/ED7DAEA6