Re: Take some low-hanging fruit to speed up R6RS fixnum operations
From: Andreas Rottmann
Subject: Re: Take some low-hanging fruit to speed up R6RS fixnum operations
Date: Wed, 30 Mar 2011 12:58:13 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.50 (gnu/linux)
[ Sorry for the incomplete mail I just sent; hit the wrong key combo ]
Andy Wingo <address@hidden> writes:
> On Wed 23 Mar 2011 00:20, Andreas Rottmann <address@hidden> writes:
>
>> In porting dorodango[0], I have noticed that Guile's R6RS fixnum
>> operations are quite slow; here's a patch to remedy that a bit.
>
> What is the state of things here? I'm a bit lost in the discussion :)
>
> You said that decompressing the zip file takes 74 seconds with Guile as
> it is now. What if you change to use Guile's arithmetic instead of
>> fxops? That would give us a good baseline so that we can know how much
> overhead the fxops have.
>
As promised, here are a bunch of numbers that will hopefully help to
determine the desired course of action:
| Codebase           | fixnum? (s) | fxxor (s) | unzip (s) |
|--------------------+-------------+-----------+-----------|
| vanilla            |        5.66 |     24.69 |      65.9 |
| generic            |        5.62 |      1.92 |       7.7 |
| smart              |        3.42 |     21.29 |      56.3 |
| smart + opt        |        3.42 |      6.99 |      21.9 |
| smart + opt-macros |        3.40 |      5.56 |      21.0 |
| vmop               |        1.78 |     18.83 |      47.9 |
| vmop + opt         |        1.77 |      3.94 |      14.1 |
| vmop + opt-macros  |        1.78 |      3.70 |      13.3 |
As you can see, in the "vanilla" case (i.e. current stable-2.0), it's
65.9 seconds -- I do not know where I got the 74 second number from,
perhaps it was from a profiling run.
The "generic" case is the one where all fixnum checks have been
eliminated (i.e. using the generic procedures, such as logxor), thus
allowing the compiler to emit VM primitives for most of the operations.
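To make the "generic" variant concrete, here is a sketch of what it amounts to (the real patch may differ in detail, this just forwards to Guile's generic operation with no fixnum checks):

```scheme
;; Sketch of the "generic" variant: skip the fixnum type checks
;; entirely and forward to Guile's generic logxor, which the
;; compiler can emit as a single VM primitive.
(define (fxxor x y)
  (logxor x y))
```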
Now, with these baselines established, I implemented the following
optimizations, and benchmarked combinations of them as can be seen from
the above table:
- "smart" refers to having `fixnum?' implemented via `define-inline' and
using `object-address'.
- "opt" refers to having many fixnum procedures (e.g. fx+, fxxor, ...)
going through a case-lambda, thus optimizing the common, two-argument
case not to cons.
- "opt-macros" refers to making the same procedures as in "opt" actually
macros, expanding to a `(let ((x x-expr) (y y-expr)) (OP x y))' in the
binary case.
- "vmop" means that the newly-introduced VM primitive `fixnum?' is used.
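To illustrate the difference between "opt" and "opt-macros", here is a rough sketch; `assert-fixnum' and `fxxor/proc' are hypothetical names for illustration, not the identifiers from the actual patch:

```scheme
;; "opt": a case-lambda gives the common binary case its own clause,
;; so no rest-argument list is consed for two-argument calls.
;; `assert-fixnum' is a hypothetical argument-checking helper.
(define fxxor/proc
  (case-lambda
    ((x y)
     (assert-fixnum x)
     (assert-fixnum y)
     (logxor x y))
    (args
     (for-each assert-fixnum args)
     (apply logxor args))))

;; "opt-macros": the same operation as a macro, expanding the binary
;; case to a `let' inline so that not even a procedure call remains;
;; other arities fall back to the procedure above.
(define-syntax fxxor
  (syntax-rules ()
    ((_ x-expr y-expr)
     (let ((x x-expr) (y y-expr))
       (assert-fixnum x)
       (assert-fixnum y)
       (logxor x y)))
    ((_ arg ...)
     (fxxor/proc arg ...))))
```

The `let' in the binary expansion keeps evaluation order and single evaluation of each operand, which is why the expansion is not simply `(logxor x-expr y-expr)'.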
Regarding the columns, "fixnum?" is the time needed for 10 million
`fixnum?' checks, "fxxor" for 10 million binary `fxxor' operations, and
"unzip" times the unzipping of a not-so-large (675KiB) ZIP file using
Göran Weinholt's ZIP implementation.
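The micro-benchmarks have roughly this shape (a sketch, not the exact harness used for the numbers above; the "fixnum?" column is the same loop with a bare `fixnum?' check in the body):

```scheme
;; Hypothetical micro-benchmark loop for the "fxxor" column:
;; ten million binary fxxor operations in a tight named-let loop.
(use-modules (rnrs arithmetic fixnums))

(define (bench-fxxor)
  (let loop ((i 0) (acc 0))
    (if (< i 10000000)
        (loop (+ i 1) (fxxor acc 12345))
        acc)))
```

Timed at the REPL with `,time (bench-fxxor)'.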
Now there are several questions:
- Can we put the `fixnum?' VM primitive into stable-2.0? AFAIK, yes we
can, but we will break things for users mixing 2.0.0 and 2.0.1 on the
same machine (or sharing $HOME). So do we want to?
- Should we complicate the implementation and take a potential code size
  hit by using the "opt-macros" hack, for a 5-6% gain in speed? Of
  course, that's just for the ZIP benchmark, but I think it gives a good
  estimate of the ballpark we are in here.
My preference would be to go with the simpler implementation ("opt"
instead of "opt-macros").
- Should further improvements be considered? I guess if we put the
fixnum operations into the VM, we could get as fast as the "generic"
case, and perhaps even a bit faster (i.e. about a 2x speedup on the
ZIP benchmark compared to vmop + opt). I'm not sure if that's worth
it.
Regards, Rotty
--
Andreas Rottmann -- <http://rotty.yi.org/>