Re: [Chicken-users] Optimizing inner loops
From: Will M Farr
Subject: Re: [Chicken-users] Optimizing inner loops
Date: Tue, 29 Aug 2006 16:07:56 -0400
Hello Carlos (and everyone),
Thanks for the reference to the inline egg---I'd seen that, but had
forgotten about it. Below is a little bit of code you can use to
test the speed of the various methods I discussed in my last email
for different sizes of vectors.
Each of the attached files can be compiled to an executable which is
invoked as
./test_{c,scm,fast_scm} <n> <n-iter>
which adds two vectors of double-precision floating-point numbers of
length <n> and stores the result in a third vector, <n-iter> times.
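For concreteness, the pure-Scheme kernel (the test_scm version) presumably boils down to something like the following. This is only a sketch of my guess at the shape of the code, since the attached test.scm isn't reproduced inline; the f64vector operations come from SRFI-4, and the name vector-add! is mine:

```scheme
;; Hypothetical sketch of the pure-Scheme inner loop (the real test.scm
;; is attached, not shown here).  Uses SRFI-4 homogeneous f64vectors.
(use srfi-4)

(define (vector-add! result a b n)
  ;; result[i] <- a[i] + b[i] for i in [0, n)
  (do ((i 0 (+ i 1)))
      ((= i n))
    (f64vector-set! result i
                    (+ (f64vector-ref a i)
                       (f64vector-ref b i)))))
```

The benchmark driver would then call this n-iter times from an outer loop.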
On my machine (PowerBook G4 800 MHz, Mac OS 10.4.7), user timings with
n = 1000 and n-iter = 1000000 (conveniently 1 GFLOP) are as follows:
csr-dyn-69:/tmp farr$ time ./test_c 1000 1000000
real 0m7.889s
user 0m6.407s
sys 0m0.057s
csr-dyn-69:/tmp farr$ time ./test_fast_scm 1000 1000000
real 0m7.879s
user 0m6.296s
sys 0m0.094s
csr-dyn-69:/tmp farr$ time ./test_scm 1000 1000000
^C
real 30m56.616s
user 19m30.921s
sys 1m14.798s
As you can see, I terminated the test_scm run prematurely because it
was taking *forever*. I guess the moral of this story is that the
pure-Scheme implementation isn't going to come close to the speed of
the C version (it was over two orders of magnitude slower when I
killed it).
A few other observations about this test:
1. Putting inline C code in the inner loops of your Scheme program is
just as fast as pure C for 1000-element vectors. This performance
parity will degrade as the vectors get shorter (because less work is
done in C before looping back into Scheme). In general, the more you
do inside the foreign-lambda* C code, the closer you will come to
pure-C speed, because the looping Scheme calls become a smaller and
smaller fraction of the run time. On my machine, test_fast_scm is
about 15 times slower than test_c when n = 3.
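The foreign-lambda* approach in point 1 might look roughly like this (again a hedged sketch, not the actual test-fast.scm attachment; the name vector-add! and the exact type signature are my guesses):

```scheme
;; Hypothetical sketch of the inline-C version: the whole inner loop
;; runs in C, so Scheme overhead is paid once per call, not per element.
(use srfi-4)

(define vector-add!
  (foreign-lambda* void ((f64vector result)
                         (f64vector a)
                         (f64vector b)
                         (int n))
    "int i;
     for (i = 0; i < n; i++) result[i] = a[i] + b[i];"))
```

The f64vector foreign type passes the vector's storage to C as a double*, which is what lets the C loop touch the elements directly.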
2. Cache effects will become important at some point (which is above
n = 1000 on my machine). That is, if you run ./test_{...} 1000000 1000,
which is also 1 GFLOP of operations, the relative timings change a
lot. With such a long vector to add, and so little work being done on
each vector element, the test becomes a measure of how fast your
computer can move vector elements from main memory into the cache;
the speed of the add instruction becomes much less relevant, so you
can probably better afford the extra work done in the pure-Scheme
code. In the paper
http://repository.readscheme.org/ftp/papers/sw2000/lucier.pdf , the
authors claim that their Gambit-C code runs just as fast as C. They
are correct, but (based on some other tests I've done) this only
holds when you're bus-speed limited: Gambit is, in fact, a bit slower
(maybe a factor of 2 or 3) than C in its arithmetic. That factor
clearly doesn't matter when you're stalled on a cache miss (though I
bet even extreme cache missing wouldn't disguise the two orders of
magnitude in the native-Scheme Chicken code).
Feel free to run this code yourself with relevant numbers for <n> and
<n-iter> for your project. That should at least give you an idea of
the maximum performance.
Will
P.S.---My compile options were:
gcc: -O3
csc: -block -optimize-level 3 -debug-level 0 -lambda-lift -disable-interrupts
On my system, csc invokes gcc with (among other path-setting options)
-Os.
P.P.S.---I'm surprised at how bad the pure-Scheme code is for this.
I'd love to hear from some of the other Chicken experts out there if
I've made a mistake somewhere....
Attachments:
test-fast.scm
test.c
test.scm