[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slo
From: |
Rik |
Subject: |
[Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab! |
Date: |
Sun, 21 Nov 2010 22:12:40 +0000 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.12) Gecko/20101027 Ubuntu/9.10 (karmic) Firefox/3.6.12 |
Follow-up Comment #7, bug #31709 (project octave):
>>I am afraid you did not read my submission
Please do not tell me what I did or did not do. I know exactly what I did.
Your code has some typical issues which hinder speed which I identified. Only
after those are corrected, and the speed does not improve, can one can start
looking at whether the Octave core needs to be changed.
>>iii) pre-allocation will not help here either.
This is a scientific/engineering question rather than one of opinion. Don't
tell me that pre-allocation won't help, just demonstrate that it doesn't with
some code. I will happily accept that sort of evidence.
Profiling isn't all that hard, it just involves putting tic/toc combinations
around various blocks and narrowing things down. On my machine I find the
following times for the slow calculation:
Outer While Loop : 14.55 seconds
Inner While Loop : 1.093
So, somewhat surprisingly, it is not the tightest inner loop. Keep
profiling.
le, lens block : 0.9601
cy assignments : 12.311
Okay, now the hot spot is clear. 85% of the execution time is being spent in
just two cy assignments. For optimization purposes I will time both of them.
cy assignment #1 : 6.600
cy assignment #2 : 5.978
cy assignment #2 is
cy(i,N^3+1)=le;
why not keep cy square (N^3,N^3) and use a single row vector to keep track of
le? I added/changed
cy=sparse(N^3,N^3);
cyle = sparse (N^3,1);
and assignment #2 is now
cyle(i,1) = le;
cy assignment #2 w/cyle : 1.384
>From ~6 seconds to 1.4 seconds or a little over 400% speedup. But I notice
that cyle always uses 100% of it's entries. Why not pre-allocate the storage
for the sparse array since it will be used? Second change:
cyle = spalloc (N^3, 1, N^3);
cy assignment #2 w/cyle & spalloc : 0.3957
cy assignment #2 w/cyle & full matrix : 0.3461
This speeds things up another 3X. Since 100% of the cyle entries were filled
I tried using a regular full matrix rather than a sparse matrix. As you can
see, this change only nets about 10% improvement.
With the improvements above as a guide I tried allocating storage for cy as
well.
cy=spalloc(N^3,N^3,N^3);
cy assignment #1 w/spalloc : 3.1131
This halves the time, but that still looks a little long. Why not try
reversing the answer space and using column vectors instead of row vectors?
The change in calculation can easily be undone by transposing the matrix cy at
the end of the run.
cy(1:le,i)=v';
cy assignment #1 w/spalloc & reversed row/cols : 0.82021
This is an 8x speed-up overall. I agree that Octave really should treat
sparse column and row vectors equivalently and will file an optimization bug
about this on the bug tracker. Nevertheless, you can easily code around it.
Finally, I went back and took a look at the original inner while loop. With
all of the changes above incorporated the original benchmarking time has
changed. This is not unusual. Optimization involves repeatedly finding the
hotspot in the code, dealing with it, which in turn reveals another hotspot.
Inner While Loop : 0.7633
The while loop test condition checks the variable flag every single time
which is unnecessary. If you want a loop body to always execute once, use a
do/until loop. If you're programming in Matlab, which doesn't offer this
construct, then you can extract the body of the loop and execute it once
before starting the while loop. Changed code:
v=i;
first = pp(i);
pp(i) = 0;
while (first ~=i)
Inner While Loop w/o flag : 0.37611
Outer While Loop after optimization : 1.79 seconds
Overall improvement was 14.55/1.79 or 8.1X. Your 40 second code should take
about 5 seconds with these changes. It also yielded the same answer for cy
compared to the original code.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?31709>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, kody law, 2010/11/19
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, Rik, 2010/11/20
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, kody law, 2010/11/20
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, Søren Hauberg, 2010/11/20
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, kody law, 2010/11/20
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, Søren Hauberg, 2010/11/21
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, kody law, 2010/11/21
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!,
Rik <=
- [Octave-bug-tracker] [bug #31709] simple routine which runs 20 times slower on octave vs. matlab!, kody law, 2010/11/21