From: Rik
Subject: Re: performance improvements
Date: Wed, 4 Sep 2019 21:03:35 -0700

On 09/04/2019 03:19 PM, John W. Eaton wrote:
> On 9/3/19 8:14 PM, Rik wrote:
>> On 09/03/2019 03:39 PM, John W. Eaton wrote:
>
>> Finally, I suspect you're right about the unwind_protect blocks. When I
>> ran with perf there were indications in the stack that unwind_protect was
>> consuming some time, but it was hard to figure out exactly how it was
>> happening.
>>
>>> So in the vast majority of cases, we only have one action to perform, so
>>> creating a stack and using virtual function dispatch doesn't make much
>>> sense. We could speed this up a lot just by defining a few special case
>>> classes that save and restore single values or call a single function
>>> (and with lambda capture, we just need a single simple interface). I'll
>>> take a look at doing that soon.
>>>
>>
>> I'll re-run under perf to see what the new code hotspots are. By knocking
>> down the overall running time we will get a magnification effect for the
>> remaining issues.
>
> I pushed some more changes in an attempt to improve the performance of single unwind-protect actions:
>
> http://hg.savannah.gnu.org/hgweb/octave/rev/2f8428b61bd6
> http://hg.savannah.gnu.org/hgweb/octave/rev/25627c524ad8
> http://hg.savannah.gnu.org/hgweb/octave/rev/d171d356767b
>
> Now I see the following timings on my system:
>
>            bm_assign   bm_toeplitz
>
>   3.2.4:    0.19735      11.278
>   3.4.3:    0.17148       9.2246
>   3.6.4:    0.21437       9.5798
>   3.8.2:    0.25326      15.447
>   4.0.3:    0.51134      27.710
>   4.2.2:    0.53206      26.772
>   4.4.1:    1.5782       31.758
>   5.1.0:    1.7119       36.593
>   before:   1.9624       30.399
>   now:      0.62128      18.524
>
> The "before" line is changeset 64289bf338da:
>
>   user:    John W. Eaton <address@hidden>
>   date:    Wed Aug 14 00:19:34 2019 -0400
>   summary: use separate variable for interrupting command editor event loop (bug #56738)
>
> I think that is prior to any changes related to this issue.
>
> Could you try another profiling run with these changes?
> The convert_to_const_vector function uses unwind_protect, so the latest changes might have made a difference.
>

                      bm_assign   bm_toeplitz
  cset fcaecdbc8d8a     0.51         7.6
  now                   0.38         7.4
  3.4.3                 0.15         4.5

The cset mentioned is the one that removed the visitor pattern. Overall, for a general script like bm_toeplitz there was little change (-2.6%). For the narrower bm_assign the relative improvement was good (-25%), but the absolute magnitude (about 130 milliseconds) is pretty small. For comparison, eliminating the visitor pattern improved performance by 42%.

I will run code profiling on bm_assign and post the results to bug #56752.

--Rik
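
For readers following the thread, the trade-off described in the quoted text (a general stack of actions run through virtual dispatch versus special-case classes for a single action) can be sketched roughly as below. This is only an illustration under stated assumptions, not Octave's actual code: the names action_stack, single_var_guard, single_action_guard and the demo_* helpers are hypothetical and do not come from the changesets linked above.

```cpp
#include <cassert>
#include <memory>
#include <stack>
#include <utility>

// General form (hypothetical): every protected action is heap-allocated,
// pushed on a stack, and run through a virtual call at scope exit.
class action_stack
{
public:
  struct elem
  {
    virtual ~elem () = default;
    virtual void run () = 0;
  };

  template <typename T>
  struct restore_elem : elem
  {
    explicit restore_elem (T& ref) : m_ref (ref), m_saved (ref) { }
    void run () override { m_ref = m_saved; }
    T& m_ref;
    T m_saved;
  };

  ~action_stack ()
  {
    // Run the saved actions in reverse order of registration.
    while (! m_actions.empty ())
      {
        m_actions.top ()->run ();
        m_actions.pop ();
      }
  }

  template <typename T>
  void protect_var (T& ref)
  {
    m_actions.push (std::make_unique<restore_elem<T>> (ref));
  }

private:
  std::stack<std::unique_ptr<elem>> m_actions;
};

// Special case (hypothetical): exactly one value to save and restore.
// No heap allocation, no container, no virtual call.
template <typename T>
class single_var_guard
{
public:
  explicit single_var_guard (T& ref) : m_ref (ref), m_saved (ref) { }
  ~single_var_guard () { m_ref = m_saved; }
  single_var_guard (const single_var_guard&) = delete;
  single_var_guard& operator = (const single_var_guard&) = delete;

private:
  T& m_ref;
  T m_saved;
};

// Special case (hypothetical): exactly one function to call at scope exit.
// The callable (typically a lambda) is stored by its own type.
template <typename F>
class single_action_guard
{
public:
  explicit single_action_guard (F fcn) : m_fcn (std::move (fcn)) { }
  ~single_action_guard () { m_fcn (); }
  single_action_guard (const single_action_guard&) = delete;
  single_action_guard& operator = (const single_action_guard&) = delete;

private:
  F m_fcn;
};

// Each helper changes `value` inside a scope and returns what it holds
// after the scope exits and the guard has run.
int demo_stack ()
{
  int value = 1;
  {
    action_stack frame;
    frame.protect_var (value);
    value = 99;
  }
  return value;
}

int demo_single_var ()
{
  int value = 1;
  {
    single_var_guard<int> guard (value);
    value = 99;
  }
  return value;
}

int demo_single_action ()
{
  int value = 1;
  {
    auto reset = [&value] { value = 1; };
    single_action_guard<decltype (reset)> guard (reset);
    value = 99;
  }
  return value;
}

// All three approaches restore the original value.
void check_demos ()
{
  assert (demo_stack () == 1);
  assert (demo_single_var () == 1);
  assert (demo_single_action () == 1);
}
```

All three helpers restore the value on scope exit. The single-purpose guards do so without the allocation, container, and virtual call that the stack-based version pays for each action. That is the kind of overhead the quoted text suggests avoiding when there is only one action to perform. Whether that accounts for the timing differences reported above is not something this sketch can show.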