octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: move constructors likely a requirement


From: John W. Eaton
Subject: Re: move constructors likely a requirement
Date: Tue, 3 Sep 2019 18:39:12 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0

On 8/30/19 12:42 AM, Rik wrote:
On 08/29/2019 06:43 PM, John W. Eaton wrote:

I spent a lot of time over the last week looking at the interpreter. It's
a bunch of different things, but the biggest factor seems to be using the
tree walker pattern instead of just doing virtual dispatch to evaluation
methods in the parse tree objects themselves.  I think I can speed it up
considerably, possibly even get back much closer to the 3.4.3 performance
just by going back to something more like what we used to have for the
evaluator (but keeping evaluator state in an object instead of as global
data).  Another option is to implement a byte code interpreter instead of
directly evaluating the parse tree, but that is more work.  OTOH, it
might be instructive and helpful for JIT compiling.

This sounds exciting.  I would probably proceed conservatively, i.e.,
recover past performance by reverting to the previous code pattern, before
embarking on JIT which could be a very big project indeed.

I pushed the following changes:

  http://hg.savannah.gnu.org/hgweb/octave/rev/5bf76ab4cce3
  http://hg.savannah.gnu.org/hgweb/octave/rev/a2d3fa82b730
  http://hg.savannah.gnu.org/hgweb/octave/rev/fcaecdbc8d8a

The last changeset is the most significant and reverts the changes made for version 4.4 to use the visitor pattern for expression evaluation. As I note in the commit message, although it is desirable to have all parse tree evaluation functions grouped together in a single file, using the visitor pattern can be inefficient, especially when the visitor function is small and the extra levels of indirection and virtual function resolution can take more time than the evaluation function itself (evaluation of constants, for example). That's a big reason for the huge hit in performance when the loop contains only a simple assignment of a constant value.

I'm not proposing a JIT compiler as an immediate fix. My suggestion of a byte compiler was to walk the parse tree once and emit a list of simple instructions that could be evaluated without having to recursively walk the parse tree again. I suspect that would be more efficient, but it would also require a significant amount of work. And if we are going this route, maybe we should be looking for an existing virtual machine to use (JVM, even) instead of creating our own from scratch.

Another simpler thing that would provide immediate benefit would be to improve or replace the unwind_protect objects we use to save and restore the interpreter state. Currently, the unwind_protect class uses a stack to store actions to perform when the unwind_protect object goes out of scope. It also uses virtual functions to choose the specific type of action to perform. That's nice and generic, but in many cases, we just want to save and restore a single value and the overhead is huge. I added some counting to the unwind_protect class and found that when the actions are performed, we often have just a few objects to clean up or values to restore. Here are the stats I collected when running "make check":

  size   instances
    9           8
    7          15
    6         101
    5     1358693
    3      736350
    2       40139
    1    12775752
    0     8170259

So in the vast majority of cases, we only have one action to perform, so creating a stack and using virtual function dispatch doesn't make much sense. We could speed this up a lot just by defining a few special case classes that save and restore single values or call a single function (and with lambda capture, we just need a single simple interface). I'll take a look at doing that soon.

jwe



reply via email to

[Prev in Thread] Current Thread [Next in Thread]