I finally ran the full test case again, and I will concentrate on a piece of data:
The constructor IntCell::IntCell(long) alone is responsible for 19.65% of the CPU time. What's most revealing is that it's called 6210421276 times. For those who don't like counting digits, that's over 6 billion calls.
We also have the pair IntCell::get_cell_type() and IntCell::get_int_value() being called exactly the same number of times.
My takeaway is that we perform an excessive number of array duplications, creating arrays that are traversed a single time and then dropped.
Anyone interested in the details of my original experimentation with this can search the mailing-list archives, but here's what happens:
Let's take a simple expression:
foo ← 1+2×⍳N
This performs the following operations internally:
- Allocate an array of size N and fill it with the sequence 1, 2, 3, …
- Create a new array of size N and copy the values from the previous step
- Multiply all values by 2
- Create a new array of size N and copy the values from the previous step
- Add 1 to all values
- Create a new array of size N and copy the values from the previous step
- Store this array in the variable foo
My original work attempted to mark arrays as "mutable". For example, the value of ⍳N would be mutable, since it is safe to modify in place, while the value of foo would not be, since it must not be changed in place. Thus, 1+⍳N should be able to create one array and update it in place, while 1+foo should copy the array before the addition. The implementation was similar in concept to a vanilla copy-on-write algorithm.
My solution worked for simple cases, but it had some very serious bugs that caused arrays to be modified that should not have been, which is why it was never integrated. However, in the cases where it did work, the performance improvement was massive.
I have no current plans to resume this, but it would be interesting to discuss it if someone else wants to play around with it.
Regards,
Elias