Re: Slowing scatter down for the sake of Matlab compatibility

From: Robert T. Short
Subject: Re: Slowing scatter down for the sake of Matlab compatibility
Date: Sun, 04 Dec 2011 17:48:13 -0800
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.4) Gecko/20091017 SeaMonkey/2.0

Ben Abbott wrote:

On Dec 4, 2011, at 7:55 PM, Jordi Gutiérrez Hermoso wrote:

2011/12/4 Ben Abbott <address@hidden>:

On Dec 4, 2011, at 6:25 PM, Ben Abbott wrote:

I tried the script below on ML R2011b.

for p = 0:5
  tic
  n = 10^p;
  x = rand (n,1);
  y = rand (n,1);
  colours = [ ones(n,1) zeros(n,1) zeros(n,1) ];
  colours(1,:) = [0 0 1];  % different color for first element
  hg = scatter (x, y, 15, colours);
  hp = findall (hg, 'type', 'patch');
  fprintf ('numel(x) = %d, numel(hg) = %d, numel(hp) = %d\n', numel (x), numel (hg), numel (hp))
  fprintf ('Time to render = %f seconds\n', toc)
end

[snip]

It appears that ML consistently creates one patch for each point.

Perhaps an option can be added to scatter() to skip ML
compatibility? There is already a "filled" option. Can a "collect"
option be added? Then patches with the same size and color
could be collected into a single patch.

Sigh... It's not much use if it's not the default behaviour...

Using a modified test script

for p = 0:5
  tic
  n = 10^p;
  x = rand (n,1);
  y = rand (n,1);
  hg = scatter (x, y, 15);
  hp = findall (hg, 'type', 'patch');
  fprintf ('numel(x) = %d, numel(hg) = %d, numel(hp) = %d\n', numel (x), numel (hg), numel (hp))
  fprintf ('Time to render = %f seconds\n', toc)
end

numel(x) = 1, numel(hg) = 1, numel(hp) = 1
Time to render = 0.007314 seconds
numel(x) = 10, numel(hg) = 1, numel(hp) = 10
Time to render = 0.009084 seconds
numel(x) = 100, numel(hg) = 1, numel(hp) = 100
Time to render = 0.034167 seconds
numel(x) = 1000, numel(hg) = 1, numel(hp) = 1
Time to render = 0.017748 seconds
numel(x) = 10000, numel(hg) = 1, numel(hp) = 1
Time to render = 0.099743 seconds
numel(x) = 100000, numel(hg) = 1, numel(hp) = 1
Time to render = 0.964281 seconds

Okay, so is this the behaviour we must mimic? Create one patch object
per dot unless there is no colour parameter passed and there are more
than 100 dots?
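
[Editor's sketch] The rule being asked about can be written as a small predicate. This is only a summary of the six R2011b runs shown above, not documented Matlab behaviour: the name one_patch_per_point is hypothetical, and the threshold of 100 is inferred from those runs alone (the real switch point lies somewhere between 100 and 1000 points).

```octave
% Hypothetical predicate summarising the observed R2011b output above.
% Assumption: the switch point is exactly 100; the runs only show that
% it lies somewhere between 100 and 1000 points.
one_patch_per_point = @(n, colour_given) colour_given || n <= 100;

for n = 10 .^ (0:5)
  if one_patch_per_point (n, false)
    expected = n;   % one patch object per point
  else
    expected = 1;   % all points collected into a single patch
  end
  fprintf ('numel(x) = %d, expected numel(hp) = %d\n', n, expected);
end
```

With no colour argument this reproduces the numel(hp) column of the output above (1, 10, 100, 1, 1, 1).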

Is there any way to speed this up? Why is calling __go_patch__
thousands of times so slow? Does each call trigger a
redraw of the graphics? Is there a way to first record all
the objects and only draw at the end?

- Jordi G. H.

No. As far as I know, calling __go_patch__ does not force a redraw.

Unless there is an explicit call to drawnow() nothing will render until the end.

However, since the above ML behavior isn't documented, I no longer have a problem with modifying Octave's behavior. The only reason I see to mimic it is to avoid complaints from those who are used to ML's undocumented features.

Ben

Here is the post I was referring to. This was Jan 11, 2010.

On Mon, Jan 11, 2010 at 8:17 AM, David Bateman <[hidden email]> wrote:

> Jaroslav Hajek wrote:
>>
>> hi,
>>
>> I recently started some work where I'm going to use scatter plots
>> heavily with a few thousand points.
>> I found out, however, that scatter plots were unusably slow with so
>> many points. This has also been reported before:
>>
>> http://old.nabble.com/scatter3-is-really-slow-to24312164.htm
>>
>> by this changeset
>> http://hg.savannah.gnu.org/hgweb/octave/rev/2f435ed48143
>>
>> I tried to optimize the plotting strategy to get more reasonable
>> times, especially in the simplest cases. The old strategy of creating
>> one object per point is only used for small numbers of points (<= 20).
>> Otherwise, the points are split into subsets with common color and
>> size, and these subsets are plotted using a single primitive. This
>> seems similar to what Matlab does.
>>
>
> If this is done in the backend, OK, but I deliberately didn't make this
> change in the frontend as it breaks Matlab compatibility.
>


What breaks Matlab compatibility? Please explain. Note that it is
*not* true that Matlab always creates one object per point (as you
seemed to imply in the above linked mail), at least in 2007a. It does
so only when there is a small number of points:

>> n = 10;
>> h = scatter (rand (n, 1), rand (n, 1))

h =

159.0022

>> get(h,'children')

ans =

169.0011
168.0011
167.0011
166.0011
165.0011
164.0011
163.0011
162.0011
161.0015
160.0012

>> n = 1000;
>> h = scatter (rand (n, 1), rand (n, 1))

h =

159.0028

>> get(h,'children')

ans =

160.0013

I bet the strategy Matlab uses is similar to ours. I didn't
investigate the switch point, but I don't really think compatibility
is needed at that level.
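
[Editor's sketch] The grouping strategy described in the quoted message (split the points into subsets with common colour and size, then plot each subset as one primitive) might look roughly like the following. This is a minimal illustration under stated assumptions, not the code from the changeset: the variable names (combos, id, groups) are hypothetical, and only the grouping step is shown, with the drawing call left as a comment.

```octave
% Sketch only: group scatter points by (size, colour) so that each
% group could be drawn as one graphics object instead of one per point.
n = 1000;
x = rand (n, 1);
y = rand (n, 1);
s = 15 * ones (n, 1);                        % marker sizes, all equal here
c = [ ones(n,1) zeros(n,1) zeros(n,1) ];     % all points red ...
c(1,:) = [0 0 1];                            % ... except the first one

% Each distinct row of [s c] is one (size, colour) combination;
% id(i) is the group number of point i.
[combos, ~, id] = unique ([s c], 'rows');
ngroups = size (combos, 1);

groups = cell (ngroups, 1);
for k = 1:ngroups
  groups{k} = find (id == k);                % indices of the points in group k
  % Each group could then be drawn with a single call, for example:
  %   idx = groups{k};
  %   line (x(idx), y(idx), 'linestyle', 'none', 'marker', 'o', ...
  %         'color', combos(k,2:4));
end

fprintf ('%d points -> %d groups\n', n, ngroups);   % prints "1000 points -> 2 groups"
```

In the common case of a single size and a single colour, everything collapses into one group, which is consistent with the single child handle Matlab reports above for n = 1000.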

regards