octave-maintainers

Re: Changesets: Re: Parallel access to data created with Octave


From: Jaroslav Hajek
Subject: Re: Changesets: Re: Parallel access to data created with Octave
Date: Tue, 25 May 2010 09:43:19 +0200

On Tue, May 25, 2010 at 6:16 AM, Jarno Rajahalme
<address@hidden> wrote:
>
> On May 3, 2010, at 10:33 PM, ext Jaroslav Hajek wrote:
>
>> On Mon, May 3, 2010 at 6:53 PM, Jarno Rajahalme <address@hidden> wrote:
>>
>>> For the case where I test for scalar type, it would suffice if numel() on a 
>>> scalar would not depend on a static initialization of a "static dim_vector 
>>> dv(1,1); return dv;" within a function dims(). If octave_base_scalar 
>>> implemented a numel() always returning 1, I would not have to care whether 
>>> a specific cell is a scalar or not. And it would be faster, too :-).
>>>
>>
>> Feel free to do this, but it's just the tip of an iceberg. You simply
>> shouldn't depend on internals like this.
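The override being discussed can be sketched in isolation. This is a standalone mock, not the real `octave_base_value` hierarchy; the class names only mirror Octave's internals for illustration:

```cpp
// Illustrative mocks of Octave's value hierarchy; the real classes are larger.
struct dim_vector_mock
{
  int rows, cols;
  long numel (void) const { return (long) rows * cols; }
};

struct base_value_mock
{
  virtual ~base_value_mock (void) { }
  virtual dim_vector_mock dims (void) const { return dim_vector_mock (); }
  // Generic fallback: derive the element count from the dimensions.
  virtual long numel (void) const { return dims ().numel (); }
};

struct base_scalar_mock : public base_value_mock
{
  dim_vector_mock dims (void) const override { return dim_vector_mock { 1, 1 }; }
  // A scalar always has exactly one element; overriding avoids constructing
  // a dim_vector (and touching its reference counter) at all.
  long numel (void) const override { return 1; }
};
```

With this pattern, callers never need to know whether a given value is a scalar; the virtual dispatch handles it.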
>>
>
>
> I did this (see the attached two changesets):
>
> (Please apply both sets; there is some minor back and forth between them.)
>
> All tests pass (except for svds, which has one unrelated failure either way).
>
> liboctave:
>
> - Followed the Array.h practice of returning const ref from dims() (Sparse.h)

I'm not sure this is a good idea. It is a bit redundant to store a
dim_vector in Sparse, because it can only have 2 dimensions, and we
may want to optimize the storage in the future.


> - Added ov_ndims(), const methods declared as such (dim_vector.h)

What for? I don't understand what it does. The number of dimensions
is queried by dim_vector::length. Trailing singletons should be
removed on construction where suitable (for instance, Array always
does it), so dim_vector::length is always >= 2.
Maybe you were confused by the existing octave_base_value::ndims
fallback; it makes no sense in the current sources, so I cleaned it up.
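The trailing-singleton rule can be shown in isolation. A sketch, using a plain `std::vector<int>` to stand in for `dim_vector` (the real logic lives inside Octave's dim_vector class):

```cpp
#include <vector>

// Sketch of the trailing-singleton convention: drop trailing dimensions
// equal to 1, but never go below 2 dimensions, so that the dimension
// count (dim_vector::length in the real sources) is always >= 2.
std::vector<int> chop_trailing_singletons (std::vector<int> dims)
{
  while (dims.size () > 2 && dims.back () == 1)
    dims.pop_back ();
  return dims;
}
```

For example, a 3x4x1x1 array normalizes to 3x4, while a 5x1 column vector keeps both dimensions.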

> - Changed idx_vector::orig_rows() and idx_vector::orig_columns() to call new 
> internal rep virtual functions (idx-vector.h)

Hmm, no problem, but these methods aren't used anywhere anyway. Maybe
it's better to just delete them.

> - Changed dim_vector local variable to const ref when applicable (Array.cc, 
> CNDArray.cc, CSparse.cc, Sparse.cc, dNDArray.cc, dSparse.cc, fCNDArray.cc, 
> fNDArray.cc, idx-vector.cc, lo-specfun.cc, mx-inlines.cc)

This is something I don't really like. The practice of
return-by-value using reference-counted classes is ubiquitous
throughout the Octave sources, and I rely on it heavily. I don't
want to relinquish this convenient coding style for the sake of
partial thread safety. It will also make things harder to modify if
the dim_vector is constructed on the fly rather than stored.

>
> src:
>
> - Made rows() and columns() virtual, changed default implementations of 
> ndims, rows, columns, numel, and capacity to be thread-safe, subclasses need 
> to override. (ov-base.h)

I have no problem with overriding numel () and capacity () (though
that is very seldom used) in subclasses, likewise I'm OK with making
rows() and columns() virtual (although I think there's little need,
especially if we simplify the existing implementations), but I insist
that the base methods should implement the fallback computations.
Requiring all derived classes to override five methods instead of just
one is a bad idea.

> - Added thread-safe (virtual) ndims(), numel(), capacity(), rows() and 
> columns() (ov-base-diag.h, ov-base-mat.h, ov-base-scalar.h, ov-base-sparse.h, 
> ov-class.h, ov-cs-list.h, ov-fcn-handle.h, ov-lazy-idx.h, ov-perm.h, 
> ov-range.h, ov-struct.h)

See the comment above. It makes sense to override, say, ndims() for
scalars for performance reasons, but explicitly implementing
octave_fcn_handle::ndims is plain nonsense.

> - Followed the Array.h practice of returning const ref from dims() (oct-map.h)

OK

> - Changed dim_vector local variable to const ref when available (bitfcns.cc, 
> data.cc, oct-map.cc, ov-base-diag.cc, ov-base-int.cc, ov-base-mat.cc, 
> ov-base-sparse.cc, ov-bool-mat.cc, ov-bool-sparse.cc, ov-cell.cc, 
> ov-cx-mat.cc, ov-cx-sparse.cc, ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h, 
> ov-perm.cc, ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, ov-struct.cc, 
> pr-output.cc, pt-mat.cc, strfns.cc, xpow.cc)

See above. The return-by-value style is idiomatic and convenient
(because the caller doesn't care whether the function returns a value
or a reference). Even if we make this change now, there's no guarantee
people will keep using the reference style; I probably won't.

> - Changed dim_vector local variable to const when possible (data.cc, 
> gl-render.cc, graphics.cc, ls-mat5.cc, mex.cc, ov-base.cc, ov.cc, pt-mat.cc)

Yes, this is always a good idea.

> - Changed to avoid calling dims() or numel() via the virtual members 
> (ov-base-diag.cc, ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc, 
> ov-bool-mat.h, ov-bool-sparse.cc, ov-cell.cc, ov-cx-mat.cc, ov-cx-sparse.cc, 
> ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h, ov-perm.cc, ov-range.cc, 
> ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, ov-struct.cc)
>

Except in supposedly performance-critical places, I'd rather avoid
doing this. Pieces of code are often copied between subclass
implementations, and this change makes that harder.

>
> With these changes the octave_(base)_value functions ndims, rows, columns, 
> numel and capacity are now thread-safe. My own OpenMP code used to crash 
> when calling numel, due to the unprotected reference counting, but does not 
> crash any more, as these functions no longer need to touch any reference 
> counters.

Although parts of this work are useful per se, it feels more like a
workaround than a solution. What you really need is a thread-safe
reference-counting.
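A minimal sketch of what thread-safe reference counting could look like. The names below are illustrative, not from the Octave sources, and C++11 std::atomic is used for brevity; contemporary code would reach for compiler intrinsics such as GCC's __sync builtins instead:

```cpp
#include <atomic>

// Illustrative intrusive refcount; not the actual Octave implementation.
struct shared_rep
{
  std::atomic<int> count;

  shared_rep (void) : count (1) { }

  // Acquire a new reference; relaxed ordering suffices for an increment.
  void grab (void) { count.fetch_add (1, std::memory_order_relaxed); }

  // Drop a reference. Returns true when the last reference is gone,
  // i.e. the caller is now responsible for deleting the rep.
  bool release (void)
  { return count.fetch_sub (1, std::memory_order_acq_rel) == 1; }
};
```

With the counter updated atomically, concurrent copies and destructions of handles sharing one rep no longer race, which is the crash mode described above.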

>
> These, and dependent functions, are now also marginally faster. However, when 
> timing these I noticed some variation due to changed branch prediction 
> behavior, as the patterns of calling virtual functions have changed. From 
> this viewpoint it would be nice if all numeric data types had a real 
> dim_vector, so that they could share the same implementation for all these 
> functions. This would require a common base class for numeric types. Also, 
> I believe I had some earlier crashes related to the initialization of static 
> members in threaded code, so this would take some more testing.
>

I think most of the speed-ups can be achieved by much simpler changes,
mostly cleaning up old code. See below.

> dims() itself cannot be made thread safe without protecting the reference 
> counting, as for some types the dim_vector is created dynamically.

Yes. The point is that dim_vector is just one class that is reference
counted, there are many more - idx_vector, Array, Sparse, SparseQR,
octave_shlib, octave_mutex, octave_value, graphics properties ...

> There are minor efficiency increases all over the place, due to not creating 
> a local instance of dim_vector where one is not needed.

The difference between const dim_vector and const dim_vector& is just
the counter increment/decrement, and I doubt that is visible in most
places.
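The point can be made with a toy handle class that counts copies, each copy standing in for one refcount increment/decrement pair (the class is illustrative, not from the sources):

```cpp
// Toy reference-counted handle: each copy construction stands in for one
// counter increment (with a matching decrement on destruction).
struct handle
{
  static int copies;
  handle (void) { }
  handle (const handle&) { ++copies; }
};

int handle::copies = 0;
```

Binding `const handle&` to an existing object performs no copy at all, while initializing a `const handle` by value costs exactly one counter bump, which is rarely measurable inside larger operations.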

> Anyway, in the following you will see that the time it takes for cellfun () 
> to complete depends on the order of the data.


I checked in some simple changes (10651:10654) and tested. What I
don't understand about your benchmark code is why you use mean(),
which has nontrivial overhead, so I hacked together my own benchmark:

N = 1e6;

function ticcellfun (rep, c, fun, type)
  tic;
  for i = 1:rep
    cellfun (fun, c);
  endfor
  t = toc;
  printf ("%s %s: %e\n", type, fun, t / rep);
endfunction

function ticall (c, type)
  ticcellfun (20, c, "numel", type);
  ticcellfun (20, c, "ndims", type);
  ticcellfun (20, c, "length", type);
  ticcellfun (20, c, "isempty", type);
endfunction


c = num2cell (rand (N, 1));
ticall (c, "all scalars");

c = num2cell (rand (N, 5), 2);
ticall (c, "all arrays");

c(1:N/2) = num2cell (rand (N/2, 1));
c(N/2+1:N) = num2cell (rand (N/2, 5), 2);
ticall (c, "scalars/arrays");

c(1:2:N) = num2cell (rand (N/2, 1));
c(2:2:N) = num2cell (rand (N/2, 5), 2);
ticall (c, "mixed scalars/arrays");


with previous tip, I get:

all scalars numel: 1.651729e-02
all scalars ndims: 1.707314e-02
all scalars length: 7.021310e-02
all scalars isempty: 1.209819e-02
all arrays numel: 1.612484e-02
all arrays ndims: 1.942660e-02
all arrays length: 6.965815e-02
all arrays isempty: 1.190956e-02
scalars/arrays numel: 1.640530e-02
scalars/arrays ndims: 1.824336e-02
scalars/arrays length: 6.993690e-02
scalars/arrays isempty: 1.205795e-02
mixed scalars/arrays numel: 2.249750e-02
mixed scalars/arrays ndims: 2.329570e-02
mixed scalars/arrays length: 9.565995e-02
mixed scalars/arrays isempty: 1.770819e-02

with the new changes, I get:

all scalars numel: 1.062530e-02
all scalars ndims: 1.069200e-02
all scalars length: 1.968530e-02
all scalars isempty: 6.249142e-03
all arrays numel: 1.658055e-02
all arrays ndims: 1.633325e-02
all arrays length: 2.177579e-02
all arrays isempty: 1.202180e-02
scalars/arrays numel: 1.374600e-02
scalars/arrays ndims: 1.344680e-02
scalars/arrays length: 2.040290e-02
scalars/arrays isempty: 9.098101e-03
mixed scalars/arrays numel: 2.409670e-02
mixed scalars/arrays ndims: 2.170360e-02
mixed scalars/arrays length: 4.485935e-02
mixed scalars/arrays isempty: 1.441931e-02

so length() in particular has improved. But of course, compared to
the "normal" speed of cellfun with a function handle, all cases were
blazingly fast even before. cellfun is a special example; in most
other cases calls to dims() et al. are part of larger code bodies, so
their time is drowned in the total time.

regards

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


