Re: Thread-safe reference counting (modified dim-vector.h)


From: Jaroslav Hajek
Subject: Re: Thread-safe reference counting (modified dim-vector.h)
Date: Wed, 26 May 2010 11:37:03 +0200

On Wed, May 26, 2010 at 10:39 AM, Jarno Rajahalme
<address@hidden> wrote:
> I have included a modified dim-vector.h that does what was proposed below. 
> However, it seems other changes are needed to avoid subtle race conditions. I 
> have tried to address these issues, but more testing may be needed. See the 
> attached .diff file.
>
> The attached dim-vector-atomic-gnu.h uses GNU builtins if ATOMIC_GNU is 
> defined, OpenMP atomic if ATOMIC_OPENMP is defined, and "no atomic" if 
> neither is defined.
>
> Without atomic constructs the test code below fails either with a malloc 
> error or on the assert() just before the delete in dim-vector-atomic-gnu.h.  
> With either ATOMIC_GNU or ATOMIC_OPENMP the test completes without problems. 
> GNU builtins seem a bit faster, but either one increases the execution 
> (CPU) time about 15x.
>
> This can be tested with this simple c++ code:
>
> main.cc:
>
> #include <octave/config.h>
> #include <omp.h>
> #include "dim-vector-atomic-gnu.h"
>
> const dim_vector& dims(void); // in a different compilation unit
>
> int main (void)
> {
>  int result = 0;
>
> #pragma omp parallel for reduction (+: result)
>  for (int i=0; i < 10000000; i++) {
>    const dim_vector dv = dims(); // clone
> //    const dim_vector& dv = dims(); // reference
>    result += dv(1);
>  }
>  return result;
> }
>
>
> sub.cc:
>
> #include <octave/config.h>
> #include "dim-vector-atomic-gnu.h"
>
> const dim_vector& dims(void)
> {
>  static const dim_vector dv(1,1);
>  return dv;
> }
>
> Compile with:
>
> g++ -fopenmp -DATOMIC_GNU -I../include/octave-3.3.51+ -O3 main.cc sub.cc -lgomp -o dimtest
>
> - you may need to modify the octave include directory
> - I tested with GCC 4.5
> - try with and without -DATOMIC_GNU
> - try with either // clone or // reference line
>  - in my testing, without atomic, and when cloning, the code will reliably 
> fail
>  - taking a reference, it always works, and the atomic code (even if used) is 
> not executed within the loop.
> - use "time ./dimtest"
>  - There is about 24x overhead for creating the local object and atomically 
> fiddling with the reference counter.
>
> dims (), when cloned, is the same as octave_base_scalar::dims (), so the 
> above can happen in real Octave code as well. In Octave the corresponding 
> function is virtual and cannot be inlined, so there is no chance for the 
> compiler to optimize the reference counting away.
>
> However, by taking a reference, the reference counting does not kick in even 
> if dims() is virtual and the code above becomes a lot faster.

But by returning a reference, you require the referenced object to be
permanently stored somewhere. I think we absolutely do not want to
impose such a requirement on octave_value::dims; objects should be
allowed to compute their dimensions on demand. Hence, making
octave_base_value::dims return a const reference is out of the question.
If the need arises, we'll probably create an extra method for that.
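
To illustrate the point about computing dimensions on demand, here is a
minimal sketch. It is not Octave code: dims_t, value_base, lazy_range,
stored_matrix and dims_ref are all hypothetical names, and std::vector<int>
merely stands in for dim_vector. The by-value dims () works for an object
that stores no dimension object at all, while a reference-returning method
is only possible where the dimensions are actually kept.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for dim_vector (assumption, not the Octave type).
typedef std::vector<int> dims_t;

struct value_base
{
  virtual ~value_base () { }
  // Returned by value, so an implementation may build it on demand.
  virtual dims_t dims () const = 0;
};

// A range-like value that stores only its length, no dimension object.
struct lazy_range : value_base
{
  int n;
  explicit lazy_range (int count) : n (count) { }

  dims_t dims () const
  {
    dims_t d (2);
    d[0] = 1;   // one row
    d[1] = n;   // n columns, computed when asked
    return d;
  }
};

// A value that does keep its dimensions, so it can also hand out a reference.
struct stored_matrix : value_base
{
  dims_t dv;
  stored_matrix (int r, int c) : dv (2) { dv[0] = r; dv[1] = c; }

  dims_t dims () const { return dv; }

  // Hypothetical extra method; only meaningful because dv is stored.
  const dims_t& dims_ref () const { return dv; }
};
```

lazy_range could not offer dims_ref () without adding a member just to have
something to refer to, which is the requirement being objected to here.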

Interestingly enough, using the original dim-vector.h and stripping
away the OpenMP stuff, the reference counting is so fast that there is
no visible difference between the clone and reference versions:

address@hidden:~/devel/octave> g++ -I main/liboctave/ -I main/ -I main/libcruft/misc/ -O3 main.cc sub.cc -o dimtest
address@hidden:~/devel/octave> time ./dimtest

real    0m0.145s
user    0m0.141s
sys     0m0.001s
address@hidden:~/devel/octave> time ./dimtest

real    0m0.149s
user    0m0.140s
sys     0m0.001s
address@hidden:~/devel/octave> g++ -D USE_CLONE -I main/liboctave/ -I main/ -I main/libcruft/misc/ -O3 main.cc sub.cc -o dimtest
address@hidden:~/devel/octave> time ./dimtest

real    0m0.143s
user    0m0.142s
sys     0m0.000s
address@hidden:~/devel/octave> time ./dimtest

real    0m0.143s
user    0m0.141s
sys     0m0.001s

int main (void)
{
 int result = 0; // initialize the accumulator before the loop adds to it

 for (int i=0; i < 50000000; i++) {
#ifdef USE_CLONE
   const dim_vector dv = dims(); // clone
#else
   const dim_vector& dv = dims(); // reference
#endif
   result += dv(1);
 }
 return result;
}
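
For readers who want to see what the clone line costs per iteration, here
is a minimal sketch of a reference-counted handle. It is not the code from
dim-vector.h: counted_dims, rep and use_count are hypothetical names, and
the ATOMIC_GNU macro is only borrowed from the quoted message as a switch.
Without the macro, a copy is a plain increment and the destructor a plain
decrement; with it, both go through GCC's __sync builtins.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted handle such as dim_vector.
class counted_dims
{
  struct rep
  {
    int count;
    int rows, cols;
    rep (int nr, int nc) : count (1), rows (nr), cols (nc) { }
  };

  rep *r;

  static void acquire (rep *p)
  {
#if defined (ATOMIC_GNU)
    __sync_add_and_fetch (&p->count, 1);   // GCC atomic builtin
#else
    ++p->count;                            // plain increment, not thread-safe
#endif
  }

  static void release (rep *p)
  {
#if defined (ATOMIC_GNU)
    const int left = __sync_sub_and_fetch (&p->count, 1);
#else
    const int left = --p->count;
#endif
    assert (left >= 0);   // a negative count means a lost update
    if (left == 0)
      delete p;
  }

public:
  counted_dims (int nr, int nc) : r (new rep (nr, nc)) { }
  counted_dims (const counted_dims& o) : r (o.r) { acquire (r); }

  counted_dims& operator = (const counted_dims& o)
  {
    acquire (o.r);   // acquire first so self-assignment is safe
    release (r);
    r = o.r;
    return *this;
  }

  ~counted_dims () { release (r); }

  int operator () (int i) const { return i == 0 ? r->rows : r->cols; }
  int use_count () const { return r->count; }
};
```

In the single-threaded build the copy and the destructor each touch one
integer that is already in cache, which would be consistent with the timings
above showing no measurable difference.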

I checked the assembler output and the reference counting is not
eliminated; it's just fast. :-O Another interesting point is that if
you make dims() return a dim_vector by value, both versions generate
exactly the same assembler output, suggesting that binding the result
to a const reference brings no advantage when a value is returned.

I'd say that the results of the test are not so bad. I expected worse.


-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz


