[Top][All Lists]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Help-gsl] Usage of GSL random in parallel code [OpenMP]

From: Matthias Sitte
Subject: Re: [Help-gsl] Usage of GSL random in parallel code [OpenMP]
Date: Mon, 05 May 2014 11:17:49 -0500
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:24.0) Gecko/20100101 Thunderbird/24.4.0

Hi everybody,

a couple of notes from my experience with random numbers and parallel codes:

1) You create a lot of GSL RNGs. Why? You only need to create and initialize as many RNGS as you have client threads, and then you re-use them (without re-initialization) for all ensembles.

With N CPUs per node and M threads per CPU (don't use hyperthreading) this totals N*M threads. Each thread itself runs a loop over a subset of DIM_ENSEMBLE/(N*M) ensembles, and that's it.

2) In your implementation, the master thread is allocating all of the GSL RNGs, but that's a 'NUMA sin', because they will be allocated in the memory close to the master thread. The other threads, however, do not have to be as close to that memory as the master thread.

Example: The master thread (#0) is on CPU #1 with RAM banks 1-4, but thread #1 is on CPU #2 with RAM banks 5-8. Then accessing the GSL RNG always entails using some sort of "memory hyperlink" or whatever they call it. That's slowing down the whole RNG. If you'd use MPI, then thread #1 could sit on a entirely different machine, maybe connected by an Infiniband link which is even slower.

So, bottom line: Each thread should allocate and initialize the GSL RNG on its own. In your code, that would mean the allocation must take place within the #omp parallel loop, not outside.

3) You have to properly initialize the GSL RNGs. There are different ways to do that. I like using microseconds obtained from time functions. Beware that if you initialize using only the thread number, then subsequent calls of the same program will always generate the same result. If you'd use some time function, there would be some "random" element to it, so you're (much) less likely to generate the same result running the program twice.

Furthermore, if you use MPI you could also let the master thread determine a "master seed", e.g. through environment variable), and then pass it on to the clients through MPI_Bcast, and each client could use a seed based on the master seed.

Hope that helps.

  Matthias Sitte

On 5/4/14, 8:14 AM, Klaus Huthmacher wrote:
Dear Altro,

I would simply use a wrapper around a C++ std::vector who holds as many
C++11 random number generators (engines) as you use threads in your code.
By calling a random number, the engine in the ith index of the vector is
used, where i is the thread id.

Please see the appended minimal example mb.cpp with the wrapper

Best wishes and keep us informed,
-- Klaus.

Dear All

I wish to use effectively use the gsl random number generator jointly
with OpenMP on a HPC cluster.

The simulation I have to do are quite simple: just let evolve an
"ensemble" of system with the same dynamics. Parallelize such a code
with OpenMP "seems" straightforward.
As the GSL does not support parallel processing, I suppose that we must
use a different random number generator for each element of the ensemble;
Indeed I don't know if this is a good procedure.

The ensemble could be very big and it would be simply impossible to
allocate memory for all generators associated with the Systems of the
Then my question is:
Is this still a correct way to use the GSL random generator on OpenMP?

An idea of how the allocation could be done is given in the code below.
Does it make any sense? If not, which is the correct way?

Best wishes,


#include <stdlib.h>
#include <stdio.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <omp.h>

int main (){
      int i;

      int DIM_ENSEMBLE = 10000000000;

      double *d;
      gsl_rng **r ;

      d = malloc(DIM_ENSEMBLE*sizeof(double));
      r = malloc(DIM_ENSEMBLE*sizeof(gsl_rng*))
;                                            ;
      for (i=0;i<DIM_ENSEMBLE;i++){
          r[i] = gsl_rng_alloc (gsl_rng_mt19937);
//        r[i] = gsl_rng_alloc (gsl_rng_taus);

      size_t n = gsl_rng_size (r[0]);  //size in byte (the size of
tausworthe is 24 ; the size of mersenne twister is 5000)

#pragma omp  parallel for default(none) shared(r,d) shared(DIM_ENSEMBLE)
      for  (i =0; i< DIM_ENSEMBLE; i++) {
          d[i] = gsl_rng_uniform(r[i]);

#pragma omp  parallel for default(none) shared(d) shared(DIM_ENSEMBLE)
      for (i=0;i<DIM_ENSEMBLE;i++){
          printf("%d %f\n",i,d[i]);

      printf("The size in Mb of the vector r is %lu


     for (i=0;i<DIM_ENSEMBLE;i++){
          gsl_rng_free (r[i]);

return 0;

reply via email to

[Prev in Thread] Current Thread [Next in Thread]