getfem-commits

[Getfem-commits] (no subject)


From: Andriy Andreykiv
Subject: [Getfem-commits] (no subject)
Date: Fri, 21 Dec 2018 12:33:22 -0500 (EST)

branch: consistent_partitioning_for_open_mp
commit df435160faea754a2391af843466aa8839c0d319
Author: Andriy.Andreykiv <address@hidden>
Date:   Fri Dec 21 18:20:54 2018 +0100

    Besides some basic clean-up in the OpenMP-related partitioning and global
    storage, the main purpose is to make region partitioning, and subsequently
    assembly partitioning, independent of the number of parallel threads being
    used. For instance, if we choose to partition the whole calculation into 8
    partitions but use only 2 parallel threads, then each thread will loop
    through 4 independent partitions. This way, no matter how many threads we
    choose, the regions and the assembly are partitioned into the same number
    of partitions.
    
    Some other specific changes:
     - I had to introduce getfem::standard_locale besides gmm::standard_locale;
       getfem::standard_locale is triggered only in serial regions.
     - list_distro is now accumulated_distro, in a separate file. It can handle
       containers of assemblies as well as separate matrices and vectors.
     - geometric_transformation now also carries its name, similar to
       getfem::fem.
     - some clean-up in singletons and global storage. In particular, I hid the
       variable dal_static_stored_tab_valid__ behind a macro, as it is not
       thread-safe. I hope you also don't mind me introducing more C++11/14
       syntax.
     - re-factoring of getfem_omp.h/cc: more re-use in omp_distribute, which
       is now also able to accept a thread policy as an argument. The whole
       partitioning logic is encapsulated in partition_master. To parallelize
       some code and take advantage of thread-independent partitioning, use
       the macro GETFEM_OMP_PARALLEL.
     - some re-factoring of mesh_region: C++11/14 syntax and caching of
       indices in serial and parallel.
---
 configure.ac                                       |   2 +-
 src/Makefile.am                                    |  29 +-
 src/bgeot_ftool.cc                                 |  13 +-
 src/bgeot_geometric_trans.cc                       |   9 +-
 src/bgeot_poly.cc                                  |  23 +-
 src/bgeot_torus.cc                                 |   9 +-
 src/dal_singleton.cc                               |  61 +-
 src/dal_static_stored_objects.cc                   | 508 +++++++----------
 src/getfem/bgeot_geometric_trans.h                 |   3 +
 src/getfem/bgeot_small_vector.h                    |   4 +-
 src/getfem/dal_naming_system.h                     |   9 +-
 src/getfem/dal_singleton.h                         | 159 +++---
 src/getfem/dal_static_stored_objects.h             |  41 +-
 src/getfem/getfem_accumulated_distro.h             | 220 ++++++++
 src/getfem/getfem_context.h                        |   2 +-
 src/getfem/getfem_interpolation.h                  |  18 +-
 src/getfem/getfem_locale.h                         |  58 ++
 src/getfem/getfem_mesh_region.h                    | 128 ++---
 src/getfem/getfem_omp.h                            | 626 ++++++++++++---------
 src/getfem_assembling_tensors.cc                   |   3 +-
 ...fem_generic_assembly_functions_and_operators.cc |  16 +-
 src/getfem_locale.cc                               |  58 ++
 src/getfem_mesh_region.cc                          | 502 ++++++++---------
 src/getfem_models.cc                               | 212 +------
 src/getfem_omp.cc                                  | 388 +++++++++----
 src/gmm/gmm_std.h                                  |  75 +--
 26 files changed, 1700 insertions(+), 1476 deletions(-)

diff --git a/configure.ac b/configure.ac
index b7b3a95..e521276 100644
--- a/configure.ac
+++ b/configure.ac
@@ -384,7 +384,7 @@ if test x$useopenmp = xYES; then
   AC_OPENMP
  if test "x$ac_cv_prog_cxx_openmp" != "xunsupported" && test "x$ac_cv_prog_cxx_openmp" != "x"; then
     AC_SUBST(AM_CXXFLAGS,"$OPENMP_CXXFLAGS")
-    CPPFLAGS="$CPPFLAGS -DGETFEM_HAVE_OPENMP"
+    CPPFLAGS="$CPPFLAGS -DGETFEM_HAS_OPENMP"
   else
    AC_MSG_ERROR([OpenMP support not found. Use --enable-openmp=no flag to compile GetFEM++ without OpenMP]);
   fi
diff --git a/src/Makefile.am b/src/Makefile.am
index 843649b..d7e1a61 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -15,11 +15,11 @@
 #  along  with  this program;  if not, write to the Free Software Foundation,
 #  Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
 
-nobase_include_HEADERS =                               \
+nobase_include_HEADERS =                \
        getfem_boost/workaround.hpp                     \
        getfem_boost/noncopyable.hpp                    \
        gmm/gmm.h                                       \
-       gmm/gmm_arch_config.h                           \
+       gmm/gmm_arch_config.h                           \
        gmm/gmm_matrix.h                                \
        gmm/gmm_iter_solvers.h                          \
        gmm/gmm_iter.h                                  \
@@ -42,7 +42,7 @@ nobase_include_HEADERS =                              \
        gmm/gmm_modified_gram_schmidt.h                 \
        gmm/gmm_dense_Householder.h                     \
        gmm/gmm_dense_lu.h                              \
-       gmm/gmm_dense_matrix_functions.h                \
+       gmm/gmm_dense_matrix_functions.h      \
        gmm/gmm_dense_qr.h                              \
        gmm/gmm_dense_sylvester.h                       \
        gmm/gmm_tri_solve.h                             \
@@ -67,7 +67,7 @@ nobase_include_HEADERS =                              \
        gmm/gmm_lapack_interface.h                      \
        gmm/gmm_condition_number.h                      \
        gmm/gmm_least_squares_cg.h                      \
-       gmm/gmm_range_basis.h                           \
+       gmm/gmm_range_basis.h                           \
        gmm/gmm_opt.h                                   \
        gmm/gmm_algobase.h                              \
        gmm/gmm_ref.h                                   \
@@ -75,13 +75,13 @@ nobase_include_HEADERS =                            \
        gmm/gmm_except.h                                \
        gmm/gmm_feedback_management.h                   \
        gmm/gmm_MUMPS_interface.h                       \
-       getfem/dal_config.h                             \
-       getfem/dal_singleton.h                          \
+       getfem/dal_config.h                                     \
+       getfem/dal_singleton.h                                \
        getfem/dal_basic.h                              \
        getfem/dal_bit_vector.h                         \
        getfem/dal_static_stored_objects.h              \
        getfem/dal_naming_system.h                      \
-       getfem/dal_backtrace.h                          \
+       getfem/dal_backtrace.h                          \
        getfem/dal_tas.h                                \
        getfem/dal_tree_sorted.h                        \
        getfem/bgeot_config.h                           \
@@ -92,20 +92,21 @@ nobase_include_HEADERS =                            \
        getfem/bgeot_poly.h                             \
        getfem/bgeot_geometric_trans.h                  \
        getfem/bgeot_geotrans_inv.h                     \
-       getfem/bgeot_kdtree.h                           \
+       getfem/bgeot_kdtree.h                                   \
        getfem/bgeot_mesh_structure.h                   \
        getfem/bgeot_mesh.h                             \
        getfem/bgeot_poly_composite.h                   \
-       getfem/bgeot_rtree.h                            \
-       getfem/bgeot_node_tab.h                         \
+       getfem/bgeot_rtree.h                                    \
+       getfem/bgeot_node_tab.h                                 \
        getfem/bgeot_small_vector.h                     \
        getfem/bgeot_sparse_tensors.h                   \
        getfem/bgeot_tensor.h                           \
-       getfem/bgeot_comma_init.h                       \
+       getfem/bgeot_comma_init.h                               \
        getfem/bgeot_torus.h                            \
        getfem/bgeot_ftool.h                            \
+    getfem/getfem_accumulated_distro.h      \
        getfem/getfem_arch_config.h                     \
-       getfem/getfem_copyable_ptr.h                    \
+       getfem/getfem_copyable_ptr.h                    \
        getfem/getfem_integration.h                     \
        getfem/getfem_assembling.h                      \
        getfem/getfem_assembling_tensors.h              \
@@ -150,6 +151,7 @@ nobase_include_HEADERS =                            \
        getfem/getfem_models.h                          \
        getfem/getfem_model_solvers.h                   \
        getfem/getfem_linearized_plates.h               \
+    getfem/getfem_locale.h                      \
        getfem/getfem_contact_and_friction_common.h     \
        getfem/getfem_contact_and_friction_large_sliding.h \
        getfem/getfem_contact_and_friction_nodal.h      \
@@ -206,7 +208,8 @@ SRC =                                               \
        getfem_fem_composite.cc                         \
        getfem_mat_elem.cc                              \
        getfem_mat_elem_type.cc                         \
-       getfem_level_set.cc                             \
+       getfem_level_set.cc                                 \
+    getfem_locale.cc                        \
        getfem_mesh_level_set.cc                        \
        getfem_mesh_im_level_set.cc                     \
        getfem_mesh_fem_level_set.cc                    \
diff --git a/src/bgeot_ftool.cc b/src/bgeot_ftool.cc
index 14b1729..e8c03b8 100644
--- a/src/bgeot_ftool.cc
+++ b/src/bgeot_ftool.cc
@@ -22,6 +22,7 @@
 
 #include "getfem/bgeot_config.h"
 #include "getfem/bgeot_ftool.h"
+#include "getfem/getfem_locale.h"
 #include <ctype.h>
 #include <limits.h>
 #ifndef _WIN32
@@ -404,14 +405,14 @@ namespace bgeot {
   }
 
   void md_param::read_param_file(std::istream &f) {
-    gmm::standard_locale sl;
+    getfem::standard_locale sl;
     token_is_valid = false; current_line = 1;
     if (read_instruction_list(f) > 1)
       syntax_error("Parameter file terminated by an else");
   }
 
   void md_param::read_command_line(int argc, char *argv[]) {
-    gmm::standard_locale sl;
+    getfem::standard_locale sl;
     for (int aa = 1; aa < argc; aa++) {
       if (argv[aa][0] != '-') {
         current_file = std::string(argv[aa]);
@@ -442,7 +443,7 @@ namespace bgeot {
         return default_val;
       else {
         double f;
-        gmm::standard_locale sl;
+        getfem::standard_locale sl;
         cout << "No parameter " << name << " found, please enter its value\n";
         cout << comment << " : "; cin >> f;
         parameters[name] = param_value(f);
@@ -461,7 +462,7 @@ namespace bgeot {
         return default_val;
       else {
         long f;
-        gmm::standard_locale sl;
+        getfem::standard_locale sl;
         cout << "No parameter " << name << " found, please enter its value\n";
         cout << comment << " : "; cin >> f;
         parameters[name] = param_value(double(f));
@@ -481,7 +482,7 @@ namespace bgeot {
         return default_val;
       else {
         std::string s;
-        gmm::standard_locale sl;
+        getfem::standard_locale sl;
         cout << "No parameter " << name << " found, please enter its value\n";
         cout << comment << " : "; cin >> s;
         parameters[name] = param_value(s);
@@ -501,7 +502,7 @@ namespace bgeot {
       if (comment == 0) return empty_array;
       else {
         std::string s;
-        gmm::standard_locale sl;
+        getfem::standard_locale sl;
         cout << "No parameter " << name << " found, please enter its value\n";
         cout << comment << " : "; cin >> s;
         parameters[name] = param_value(s);
diff --git a/src/bgeot_geometric_trans.cc b/src/bgeot_geometric_trans.cc
index 7d29aa1..1ea29e9 100644
--- a/src/bgeot_geometric_trans.cc
+++ b/src/bgeot_geometric_trans.cc
@@ -1159,8 +1159,13 @@ namespace bgeot {
   }
 
   pgeometric_trans geometric_trans_descriptor(std::string name) {
-    size_type i=0;
-    return dal::singleton<geometric_trans_naming_system>::instance().method(name, i);
+    size_type i = 0;
+    auto &name_system = dal::singleton<geometric_trans_naming_system>::instance();
+    auto ptrans = name_system.method(name, i);
+    auto &trans = const_cast<bgeot::geometric_trans&>(*ptrans);
+    auto short_name = name_system.shorter_name_of_method(ptrans);
+    trans.set_name(short_name);
+    return ptrans;
   }
 
   std::string name_of_geometric_trans(pgeometric_trans p) {
diff --git a/src/bgeot_poly.cc b/src/bgeot_poly.cc
index 545125e..6c9ef2b 100644
--- a/src/bgeot_poly.cc
+++ b/src/bgeot_poly.cc
@@ -24,6 +24,7 @@
 #include "getfem/bgeot_poly.h"
 #include "getfem/bgeot_small_vector.h"
 #include "getfem/bgeot_ftool.h"
+#include "getfem/getfem_locale.h"
 
 namespace bgeot {
 
@@ -66,7 +67,7 @@ namespace bgeot {
     }
     return *this;
   }
-  
+
   const power_index &power_index::operator --() {
     short_type n = short_type(size()), l;
     if (n > 0) {
@@ -78,17 +79,17 @@ namespace bgeot {
       if (l != short_type(-1)) {
         short_type a = (*this)[l];
         (*this)[l] = 0; (*this)[n-1] = short_type(a - 1);
-        if (l > 0) ((*this)[l-1])++; 
+        if (l > 0) ((*this)[l-1])++;
         else if (short_type(deg+1)) degree_ = short_type(deg-1);
       }
       if (g_idx+1) global_index_ = g_idx-1;
     }
     return *this;
   }
-  
+
   short_type power_index::degree() const {
     if (degree_ != short_type(-1)) return degree_;
-    degree_ = short_type(std::accumulate(begin(), end(), 0)); 
+    degree_ = short_type(std::accumulate(begin(), end(), 0));
     return degree_;
   }
 
@@ -101,7 +102,7 @@ namespace bgeot {
       { global_index_ += alpha_(n,short_type(d-1)); d=short_type(d-*it); --n; }
     return global_index_;
   }
- 
+
   power_index::power_index(short_type nn) : v(nn), degree_(0), global_index_(0)
   { std::fill(begin(), end(), short_type(0)); alpha_init_(); }
 
@@ -127,7 +128,7 @@ namespace bgeot {
 
   static base_poly read_expression(short_type n, std::istream &f) {
     gmm::stream_standard_locale sl(f);
-    gmm::standard_locale sll;
+    getfem::standard_locale sll;
     base_poly result(n,0);
     std::string s;
     int i = get_next_token(s, f), j;
@@ -183,13 +184,13 @@ namespace bgeot {
     base_poly &p2 = *(value_list.rbegin());
     if (op_list.back() != 6) {
       assert(value_list.size()>1);
-      base_poly &p1 = *(value_list.rbegin()+1);    
+      base_poly &p1 = *(value_list.rbegin()+1);
       switch (op_list.back()) {
         case 1  : p1 *= p2; break;
         case 2  : if (p2.degree() > 0) parse_error(6); p1 /= p2[0]; break;
         case 3  : p1 += p2; break;
         case 4  : p1 -= p2; break;
-        case 5  : 
+        case 5  :
           {
             if (p2.degree() > 0) parse_error(7);
             int pow = int(to_scalar(p2[0]));
@@ -200,7 +201,7 @@ namespace bgeot {
           break;
         default: assert(0);
       }
-      value_list.pop_back(); 
+      value_list.pop_back();
     } else {
       p2 *= opt_long_scalar_type(-1);
     }
@@ -228,11 +229,11 @@ namespace bgeot {
       value_list.push_back(read_expression(n, f));
       op_list.push_back(op);
       prior_list.push_back(prior);
-      
+
       i = get_next_token(s, f);
       operator_priority_(i, i ? s[0] : '0', prior, op);
     }
-    
+
     if (i == 5 && s[0] == ')') { f.putback(')'); }
     else if (i != 0 && (i != 5 || s[0] != ';')) {
       cout << "s = " << s << endl;
diff --git a/src/bgeot_torus.cc b/src/bgeot_torus.cc
index 056f501..524537b 100644
--- a/src/bgeot_torus.cc
+++ b/src/bgeot_torus.cc
@@ -43,7 +43,7 @@ namespace bgeot{
       base_node point2D = point;
       point2D.resize(2);
       return ori_ref_convex_->is_in_face(f, point2D);
-    }  
+    }
     torus_reference(bgeot::pconvex_ref ori_ref_convex) :
       convex_of_reference(
        torus_structure_descriptor(ori_ref_convex->structure()), ori_ref_convex->is_basic())
@@ -57,7 +57,7 @@ namespace bgeot{
       for(size_type n = 0; n < ori_normals.size(); ++n){
        normals_[n] = ori_normals[n];
        normals_[n].resize(3);
-      }            
+      }
 
       std::copy(ori_points.begin(), ori_points.end(), convex<base_node>::points().begin());
       for(size_type pt = 0; pt < convex<base_node>::points().size(); ++pt){
@@ -129,7 +129,7 @@ namespace bgeot{
     bgeot::base_vector &val) const{
       base_node pt_2d(pt);
       pt_2d.resize(2);
-      poriginal_trans_->poly_vector_val(pt_2d, ind_ct, val);     
+      poriginal_trans_->poly_vector_val(pt_2d, ind_ct, val);
   }
 
  void torus_geom_trans::poly_vector_grad(const base_node &pt, bgeot::base_matrix &pc) const{
@@ -178,12 +178,13 @@ namespace bgeot{
     poriginal_trans_->project_into_reference_convex(pt);
   }
 
-  torus_geom_trans::torus_geom_trans(pgeometric_trans poriginal_trans) 
+  torus_geom_trans::torus_geom_trans(pgeometric_trans poriginal_trans)
     : poriginal_trans_(poriginal_trans){
       geometric_trans::is_lin = poriginal_trans_->is_linear();
       geometric_trans::cvr = ptorus_reference(poriginal_trans_->convex_ref());
       complexity_ = poriginal_trans_->complexity();
       fill_standard_vertices();
+      name_ = poriginal_trans->debug_name();
   }
 
   pgeometric_trans torus_geom_trans::get_original_transformation() const{
diff --git a/src/dal_singleton.cc b/src/dal_singleton.cc
index 4d9cc96..9843a89 100644
--- a/src/dal_singleton.cc
+++ b/src/dal_singleton.cc
@@ -27,48 +27,41 @@
 
 namespace dal {
 
- 
-  singletons_manager::singletons_manager() : lst() {}
-  
-  singletons_manager& singletons_manager::m = manager();
+  singletons_manager::singletons_manager()
+    : lst{}, nb_partitions{lst.num_threads()}
+  {}
 
-  singletons_manager& singletons_manager::manager()
-  {
+  singletons_manager& singletons_manager::manager(){
     static singletons_manager x;
     return x;
   }
 
+  void singletons_manager::register_new_singleton(singleton_instance_base *p){
+    register_new_singleton(p, manager().lst.this_thread());
+  }
 
-       void singletons_manager::register_new_singleton(singleton_instance_base *p)
-  {
-    manager().lst.thrd_cast().push_back(p);
-       }
-
-       void singletons_manager::register_new_singleton(singleton_instance_base *p, size_t ithread)
-  {
-    manager().lst(ithread).push_back(p);
-       }
+  void singletons_manager::register_new_singleton(singleton_instance_base *p, size_t ithread){
+    if (p) manager().lst(ithread).push_back(p);
+  }
 
+  void singletons_manager::on_partitions_change(){
+    auto new_nb_partitions = manager().lst.num_threads();
+    auto &nb_partitions = manager().nb_partitions;
+    if (new_nb_partitions > nb_partitions){ //not allowing reducing nb. of partitions,
+      manager().lst.on_thread_update();     //as it will invalidate global storage
+      nb_partitions = new_nb_partitions;
+    }
+  }
 
-       static int level_compare(singleton_instance_base *a,
-               singleton_instance_base *b) 
-       {
-               return a->level() < b->level();
-       }
+  static int level_compare(singleton_instance_base *a, singleton_instance_base *b) {
+    return a->level() < b->level();
+  }
 
-  singletons_manager::~singletons_manager() { 
-    GMM_ASSERT1(!getfem::me_is_multithreaded_now(), 
-                "singletons_manager destructor should" 
-                "not be running in parallel !!");
-    //arrange distruction per thread
-    for(size_t i=0;i<getfem::num_threads();i++)
-    {
-      /* sort singletons in increasing levels,
-      lowest levels will be destroyed first */
-      std::sort(lst(i).begin(),lst(i).end(), level_compare);
-      std::vector<singleton_instance_base *>::const_iterator it  = lst(i).begin();
-      std::vector<singleton_instance_base *>::const_iterator ite = lst(i).end();
-      for ( ; it != ite; ++it) { delete *it; }
+  singletons_manager::~singletons_manager() {
+    for(size_type i = 0; i != nb_partitions; ++i){
+      std::sort(lst(i).begin(), lst(i).end(), level_compare);
+      for (auto &&p : lst(i)) delete p;
     }
   }
-}
+
+}/* end of namespace dal                                                      */
\ No newline at end of file
diff --git a/src/dal_static_stored_objects.cc b/src/dal_static_stored_objects.cc
index 5080355..38810f5 100644
--- a/src/dal_static_stored_objects.cc
+++ b/src/dal_static_stored_objects.cc
@@ -31,16 +31,20 @@
 
 namespace dal {
 
-  // 0 = only undestroyed, 1 =  Normal, 2 very noisy, 
+  // 0 = only undestroyed, 1 =  Normal, 2 very noisy,
 #define DAL_STORED_OBJECT_DEBUG_NOISY 2
 
-static bool dal_static_stored_tab_valid__ = true;
-  
 #if DAL_STORED_OBJECT_DEBUG
+
+  static bool dal_static_stored_tab_valid__ = true;
+
+  #define STORED_ASSERT(test, message) GMM_ASSERT1(test, message);
+  #define ON_STORED_DEBUG(expression) expression;
+
   static std::map <const static_stored_object *, std::string> _created_objects;
   static std::map <const static_stored_object *, std::string> _added_objects;
   static std::map <const static_stored_object *, std::string> _deleted_objects;
- 
+
 
   void stored_debug_created(const static_stored_object *o,
                            const std::string &name) {
@@ -126,125 +130,98 @@ static bool dal_static_stored_tab_valid__ = true;
 #   endif
        _deleted_objects.erase(it);
       }
-      
+
     }
   }
 
-
-
-
+#else
+  #define STORED_ASSERT(test, message)
+  #define ON_STORED_DEBUG(expression)
 #endif
 
-
-
-
  // Gives a pointer to a key of an object from its pointer, while looking in the storage of
   // a specific thread
-  pstatic_stored_object_key key_of_stored_object(pstatic_stored_object o, size_t thread)
-  {
-    stored_object_tab::stored_key_tab& stored_keys 
-      = dal::singleton<stored_object_tab>::instance(thread).stored_keys_;
-    GMM_ASSERT1(dal_static_stored_tab_valid__, "Too late to do that");
-    stored_object_tab::stored_key_tab::iterator it = stored_keys.find(o);
+  pstatic_stored_object_key key_of_stored_object(pstatic_stored_object o, size_t thread){
+    auto& stored_keys = singleton<stored_object_tab>::instance(thread).stored_keys_;
+    STORED_ASSERT(dal_static_stored_tab_valid__, "Too late to do that");
+    auto it = stored_keys.find(o);
     if (it != stored_keys.end()) return it->second;
-    return 0;
+    return nullptr;
   }
 
  // gives a key of the stored object while looking in the storage of other threads
-  pstatic_stored_object_key key_of_stored_object_other_threads(pstatic_stored_object o)
-  {
-    for(size_t thread = 0; thread<getfem::num_threads();thread++)
+  pstatic_stored_object_key key_of_stored_object_other_threads(pstatic_stored_object o){
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread)
     {
-      if (thread == this_thread()) continue;
-      pstatic_stored_object_key key = key_of_stored_object(o,thread);
+      if (thread == singleton<stored_object_tab>::this_thread()) continue;
+      auto key = key_of_stored_object(o,thread);
       if (key) return key;
     }
-    return 0;
+    return nullptr;
   }
 
-  pstatic_stored_object_key key_of_stored_object(pstatic_stored_object o) 
-  {
-    pstatic_stored_object_key key = key_of_stored_object(o,this_thread());
+  pstatic_stored_object_key key_of_stored_object(pstatic_stored_object o){
+    auto key = key_of_stored_object(o, singleton<stored_object_tab>::this_thread());
     if (key) return key;
-    else return (num_threads() > 1) ? key_of_stored_object_other_threads(o) : 0;
-    return 0;
+    else return singleton<stored_object_tab>::num_threads() > 1 ?
+                          key_of_stored_object_other_threads(o) : nullptr;
+    return nullptr;
   }
 
-  bool exists_stored_object(pstatic_stored_object o) 
-  {
-    stored_object_tab::stored_key_tab& stored_keys 
-      = dal::singleton<stored_object_tab>::instance().stored_keys_;
-    if (dal_static_stored_tab_valid__) {
-      return  (stored_keys.find(o) != stored_keys.end());
-    }
-    return false;
+  bool exists_stored_object(pstatic_stored_object o){
+    auto& stored_keys = singleton<stored_object_tab>::instance().stored_keys_;
+    ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return false)
+    return  (stored_keys.find(o) != stored_keys.end());
   }
 
-  pstatic_stored_object search_stored_object(pstatic_stored_object_key k) 
-  {
-    stored_object_tab& stored_objects
-        = dal::singleton<stored_object_tab>::instance();
-    if (dal_static_stored_tab_valid__) {
-      pstatic_stored_object p = stored_objects.search_stored_object(k);
-      if (p) return p;
-    }
-    return 0;
+  pstatic_stored_object search_stored_object(pstatic_stored_object_key k){
+    auto& stored_objects = singleton<stored_object_tab>::instance();
+    ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return nullptr)
+    return stored_objects.search_stored_object(k);
   }
 
-  pstatic_stored_object search_stored_object_on_all_threads(pstatic_stored_object_key k)
-  {
+  pstatic_stored_object search_stored_object_on_all_threads(pstatic_stored_object_key k){
     auto& stored_objects = singleton<stored_object_tab>::instance();
-    if (!dal_static_stored_tab_valid__) return nullptr;
+    ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return nullptr)
     auto p = stored_objects.search_stored_object(k);
     if (p) return p;
-    if (num_threads()  == 1) return nullptr;
-    for(size_t thread = 0; thread < getfem::num_threads(); thread++)
-    {
-      if (thread == this_thread()) continue;
+    if (singleton<stored_object_tab>::num_threads() == 1) return nullptr;
+    for(size_t thread = 0; thread < singleton<stored_object_tab>::num_threads(); ++thread){
+      if (thread == singleton<stored_object_tab>::this_thread()) continue;
       auto& other_objects = singleton<stored_object_tab>::instance(thread);
       p = other_objects.search_stored_object(k);
       if (p) return p;
     }
     return nullptr;
-   }
+  }
 
-  std::pair<stored_object_tab::iterator, stored_object_tab::iterator> iterators_of_object(
-    pstatic_stored_object o)
-  {
-    for(size_t thread=0; thread < num_threads(); ++thread)
-    {
-      stored_object_tab& stored_objects
-        = dal::singleton<stored_object_tab>::instance(thread);
-      if (!dal_static_stored_tab_valid__) continue;
-      stored_object_tab::iterator it = stored_objects.iterator_of_object_(o);
+  std::pair<stored_object_tab::iterator, stored_object_tab::iterator>
+    iterators_of_object(pstatic_stored_object o){
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) continue;)
+      auto it = stored_objects.iterator_of_object_(o);
       if (it != stored_objects.end()) return {it, stored_objects.end()};
     }
-    return {dal::singleton<stored_object_tab>::instance().end(),
-            dal::singleton<stored_object_tab>::instance().end()};
+    return {singleton<stored_object_tab>::instance().end(),
+            singleton<stored_object_tab>::instance().end()};
   }
 
 
-  void test_stored_objects(void) 
-  {
-    for(size_t thread = 0; thread < num_threads(); ++thread)
-    {
-      stored_object_tab& stored_objects 
-        = dal::singleton<stored_object_tab>::instance(thread);
-      if (!dal_static_stored_tab_valid__) continue;
-      stored_object_tab::stored_key_tab& stored_keys = stored_objects.stored_keys_;
-
-      GMM_ASSERT1(stored_objects.size() == stored_keys.size(), 
-        "keys and objects tables don't match");
-      for (stored_object_tab::stored_key_tab::iterator it = stored_keys.begin();
-        it != stored_keys.end(); ++it)
-      {
-        auto itos = iterators_of_object(it->first);
+  void test_stored_objects(void){
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) continue;)
+      auto& stored_keys = stored_objects.stored_keys_;
+
+      GMM_ASSERT1(stored_objects.size() == stored_keys.size(),
+                  "keys and objects tables don't match");
+      for (auto &&pair : stored_keys){
+        auto itos = iterators_of_object(pair.first);
         GMM_ASSERT1(itos.first != itos.second, "Key without object is found");
       }
-      for (stored_object_tab::iterator it = stored_objects.begin();
-        it != stored_objects.end(); ++it)
-      {
-        auto itos = iterators_of_object(it->second.p);
+      for (auto &&pair : stored_objects){
+        auto itos = iterators_of_object(pair.second.p);
         GMM_ASSERT1(itos.first != itos.second, "Object has key but cannot be found");
       }
     }
@@ -253,364 +230,301 @@ static bool dal_static_stored_tab_valid__ = true;
   void add_dependency(pstatic_stored_object o1,
                       pstatic_stored_object o2) {
     bool dep_added = false;
-    for(size_t thread=0; thread < num_threads(); ++thread)
-    {
-      stored_object_tab& stored_objects
-          = dal::singleton<stored_object_tab>::instance(thread);
-      if (!dal_static_stored_tab_valid__) return;
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return)
       if ((dep_added = stored_objects.add_dependency_(o1,o2))) break;
     }
     GMM_ASSERT1(dep_added, "Failed to add dependency between " << o1
-               << " of type " << typeid(*o1).name() << " and " << o2
-               << " of type "  << typeid(*o2).name() << ". ");
+                << " of type " << typeid(*o1).name() << " and " << o2
+                << " of type "  << typeid(*o2).name() << ". ");
 
     bool dependent_added = false;
-    for(size_t thread=0; thread < num_threads(); ++thread)
-    {
-      stored_object_tab& stored_objects
-          = dal::singleton<stored_object_tab>::instance(thread);
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
       if ((dependent_added = stored_objects.add_dependent_(o1,o2))) break;
     }
     GMM_ASSERT1(dependent_added, "Failed to add dependent between " << o1
-               << " of type " << typeid(*o1).name() << " and " << o2
-               << " of type "  << typeid(*o2).name() << ". ");
+                << " of type " << typeid(*o1).name() << " and " << o2
+                << " of type "  << typeid(*o2).name() << ". ");
   }
 
 
 
-  /*remove a dependency (from storages of all threads). 
+  /*remove a dependency (from storages of all threads).
   Return true if o2 has no more dependent object. */
-  bool del_dependency(pstatic_stored_object o1,
-    pstatic_stored_object o2) 
-  {
+  bool del_dependency(pstatic_stored_object o1, pstatic_stored_object o2){
     bool dep_deleted = false;
-    for(size_t thread=0; thread < num_threads(); ++thread)
-    {
-      stored_object_tab& stored_objects
-          = dal::singleton<stored_object_tab>::instance(thread);
-      if (!dal_static_stored_tab_valid__) return false;
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return false)
       if ((dep_deleted = stored_objects.del_dependency_(o1,o2))) break;
     }
    GMM_ASSERT1(dep_deleted, "Failed to delete dependency between " << o1 << " of type "
-    << typeid(*o1).name() << " and " << o2 << " of type "  << typeid(*o2).name() << ". ");
+                << typeid(*o1).name() << " and " << o2 << " of type "
+                << typeid(*o2).name() << ". ");
 
     bool dependent_deleted = false;
     bool dependent_empty = false;
-    for(size_t thread=0; thread < num_threads(); ++thread)
-    {
-      stored_object_tab& stored_objects
-          = dal::singleton<stored_object_tab>::instance(thread);
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
       dependent_deleted = stored_objects.del_dependent_(o1,o2);
-      if (dependent_deleted)
-      {
+      if (dependent_deleted){
         dependent_empty = stored_objects.has_dependent_objects(o2);
         break;
       }
     }
    GMM_ASSERT1(dependent_deleted, "Failed to delete dependent between " << o1 << " of type "
-    << typeid(*o1).name() << " and " << o2 << " of type "  << typeid(*o2).name() << ". ");
+                << typeid(*o1).name() << " and " << o2 << " of type "
+                << typeid(*o2).name() << ". ");
 
     return dependent_empty;
   }
 
-
   void add_stored_object(pstatic_stored_object_key k, pstatic_stored_object o,
                          permanence perm) {
-    GMM_ASSERT1(dal_static_stored_tab_valid__, "Too late to add an object");
-    stored_object_tab& stored_objects
-      = dal::singleton<stored_object_tab>::instance();
-    if (dal_static_stored_tab_valid__)
-      stored_objects.add_stored_object(k,o,perm);
+    STORED_ASSERT(dal_static_stored_tab_valid__, "Too late to add an object");
+    auto& stored_objects = singleton<stored_object_tab>::instance();
+    stored_objects.add_stored_object(k,o,perm);
   }
 
-  void basic_delete(std::list<pstatic_stored_object> &to_delete) {
-    stored_object_tab& stored_objects_this_thread
-      = dal::singleton<stored_object_tab>::instance();
-
-    if (dal_static_stored_tab_valid__) {
+  void basic_delete(std::list<pstatic_stored_object> &to_delete){
+    auto& stored_objects_this_thread = singleton<stored_object_tab>::instance();
 
+    ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return)
     stored_objects_this_thread.basic_delete_(to_delete);
-    
-    if (!to_delete.empty()) //need to delete from other threads
-      {
-        for(size_t thread=0; thread < num_threads(); ++thread)
-         { 
-           if (thread == this_thread()) continue;
-           stored_object_tab& stored_objects
-              = dal::singleton<stored_object_tab>::instance(thread);
-           stored_objects.basic_delete_(to_delete);
-           if (to_delete.empty()) break;
-         }
+
+    if (!to_delete.empty()){ //need to delete from other threads
+      for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+        if (thread == singleton<stored_object_tab>::this_thread()) continue;
+        auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+        stored_objects.basic_delete_(to_delete);
+        if (to_delete.empty()) break;
       }
-    if (me_is_multithreaded_now())
-      {
+    }
+    if (getfem::me_is_multithreaded_now()){
         if (!to_delete.empty()) GMM_WARNING1("Not all objects were deleted");
-      }
-    else
-      {
-        GMM_ASSERT1(to_delete.empty(), "Could not delete objects");
-      }
     }
+    else{GMM_ASSERT1(to_delete.empty(), "Could not delete objects");}
   }
-  
+
   void del_stored_objects(std::list<pstatic_stored_object> &to_delete,
-                          bool ignore_unstored) 
-  {
+                          bool ignore_unstored) {
     getfem::omp_guard lock;
     GMM_NOPERATION(lock);
-    // stored_object_tab& stored_objects
-    //   = dal::singleton<stored_object_tab>::instance();
-
-    if (dal_static_stored_tab_valid__) {
 
-      std::list<pstatic_stored_object>::iterator it, itnext;
-      for (it = to_delete.begin(); it != to_delete.end(); it = itnext) {
-        itnext = it; itnext++;
-
-        auto itos = iterators_of_object(*it);
-        if (itos.first == itos.second) {
-          if (ignore_unstored)
-            to_delete.erase(it);
-          else
-            if (me_is_multithreaded_now()) {
-              GMM_WARNING1("This object is (already?) not stored : "<< it->get()
-              << " typename: " << typeid(*it->get()).name() 
-              << "(which could happen in multithreaded code and is OK)");
-            } else {
-              GMM_ASSERT1(false, "This object is not stored : " << it->get()
-              << " typename: " << typeid(*it->get()).name());
-            }
+    ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) return);
+
+    std::list<pstatic_stored_object>::iterator it, itnext;
+    for (it = to_delete.begin(); it != to_delete.end(); it = itnext) {
+      itnext = it; itnext++;
+
+      auto itos = iterators_of_object(*it);
+      if (itos.first == itos.second) {
+        if (ignore_unstored) to_delete.erase(it);
+        else if (getfem::me_is_multithreaded_now()) {
+            GMM_WARNING1("This object is (already?) not stored : "<< it->get()
+            << " typename: " << typeid(*it->get()).name()
+            << " (which could happen in multithreaded code and is OK)");
+        } else {
+          GMM_ASSERT1(false, "This object is not stored : " << it->get()
+          << " typename: " << typeid(*it->get()).name());
         }
-        else
-          itos.first->second.valid = false;
       }
+      else itos.first->second.valid = false;
+    }
 
-      for (pstatic_stored_object pobj : to_delete) {
-        if (pobj) {
-          auto itos = iterators_of_object(pobj);
-          GMM_ASSERT1(itos.first != itos.second, "An object disapeared !");
-          itos.first->second.valid = false;
-          auto second_dep = itos.first->second.dependencies;
-          for (const pstatic_stored_object pdep : second_dep) {
-            if (del_dependency(pobj, pdep)) {
-              auto itods = iterators_of_object(pdep);
-              if (itods.first->second.perm == AUTODELETE_STATIC_OBJECT
-                  && itods.first->second.valid) {
-                itods.first->second.valid = false;
-                to_delete.push_back(pdep);
-              }
+    for (auto &&pobj : to_delete) {
+      if (pobj) {
+        auto itos = iterators_of_object(pobj);
+        GMM_ASSERT1(itos.first != itos.second, "An object disappeared!");
+        itos.first->second.valid = false;
+        auto second_dep = itos.first->second.dependencies;
+        for (const auto &pdep : second_dep) {
+          if (del_dependency(pobj, pdep)) {
+            auto itods = iterators_of_object(pdep);
+            if (itods.first->second.perm == AUTODELETE_STATIC_OBJECT
+                && itods.first->second.valid) {
+              itods.first->second.valid = false;
+              to_delete.push_back(pdep);
             }
           }
-          for (pstatic_stored_object
-               pdep : itos.first->second.dependent_object) {
-            auto itods = iterators_of_object(pdep);
-            if (itods.first != itods.second) {
-              GMM_ASSERT1(itods.first->second.perm != PERMANENT_STATIC_OBJECT,
-              "Trying to delete a permanent object " << pdep);
-              if (itods.first->second.valid) {
-                itods.first->second.valid = false;
-                to_delete.push_back(itods.first->second.p);
-              }
+        }
+        for (auto &&pdep : itos.first->second.dependent_object) {
+          auto itods = iterators_of_object(pdep);
+          if (itods.first != itods.second) {
+            GMM_ASSERT1(itods.first->second.perm != PERMANENT_STATIC_OBJECT,
+            "Trying to delete a permanent object " << pdep);
+            if (itods.first->second.valid) {
+              itods.first->second.valid = false;
+              to_delete.push_back(itods.first->second.p);
             }
           }
         }
       }
-      basic_delete(to_delete);
     }
+    basic_delete(to_delete);
   }
 
-
-  void del_stored_object(const pstatic_stored_object &o, bool ignore_unstored) 
-  {
+  void del_stored_object(const pstatic_stored_object &o, bool ignore_unstored){
     std::list<pstatic_stored_object> to_delete;
     to_delete.push_back(o);
     del_stored_objects(to_delete, ignore_unstored);
   }
 
 
-  void del_stored_objects(permanence perm) 
-  {
+  void del_stored_objects(permanence perm){
     std::list<pstatic_stored_object> to_delete;
-    for(size_t thread=0; thread<getfem::num_threads();thread++)
-    {
-      stored_object_tab& stored_objects
-        = dal::singleton<stored_object_tab>::instance(thread);
-      if (!dal_static_stored_tab_valid__) continue;
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) continue;)
       if (perm == PERMANENT_STATIC_OBJECT) perm = STRONG_STATIC_OBJECT;
-      stored_object_tab::iterator it;
-      for (it = stored_objects.begin(); it != stored_objects.end(); ++it)
-        if (it->second.perm >= perm)
-          to_delete.push_back(it->second.p);
+      for (auto &&pair : stored_objects){
+        if (pair.second.perm >= perm) to_delete.push_back(pair.second.p);
+      }
     }
     del_stored_objects(to_delete, false);
   }
 
-  void list_stored_objects(std::ostream &ost) 
-  {
-    for(size_t thread=0; thread<getfem::num_threads();thread++)
-    {
-      stored_object_tab::stored_key_tab& stored_keys 
-        = dal::singleton<stored_object_tab>::instance(thread).stored_keys_;
-      if (!dal_static_stored_tab_valid__) continue;
+  void list_stored_objects(std::ostream &ost){
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_keys = singleton<stored_object_tab>::instance(thread).stored_keys_;
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) continue;)
       if (stored_keys.begin() == stored_keys.end())
         ost << "No static stored objects" << endl;
       else ost << "Static stored objects" << endl;
-      for (const auto &t : stored_keys) 
+      for (const auto &t : stored_keys)
         ost << "Object: " << t.first << " typename: "
             << typeid(*(t.first)).name() << endl;
     }
   }
 
-  size_t nb_stored_objects(void) 
-  {
-    long num_objects=0;
-    for(size_t thread=0;thread<getfem::num_threads(); ++thread)
-    {
-      stored_object_tab::stored_key_tab& stored_keys 
-      = dal::singleton<stored_object_tab>::instance(thread).stored_keys_;
-      if (!dal_static_stored_tab_valid__) continue;
-      num_objects+=stored_keys.size();
+  size_t nb_stored_objects(void){
+    long num_objects = 0;
+    for(size_t thread = 0; thread != singleton<stored_object_tab>::num_threads(); ++thread){
+      auto& stored_keys = singleton<stored_object_tab>::instance(thread).stored_keys_;
+      ON_STORED_DEBUG(if (!dal_static_stored_tab_valid__) continue;)
+      num_objects += stored_keys.size();
     }
     return num_objects;
   }
 
-
-
-
 /**
   STATIC_STORED_TAB -------------------------------------------------------
 */
-  stored_object_tab::stored_object_tab() 
+  stored_object_tab::stored_object_tab()
     : std::map<enr_static_stored_object_key, enr_static_stored_object>(),
-      locks_(), stored_keys_()
-  { dal_static_stored_tab_valid__ = true; }
+      locks_{}, stored_keys_{}{
+      ON_STORED_DEBUG(dal_static_stored_tab_valid__ = true;)
+    }
 
-  stored_object_tab::~stored_object_tab()
-  { dal_static_stored_tab_valid__ = false; }
+  stored_object_tab::~stored_object_tab(){
+    ON_STORED_DEBUG(dal_static_stored_tab_valid__ = false;)
+  }
 
   pstatic_stored_object
-  stored_object_tab::search_stored_object(pstatic_stored_object_key k) const
-  {
-   getfem::local_guard guard = locks_.get_lock();
-   stored_object_tab::const_iterator it=find(enr_static_stored_object_key(k));
-   return (it != end()) ? it->second.p : 0;
+  stored_object_tab::search_stored_object(pstatic_stored_object_key k) const{
+    auto guard = locks_.get_lock();
+    auto it = find(enr_static_stored_object_key(k));
+    return (it != end()) ? it->second.p : nullptr;
   }
 
   bool stored_object_tab::add_dependency_(pstatic_stored_object o1,
-                                          pstatic_stored_object o2)
-  {
-    getfem::local_guard guard = locks_.get_lock();
-    stored_key_tab::const_iterator it = stored_keys_.find(o1);
+                                          pstatic_stored_object o2){
+    auto guard = locks_.get_lock();
+    auto it = stored_keys_.find(o1);
     if (it == stored_keys_.end()) return false;
-    iterator ito1 = find(it->second);
+    auto ito1 = find(it->second);
     GMM_ASSERT1(ito1 != end(), "Object has a key, but cannot be found");
     ito1->second.dependencies.insert(o2);
     return true;
   }
 
-  void stored_object_tab::add_stored_object(pstatic_stored_object_key k, 
-    pstatic_stored_object o,  permanence perm) 
-  {
+  void stored_object_tab::add_stored_object(pstatic_stored_object_key k,
+    pstatic_stored_object o,  permanence perm){
     DAL_STORED_OBJECT_DEBUG_ADDED(o.get());
-    getfem::local_guard guard = locks_.get_lock();
+    auto guard = locks_.get_lock();
     GMM_ASSERT1(stored_keys_.find(o) == stored_keys_.end(),
       "This object has already been stored, possibly with another key");
     stored_keys_[o] = k;
     insert(std::make_pair(enr_static_stored_object_key(k),
                           enr_static_stored_object(o, perm)));
-    size_t t = this_thread();
-    GMM_ASSERT2(stored_keys_.size() == size() && t != size_t(-1), 
+    auto t = singleton<stored_object_tab>::this_thread();
+    GMM_ASSERT2(stored_keys_.size() == size() && t != size_t(-1),
       "stored_keys are not consistent with stored_object tab");
   }
 
   bool stored_object_tab::add_dependent_(pstatic_stored_object o1,
-    pstatic_stored_object o2)
-  {
-    getfem::local_guard guard = locks_.get_lock();
-    stored_key_tab::const_iterator it = stored_keys_.find(o2);
+    pstatic_stored_object o2){
+    auto guard = locks_.get_lock();
+    auto it = stored_keys_.find(o2);
     if (it == stored_keys_.end()) return false;
-    iterator ito2 = find(it->second);
+    auto ito2 = find(it->second);
     GMM_ASSERT1(ito2 != end(), "Object has a key, but cannot be found");
     ito2->second.dependent_object.insert(o1);
     return true;
   }
 
   bool stored_object_tab::del_dependency_(pstatic_stored_object o1,
-    pstatic_stored_object o2)
-  {
-    getfem::local_guard guard = locks_.get_lock();
-    stored_key_tab::const_iterator it1 = stored_keys_.find(o1);
+                                          pstatic_stored_object o2){
+    auto guard = locks_.get_lock();
+    auto it1 = stored_keys_.find(o1);
     if (it1 == stored_keys_.end()) return false;
-    iterator ito1 = find(it1->second);
+    auto ito1 = find(it1->second);
     GMM_ASSERT1(ito1 != end(), "Object has a key, but cannot be found");
     ito1->second.dependencies.erase(o2);
     return true;
   }
 
-  stored_object_tab::iterator stored_object_tab::iterator_of_object_(pstatic_stored_object o)
-  {
-    stored_key_tab::const_iterator itk = stored_keys_.find(o);
+  stored_object_tab::iterator stored_object_tab
+    ::iterator_of_object_(pstatic_stored_object o){
+    auto itk = stored_keys_.find(o);
     if (itk == stored_keys_.end()) return end();
-    iterator ito = find(itk->second);
+    auto ito = find(itk->second);
     GMM_ASSERT1(ito != end(), "Object has a key, but is not stored");
     return ito;
   }
 
   bool stored_object_tab::del_dependent_(pstatic_stored_object o1,
-    pstatic_stored_object o2)
-  {
-    getfem::local_guard guard = locks_.get_lock();
-    stored_key_tab::const_iterator it2 = stored_keys_.find(o2);
+                                         pstatic_stored_object o2){
+    auto guard = locks_.get_lock();
+    auto it2 = stored_keys_.find(o2);
     if (it2 == stored_keys_.end()) return false;
-    iterator ito2 = find(it2->second);
+    auto ito2 = find(it2->second);
     GMM_ASSERT1(ito2 != end(), "Object has a key, but cannot be found");
     ito2->second.dependent_object.erase(o1);
     return true;
   }
 
-  bool stored_object_tab::exists_stored_object(pstatic_stored_object o) const
-  {
-    getfem::local_guard guard = locks_.get_lock();
+  bool stored_object_tab::exists_stored_object(pstatic_stored_object o) const{
+    auto guard = locks_.get_lock();
     return (stored_keys_.find(o) != stored_keys_.end());
   }
 
-  bool stored_object_tab::has_dependent_objects(pstatic_stored_object o) const
-  {
-    getfem::local_guard guard = locks_.get_lock();
-    stored_key_tab::const_iterator it = stored_keys_.find(o);
+  bool stored_object_tab::has_dependent_objects(pstatic_stored_object o) const{
+    auto guard = locks_.get_lock();
+    auto it = stored_keys_.find(o);
     GMM_ASSERT1(it != stored_keys_.end(), "Object is not stored");
-    const_iterator ito = find(it->second);
+    auto ito = find(it->second);
     GMM_ASSERT1(ito != end(), "Object has a key, but cannot be found");
     return ito->second.dependent_object.empty();
   }
 
-
-
-  void stored_object_tab::basic_delete_
-  (std::list<pstatic_stored_object> &to_delete)
-  {
-    getfem::local_guard guard = locks_.get_lock();
-    std::list<pstatic_stored_object>::iterator it;
-    for (it = to_delete.begin(); it != to_delete.end(); ) 
-    {
+  void stored_object_tab::basic_delete_(std::list<pstatic_stored_object> &to_delete){
+    auto guard = locks_.get_lock();
+    for (auto it = to_delete.begin(); it != to_delete.end();){
       DAL_STORED_OBJECT_DEBUG_DELETED(it->get());
-      stored_key_tab::iterator itk = stored_keys_.find(*it);
-      stored_object_tab::iterator ito = end();
-      if (itk != stored_keys_.end())
-      {
+      auto itk = stored_keys_.find(*it);
+      auto ito = end();
+      if (itk != stored_keys_.end()){
           ito = find(itk->second);
           stored_keys_.erase(itk);
       }
-      if (ito != end()) 
-      {
+      if (ito != end()){
         erase(ito);
         it = to_delete.erase(it);
-      } else {
-        ++it;
-      }
+      } else ++it;
     }
   }
 
-
-
-}
+}/* end of namespace dal                                              */
\ No newline at end of file
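Note how every sweep in the file above runs over singleton<stored_object_tab>::num_threads() partitions rather than the live OpenMP thread count, so dependencies are searched in all per-thread tables. A minimal sketch of that all-partitions search pattern, with a hypothetical container (not GetFEM's actual implementation):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-thread table: one slot per partition,
// mirroring the loop "for (thread = 0; thread != num_threads(); ++thread)".
template <typename T>
class per_thread_table {
  std::vector<T> slots_;
public:
  explicit per_thread_table(std::size_t n) : slots_(n) {}
  std::size_t num_threads() const { return slots_.size(); }
  T& instance(std::size_t i) { return slots_[i]; }

  // Search every partition, not just the current one, exactly like
  // add_dependency/del_dependency sweep all thread-local storages.
  template <typename Pred>
  bool any_of(Pred pred) {
    for (std::size_t t = 0; t != num_threads(); ++t)
      if (pred(slots_[t])) return true;
    return false;
  }
};
```

The point of the sweep is that an object may have been stored by a different thread than the one deleting it, so stopping at the current partition would miss it.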
diff --git a/src/getfem/bgeot_geometric_trans.h b/src/getfem/bgeot_geometric_trans.h
index 5159b0a..0c10fba 100644
--- a/src/getfem/bgeot_geometric_trans.h
+++ b/src/getfem/bgeot_geometric_trans.h
@@ -110,6 +110,7 @@ namespace bgeot {
     std::vector<size_type> vertices_;
     size_type complexity_; /* either the degree or the refinement of the
                             *  transformation */
+    std::string name_;
 
     void fill_standard_vertices();
   public :
@@ -159,6 +160,8 @@ namespace bgeot {
     template<class CONT> base_node transform(const base_node &pt,
                                              const CONT &PTAB) const;
     base_node transform(const base_node &pt, const base_matrix &G) const;
+    void set_name(const std::string &name){name_ = name;}
+    const std::string& debug_name() const {return name_;}
     virtual void project_into_reference_convex(base_node &pt) const
       { cvr->project_into(pt); }
     size_type complexity() const { return complexity_; }
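The new set_name/debug_name pair simply caches the registration string on the transformation, as getfem::fem already does for finite elements. A trivial sketch of the accessor pattern (the class below and the sample name are illustrative, not GetFEM code):

```cpp
#include <cassert>
#include <string>

// Minimal mirror of the accessors added above: the object remembers the
// string it was registered under so debug output can report it.
class named_object {
  std::string name_;
public:
  void set_name(const std::string &name) { name_ = name; }
  const std::string& debug_name() const { return name_; }
};
```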
diff --git a/src/getfem/bgeot_small_vector.h b/src/getfem/bgeot_small_vector.h
index a952d26..12cd4fc 100644
--- a/src/getfem/bgeot_small_vector.h
+++ b/src/getfem/bgeot_small_vector.h
@@ -154,7 +154,7 @@ namespace bgeot {
    static_block_allocator() { if (!palloc) palloc=&dal::singleton<block_allocator,1000>::instance(); } //new block_allocator(); }
   };
   
-#if !defined GETFEM_HAVE_OPENMP
+#if !defined GETFEM_HAS_OPENMP
  /** container for small vectors of POD (Plain Old Data) types. Should be as fast as
      std::vector<T> while beeing smaller and uses copy-on-write. The gain is especially
       valuable on 64 bits architectures.
@@ -379,7 +379,7 @@ namespace bgeot {
     }
 
 
-#endif // #if !defined GETFEM_HAVE_OPENMP
+#endif // #if !defined GETFEM_HAS_OPENMP
 
 
  template<class T> std::ostream& operator<<(std::ostream& os, const small_vector<T>& v) {
diff --git a/src/getfem/dal_naming_system.h b/src/getfem/dal_naming_system.h
index e06c1cc..23cfe0d 100644
--- a/src/getfem/dal_naming_system.h
+++ b/src/getfem/dal_naming_system.h
@@ -34,10 +34,11 @@
 
 #include <deque>
 #include <map>
+
 #include "dal_static_stored_objects.h"
+#include "getfem/getfem_locale.h"
 #include "getfem_omp.h"
 
-
 namespace dal {
 
   /** @file dal_naming_system.h
@@ -116,7 +117,7 @@ namespace dal {
     std::string shorter_name_of_method(pmethod pm) const;
     pmethod method(const std::string &name, size_type &i,
                   bool throw_if_not_found = true)
-    { gmm::standard_locale sl; return method_(name, i, throw_if_not_found); }
+    { getfem::standard_locale sl; return method_(name, i, throw_if_not_found); }
     naming_system(std::string pr) : prefix(pr) {}
     bool delete_method(std::string name);
   };
@@ -240,7 +241,7 @@ namespace dal {
          state = 3; break;
        case 3  : {
          char *p;
-         gmm::standard_locale sl;
+         getfem::standard_locale sl;
          params.push_back(parameter(strtod(&(name[i]), &p)));
          i += l; if (p < &(name[i])) error = true;
          state = 3; break;
@@ -262,7 +263,7 @@ namespace dal {
                  << " of the string : " << name);
       if (isend) {
        std::stringstream norm_name(suff); //norm_name.imbue(std::locale("C"));
-       gmm::standard_locale loc;
+       getfem::standard_locale loc;
        norm_name << suff;
        if (params.size() > 0) {
          norm_name << '(';
diff --git a/src/getfem/dal_singleton.h b/src/getfem/dal_singleton.h
index 454f018..282cb5d 100644
--- a/src/getfem/dal_singleton.h
+++ b/src/getfem/dal_singleton.h
@@ -35,102 +35,108 @@
 @brief A simple singleton implementation
 
 Singleton was made thread safe for OpenMP
-However, now there is a singleton instance for every 
+However, now there is a singleton instance for every
 thread (singleton is thread local). This replicates
-the behaviour of singletons in distirbuted MPI-like 
+the behaviour of singletons in distributed MPI-like
 environment;
 */
-#ifndef DAL_SINGLETON
-#define DAL_SINGLETON
+
+#pragma once
 
 #include <vector>
 #include <memory>
+
 #include "getfem_omp.h"
 
 
 namespace dal {
 
+  using bgeot::size_type;
+
   class singleton_instance_base {
   public:
-    virtual ~singleton_instance_base() {}
-    virtual int level() = 0;
+    virtual ~singleton_instance_base() {};
+    virtual int level() const = 0;
   };
 
-
   class singletons_manager {
-  protected:
     getfem::omp_distribute<std::vector<singleton_instance_base *>> lst;
+    size_type nb_partitions;
     static singletons_manager& manager();
-    static singletons_manager& m;
 
   public:
     static void register_new_singleton(singleton_instance_base *p);
     static void register_new_singleton(singleton_instance_base *p,
-                                      size_t ithread);
+                                       size_t ithread);
+    static void on_partitions_change();
+
+    /**destroy singletons in increasing order*/
     ~singletons_manager();
+
   private:
     singletons_manager();
   };
 
+  template <typename T, int LEV>
+  class singleton_instance : public singleton_instance_base {
 
+    static getfem::omp_distribute<T*>& omp_distro() {
+      static auto instance = getfem::omp_distribute<T*>{};
+      return instance;
+    }
 
+    static T*& instance_pointer() {
+      return omp_distro().thrd_cast();
+    }
 
-  template <typename T, int LEV> class singleton_instance
-    : public singleton_instance_base {
-    static getfem::omp_distribute<T*>* instance_;
-    static getfem::omp_distribute<T*>* omp_distro_pointer() {
-      static getfem::omp_distribute<T*>*
-       pointer = new getfem::omp_distribute<T*>();
-      return pointer;
+    static T*& instance_pointer(size_t ithread) {
+      return omp_distro()(ithread);
     }
-    static T*& instance_pointer() { return omp_distro_pointer()->thrd_cast(); }
-    static T*& instance_pointer(size_t ithread)
-    { return (*omp_distro_pointer())(ithread);}
-    
+
   public:
-    
-    singleton_instance() {}
-    
-    /** Instance from the current thread*/
-    inline static T& instance() { 
-      T*& tinstance_ = instance_pointer();
-      if (!tinstance_) {
-        tinstance_ = new T();
-        singletons_manager::register_new_singleton
-         (new singleton_instance<T,LEV>());
-      }
-      return *tinstance_; 
-    }
-    
+
     /**Instance from thread ithread*/
-    inline static T& instance(size_t ithread) { 
+    inline static T& instance(size_t ithread) {
+      omp_distro().on_thread_update();
       T*& tinstance_ = instance_pointer(ithread);
       if (!tinstance_) {
         tinstance_ = new T();
-        singletons_manager::register_new_singleton
-         (new singleton_instance<T,LEV>(),ithread);
+        singletons_manager::register_new_singleton(
+          new singleton_instance<T,LEV>(), ithread);
       }
-      return *tinstance_; 
+      return *instance_pointer(ithread);
+    }
+
+    /** Instance from the current thread*/
+    inline static T& instance() {
+      return instance(this_thread());
+    }
+
+    inline static size_type num_threads() {
+      return omp_distro().num_threads();
+    }
+
+    inline static size_type this_thread() {
+      return omp_distro().this_thread();
+    }
+
+    int level() const override {
+      return LEV;
     }
-    
-    int level() { return LEV; }
-    
+
     ~singleton_instance() {
-      if (instance_) {
-        for(size_t i = 0; i < getfem::num_threads(); i++) {
-          if((*instance_)(i))  { 
-            delete (*instance_)(i); 
-            (*instance_)(i) = 0; 
-          }
-        } 
+      for(size_t i = 0; i != omp_distro().num_threads(); ++i) {
+        if(omp_distro()(i)){
+          delete omp_distro()(i);
+          omp_distro()(i) = nullptr;
+        }
       }
-      delete instance_; instance_=0;
     }
   };
 
-  /** singleton class. 
-      
-      usage: 
+  /** singleton class.
+
+      usage:
       @code
       foo &f = singleton<foo>::instance();
       const foo &f = singleton<foo>::const_instance();
@@ -141,33 +147,40 @@ namespace dal {
   */
   template <typename T, int LEV=1> class singleton {
   public:
-    
+
+    singleton(const singleton&) = delete;
+    singleton& operator=(const singleton&) = delete;
+
     /** Instance from the current thread*/
-    inline static T& instance() { 
+    inline static T& instance() {
       return singleton_instance<T,LEV>::instance();
     }
-    inline static const T& const_instance() { return instance(); }
 
-    inline static T& instance(size_t ithread) { 
+    inline static const T& const_instance() {
+      return instance();
+    }
+
+    inline static T& instance(size_t ithread) {
       return singleton_instance<T,LEV>::instance(ithread);
     }
-    inline static const T& const_instance(size_t ithread)
-    { return instance(ithread); }
 
-    
-  protected:
-    singleton() {}
-    ~singleton() {}
-  private:
-    singleton(const singleton&);            
-    singleton& operator=(const singleton&);
-  };
-  
-  template <typename T, int LEV> 
-  getfem::omp_distribute<T*>* singleton_instance<T,LEV>::instance_
-  = singleton_instance<T,LEV>::omp_distro_pointer();
-}
+    inline static const T& const_instance(size_t ithread){
+      return instance(ithread);
+    }
 
-#endif
+    /** number of threads this singleton is distributed on.*/
+    inline static size_type num_threads(){
+      return singleton_instance<T,LEV>::num_threads();
+    }
+
+    /** this thread number according to the threading policy of the singleton*/
+    inline static size_type this_thread() {
+      return singleton_instance<T, LEV>::this_thread();
+    }
 
+  protected:
+    singleton() = default;
+    ~singleton() = default;
+  };
 
+}/* end of namespace dal                                              */
\ No newline at end of file
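The essence of the refactored dal::singleton is one lazily created instance per logical thread index. A deliberately simplified sketch of that idea, with hypothetical names: no locking, a fixed partition count of 4, plain indices instead of OpenMP thread ids, and slots that are never freed (the real singletons_manager handles destruction order):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of an instance-per-thread singleton. Each slot corresponds to one
// logical thread/partition and is created on first access.
template <typename T>
class distributed_singleton {
  static std::vector<T*>& slots() {
    static std::vector<T*> s(4); // assumed fixed number of partitions
    return s;
  }

public:
  static std::size_t num_threads() { return slots().size(); }

  // Instance for a given thread index, lazily constructed, mirroring
  // singleton_instance::instance(size_t ithread).
  static T& instance(std::size_t ithread) {
    T*& p = slots()[ithread];
    if (!p) p = new T();
    return *p;
  }
};
```

Because the slot count is fixed independently of how many threads actually run, the same instance(i) is reached no matter which thread performs the lookup, which is the behaviour the partitioning change relies on.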
diff --git a/src/getfem/dal_static_stored_objects.h b/src/getfem/dal_static_stored_objects.h
index ce5b547..be8ca7e 100644
--- a/src/getfem/dal_static_stored_objects.h
+++ b/src/getfem/dal_static_stored_objects.h
@@ -77,7 +77,7 @@ std::shared_ptr are used.
 
 #include "getfem/getfem_arch_config.h"
 
-#ifdef GETFEM_HAVE_OPENMP
+#ifdef GETFEM_HAS_OPENMP
   #include <boost/atomic.hpp>
   typedef boost::atomic_bool atomic_bool;
   typedef boost::atomic<int> atomic_int;
@@ -371,38 +371,25 @@ namespace dal {
 
   /** delete all the specific type of stored objects*/
   template<typename OBJECT_TYPE>
-  void delete_specific_type_stored_objects(bool all_thread = false)
-  {
-    typedef typename stored_object_tab::iterator iterator;
+  void delete_specific_type_stored_objects(bool all_threads = false){
     std::list<pstatic_stored_object> delete_object_list;
 
-    if(!all_thread){
-      stored_object_tab& stored_objects
-        = dal::singleton<stored_object_tab>::instance();
-
-      iterator itb = stored_objects.begin();
-      iterator ite = stored_objects.end();
-
-      for(iterator it = itb; it != ite; ++it){
-        const OBJECT_TYPE *p_object
-          = std::dynamic_pointer_cast<const OBJECT_TYPE>(it->second.p).get();
-        if(p_object != 0) delete_object_list.push_back(it->second.p);
+    auto filter_objects = [&](stored_object_tab &stored_objects){
+      for(auto &&pair : stored_objects){
+        auto p_object = std::dynamic_pointer_cast<const OBJECT_TYPE>(pair.second.p);
+        if(p_object != nullptr) delete_object_list.push_back(pair.second.p);
       }
+    };
+
+    if (!all_threads){
+      auto& stored_objects = singleton<stored_object_tab>::instance();
+      filter_objects(stored_objects);
     }
     else{
-      for(size_t thread = 0; thread<getfem::num_threads();thread++)
+      for(size_t thread = 0; thread < singleton<stored_object_tab>::num_threads(); ++thread)
       {
-        stored_object_tab& stored_objects
-          = dal::singleton<stored_object_tab>::instance(thread);
-
-        iterator itb = stored_objects.begin();
-        iterator ite = stored_objects.end();
-
-        for(iterator it = itb; it != ite; ++it){
-          const OBJECT_TYPE *p_object
-            = std::dynamic_pointer_cast<const OBJECT_TYPE>(it->second.p).get();
-          if(p_object != 0) delete_object_list.push_back(it->second.p);
-        }
+        auto& stored_objects = singleton<stored_object_tab>::instance(thread);
+        filter_objects(stored_objects);
       }
     }
     del_stored_objects(delete_object_list, false);
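The refactored delete_specific_type_stored_objects above factors the dynamic_pointer_cast filter into a single lambda reused by both the single-thread and all-threads branches. A standalone sketch of that filtering idea (the base/derived types are stand-ins for stored objects):

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <vector>

struct base { virtual ~base() = default; };
struct derived : base {};

// Collect every stored pointer whose dynamic type matches Object, the same
// dynamic_pointer_cast test the lambda in the patch performs.
template <typename Object>
std::list<std::shared_ptr<base>>
collect_of_type(const std::vector<std::shared_ptr<base>> &stored) {
  std::list<std::shared_ptr<base>> matches;
  auto filter = [&](const std::shared_ptr<base> &p) {
    if (std::dynamic_pointer_cast<const Object>(p))
      matches.push_back(p);
  };
  for (auto &&p : stored) filter(p); // reusable for any container sweep
  return matches;
}
```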
diff --git a/src/getfem/getfem_accumulated_distro.h b/src/getfem/getfem_accumulated_distro.h
new file mode 100644
index 0000000..5ee5813
--- /dev/null
+++ b/src/getfem/getfem_accumulated_distro.h
@@ -0,0 +1,220 @@
+/* -*- c++ -*- (enables emacs c++ mode) */
+/*===========================================================================
+
+ Copyright (C) 2018 Andriy Andreykiv
+
+ This file is a part of GetFEM++
+
+ GetFEM++  is  free software;  you  can  redistribute  it  and/or modify it
+ under  the  terms  of the  GNU  Lesser General Public License as published
+ by  the  Free Software Foundation;  either version 3 of the License,  or
+ (at your option) any later version along with the GCC Runtime Library
+ Exception either version 3.1 or (at your option) any later version.
+ This program  is  distributed  in  the  hope  that it will be useful,  but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or  FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License and GCC Runtime Library Exception for more details.
+ You  should  have received a copy of the GNU Lesser General Public License
+ along  with  this program;  if not, write to the Free Software Foundation,
+ Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+
+ As a special exception, you  may use  this file  as it is a part of a free
+ software  library  without  restriction.  Specifically,  if   other  files
+ instantiate  templates  or  use macros or inline functions from this file,
+ or  you compile this  file  and  link  it  with other files  to produce an
+ executable, this file  does  not  by itself cause the resulting executable
+ to be covered  by the GNU Lesser General Public License.  This   exception
+ does not  however  invalidate  any  other  reasons why the executable file
+ might be covered by the GNU Lesser General Public License.
+
+===========================================================================*/
+
+/**@file getfem_accumulated_distro.h
+@author  Andriy Andreykiv <address@hidden>
+@date November 21, 2018.
+@brief Distribution of assembly results (matrices/vectors) for parallel
+       assembly.
+
+This is the kernel of getfem.
+*/
+#pragma once
+
+#include <gmm/gmm_def.h>
+
+#include "getfem_omp.h"
+
+namespace getfem
+{
+
+namespace detail {
+
+  //T is a single vector
+  template<class T>
+  void add_spec(const T &a, T &b,
+                gmm::abstract_null_type, gmm::abstract_vector){
+    gmm::add(a, b);
+  }
+
+  //T is a single matrix
+  template<class T>
+  void add_spec(const T &a, T &b,
+                gmm::abstract_null_type, gmm::abstract_matrix){
+    gmm::add(a, b);
+  }
+
+  //T is a vector of either matrices or vectors
+  template<class T>
+  void add_list(const T &a, T &b){
+    GMM_ASSERT2(a.size() == b.size(), "size mismatch");
+    auto ita = begin(a);
+    auto itb = begin(b);
+    auto ita_end = end(a);
+    for (;ita != ita_end; ++ita, ++itb) gmm::add(*ita, *itb);
+  }
+
+  //T is a vector of vectors
+  template<class T>
+  void add_spec(const T &a, T &b,
+                gmm::abstract_vector, gmm::abstract_vector){
+    add_list(a, b);
+  }
+
+  //T is a vector of matrices
+  template<class T>
+  void add_spec(const T &a, T &b,
+                gmm::abstract_matrix, gmm::abstract_vector){
+    add_list(a, b);
+  }
+
+  //T is a single vector
+  template<class T>
+  void equal_resize_spec(T &a, const T &b,
+                         gmm::abstract_null_type, gmm::abstract_vector){
+    gmm::resize(a, gmm::vect_size(b));
+  }
+
+  //T is a single matrix
+  template<class T>
+  void equal_resize_spec(T &a, const T &b,
+                         gmm::abstract_null_type, gmm::abstract_matrix){
+    gmm::resize(a, gmm::mat_nrows(b), gmm::mat_ncols(b));
+  }
+
+  //T is a vector of either matrices or vectors
+  template<class T>
+  void equal_resize_list(T &a, const T &b){
+    GMM_ASSERT2(a.empty(), "the first list should still be empty");
+    a.resize(b.size());
+    auto ita = begin(a);
+    auto itb = begin(b);
+    auto ita_end = end(a);
+    using Component = typename T::value_type;
+    using AlgoC = typename gmm::linalg_traits<Component>::linalg_type;
+    for (;ita != ita_end; ++ita, ++itb){
+      equal_resize_spec(*ita, *itb, gmm::abstract_null_type{}, AlgoC{});
+    }
+  }
+
+  //T is a vector of vectors
+  template<class T>
+  void equal_resize_spec(T &a, const T &b,
+                         gmm::abstract_vector, gmm::abstract_vector){
+    equal_resize_list(a, b);
+  }
+
+  //T is a vector of matrices
+  template<class T>
+  void equal_resize_spec(T &a, const T &b,
+                         gmm::abstract_matrix, gmm::abstract_vector){
+    equal_resize_list(a, b);
+  }
+
+} // namespace detail
+
+  /**Generic addition for gmm types as well as vectors of gmm types*/
+  template<class T>
+  void gen_add(const T &a, T &b){
+    using AlgoT = typename gmm::linalg_traits<T>::linalg_type;
+    using Component = typename T::value_type;
+    using AlgoC = typename gmm::linalg_traits<Component>::linalg_type;
+    detail::add_spec(a, b, AlgoC{}, AlgoT{});
+  }
+
+  /**Resize 'a' to the same size as 'b'.
+      a and b can be matrices, vectors, or lists of matrices or vectors
+  */
+  template<class T>
+  void equal_resize(T &a, const T &b){
+    using AlgoT = typename gmm::linalg_traits<T>::linalg_type;
+    using Component = typename T::value_type;
+    using AlgoC = typename gmm::linalg_traits<Component>::linalg_type;
+    detail::equal_resize_spec(a, b, AlgoC{}, AlgoT{});
+  }
+
+  /**Takes a matrix or vector, or vector of matrices or vectors and
+  creates an empty copy on each thread. When the
+  thread computations are done (in the destructor), accumulates
+  the assembled copies into the original*/
+  template <class T>
+  class accumulated_distro
+  {
+    T& original;
+    omp_distribute<T> distributed;
+
+  public:
+
+    explicit accumulated_distro(T& l)
+      : original{l}{
+      if (distributed.num_threads() == 1) return;
+      //intentionally skipping thread 0, as accumulated_distro will
+      //use original for it
+      for(size_type t = 1; t != distributed.num_threads(); ++t){
+        equal_resize(distributed(t), original);
+      }
+    }
+
+    operator T&(){
+      if (distributed.num_threads() == 1 ||
+          distributed.this_thread() == 0) return original;
+      else return distributed;
+    }
+
+    T& operator = (const T &x){
+      return distributed = x;
+    }
+
+    ~accumulated_distro(){
+      if (distributed.num_threads() == 1) return;
+
+      GMM_ASSERT1(!me_is_multithreaded_now(),
+                  "Accumulation distribution should not run in parallel");
+
+      using namespace std;
+      auto to_add = vector<T*>{};
+      to_add.push_back(&original);
+      for (size_type t = 1; t != distributed.num_threads(); ++t){
+        to_add.push_back(&distributed(t));
+      }
+
+      //Accumulation in parallel.
+      //Adding, for instance, elements 1 to 0, 3 to 2, 5 to 4 and 7 to 6
+      //on 4 separate threads in case of parallelization of the assembly
+      //on 8 threads.
+      while (to_add.size() > 1){
+        GETFEM_OMP_PARALLEL(
+          auto i = distributed.this_thread() * 2;
+          if (i + 1 < to_add.size()){
+            auto &target = *to_add[i];
+            auto &source = *to_add[i + 1];
+            gen_add(source, target);
+          }
+        )
+        //erase every second item, as it was already added
+        for (auto it = begin(to_add);
+             next(it) < end(to_add);
+             it = to_add.erase(next(it)));
+      }
+    }
+  };
+
+} // namespace getfem
\ No newline at end of file
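The destructor of `accumulated_distro` above merges per-thread copies pairwise: each round adds item `i+1` into item `i` and then drops every second item, halving the list until only the original remains. A self-contained sketch of that tree reduction, run serially over plain numbers (the real code does each round's additions in parallel via `GETFEM_OMP_PARALLEL` and operates on matrices/vectors):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pairwise (tree) reduction over per-thread partial results.
long tree_reduce(std::vector<long> parts) {
  while (parts.size() > 1) {
    // one round: add item i+1 into item i (parallel in the real code)
    for (std::size_t i = 0; i + 1 < parts.size(); i += 2)
      parts[i] += parts[i + 1];
    // keep only every second item; the others were already accumulated
    std::vector<long> next_round;
    for (std::size_t i = 0; i < parts.size(); i += 2)
      next_round.push_back(parts[i]);
    parts.swap(next_round);
  }
  return parts.empty() ? 0 : parts.front();
}
```

With 8 partial results the reduction finishes in 3 rounds (8 → 4 → 2 → 1), which is what lets the accumulation itself be parallelized instead of adding all copies into the original one by one.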
diff --git a/src/getfem/getfem_context.h b/src/getfem/getfem_context.h
index 064e611..e24f4e5 100644
--- a/src/getfem/getfem_context.h
+++ b/src/getfem/getfem_context.h
@@ -41,7 +41,7 @@
 #include "getfem_omp.h"
 #include <list>
 
-#ifdef GETFEM_HAVE_OPENMP
+#ifdef GETFEM_HAS_OPENMP
   #include <boost/atomic.hpp>
   typedef boost::atomic_bool atomic_bool;
 #else
diff --git a/src/getfem/getfem_interpolation.h b/src/getfem/getfem_interpolation.h
index 5ee69d7..ea09c6a 100644
--- a/src/getfem/getfem_interpolation.h
+++ b/src/getfem/getfem_interpolation.h
@@ -605,7 +605,7 @@ namespace getfem {
           }
           else
            mti.add_point_with_id(mf_target.point_of_basic_dof(dof),dof/qdim_t);
-        }  
+        }
     }
    interpolation(mf_source, mti, U, V, M, version, extrapolation, 0,rg_source);
 
@@ -704,13 +704,10 @@ namespace getfem {
       interpolation_same_mesh(mf_source, mf_target, U, V, M, 0);
     else {
       omp_distribute<VECTV> V_distributed;
-      thread_exception exception;
+      auto partitioning_allowed = rg_source.is_partitioning_allowed();
       rg_source.prohibit_partitioning();
 
-      #pragma omp parallel default(shared)
-      {
-        exception.run(
-        [&] {
+      GETFEM_OMP_PARALLEL(
           auto &V_thrd = V_distributed.thrd_cast();
           gmm::resize(V_thrd, V.size());
           interpolation(
@@ -721,11 +718,8 @@ namespace getfem {
             for (size_type i = 0; i < V_thrd.size(); ++i) {
               if (gmm::abs(V_thrd[i]) > EPS) V[i] = V_thrd[i];
             }
-        });
-      }
-
-      rg_source.allow_partitioning();
-      exception.rethrow();
+      )
+      if (partitioning_allowed) rg_source.allow_partitioning();
     }
   }
 
@@ -795,7 +789,7 @@ namespace getfem {
 
     base_matrix G;
     base_vector coeff;
-    bgeot::base_tensor tensor_int_point(im_target.tensor_size());    
+    bgeot::base_tensor tensor_int_point(im_target.tensor_size());
     fem_precomp_pool fppool;
     for (dal::bv_visitor cv(im_data_convex_index); !cv.finished(); ++cv) {
 
diff --git a/src/getfem/getfem_locale.h b/src/getfem/getfem_locale.h
new file mode 100644
index 0000000..d8fe6dc
--- /dev/null
+++ b/src/getfem/getfem_locale.h
@@ -0,0 +1,58 @@
+/* -*- c++ -*- (enables emacs c++ mode) */
+/*===========================================================================
+
+ Copyright (C) 2002-2017 Yves Renard
+
+ This file is a part of GetFEM++
+
+ GetFEM++  is  free software;  you  can  redistribute  it  and/or modify it
+ under  the  terms  of the  GNU  Lesser General Public License as published
+ by  the  Free Software Foundation;  either version 3 of the License,  or
+ (at your option) any later version along with the GCC Runtime Library
+ Exception either version 3.1 or (at your option) any later version.
+ This program  is  distributed  in  the  hope  that it will be useful,  but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or  FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License and GCC Runtime Library Exception for more details.
+ You  should  have received a copy of the GNU Lesser General Public License
+ along  with  this program;  if not, write to the Free Software Foundation,
+ Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+
+ As a special exception, you  may use  this file  as it is a part of a free
+ software  library  without  restriction.  Specifically,  if   other  files
+ instantiate  templates  or  use macros or inline functions from this file,
+ or  you compile this  file  and  link  it  with other files  to produce an
+ executable, this file  does  not  by itself cause the resulting executable
+ to be covered  by the GNU Lesser General Public License.  This   exception
+ does not  however  invalidate  any  other  reasons why the executable file
+ might be covered by the GNU Lesser General Public License.
+
+===========================================================================*/
+
+/**@file getfem_locale.h
+  @author  Andriy Andreykiv address@hidden
+  @date November 29, 2018
+  @brief thread safe standard locale with RAII semantics
+*/
+
+#pragma once
+
+#include <locale.h>
+#include <string>
+
+#include "getfem_omp.h"
+
+namespace getfem {
+
+  /**Identical to gmm::standard_locale, but does not
+     change std::locale in multi-threaded sections
+     of the code, which is not thread-safe*/
+  class standard_locale {
+    std::string cloc;
+    std::locale cinloc;
+  public :
+    standard_locale();
+    ~standard_locale();
+  };
+
+}  /* end of namespace getfem.                                             */
\ No newline at end of file
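`getfem::standard_locale` is declared above with RAII semantics only; its implementation lives in `getfem_locale.cc`, which is not part of this hunk. A minimal sketch of the same idea with a hypothetical class name (the getfem version additionally skips the locale switch inside multi-threaded sections, since mutating the global locale is not thread-safe):

```cpp
#include <cassert>
#include <clocale>
#include <string>

// RAII guard: save the current C numeric locale, switch to the classic
// "C" locale, and restore the original on scope exit.
class scoped_c_locale {           // hypothetical name, not the getfem class
  std::string saved_;
public:
  scoped_c_locale() : saved_(std::setlocale(LC_NUMERIC, nullptr)) {
    std::setlocale(LC_NUMERIC, "C");
  }
  ~scoped_c_locale() { std::setlocale(LC_NUMERIC, saved_.c_str()); }
};
```

Typical use is around number parsing/printing that must see `.` as the decimal separator regardless of the user's locale; the destructor guarantees restoration even if an exception is thrown.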
diff --git a/src/getfem/getfem_mesh_region.h b/src/getfem/getfem_mesh_region.h
index d930a7d..a3f414c 100644
--- a/src/getfem/getfem_mesh_region.h
+++ b/src/getfem/getfem_mesh_region.h
@@ -35,65 +35,41 @@
 @brief  region objects (set of convexes and/or convex faces)
 */
 
-#ifndef GETFEM_MESH_REGION
-#define GETFEM_MESH_REGION
+#pragma once
 
 #include <bitset>
 #include <iostream>
+#include <map>
+
 #include "dal_bit_vector.h"
 #include "bgeot_convex_structure.h"
 #include "getfem_config.h"
 
-// #if __cplusplus > 199711L
-// #include <unordered_map>
-// #elif defined(GETFEM_HAVE_BOOST) && BOOST_VERSION >= 103600
-// #include <boost/unordered_map.hpp>
-// #else
-#include <map>
-// #endif
-
-// #ifdef GETFEM_HAVE_BOOST
-// #include <boost/shared_ptr.hpp>
-// #endif
-
-
 namespace getfem {
   class mesh;
 
   /** structure used to hold a set of convexes and/or convex faces.
-  @see mesh::region
+     @see mesh::region
   */
   class APIDECL mesh_region {
   public:
-    typedef std::bitset<MAX_FACES_PER_CV+1> face_bitset;
-
-    // #if __cplusplus > 199711L
-    //    typedef std::unordered_map<size_type,face_bitset> map_t;
-    // #elif defined(GETFEM_HAVE_BOOST) && BOOST_VERSION >= 103600
-    // typedef boost::unordered_map<size_type,face_bitset> map_t;
-    // #else
-    typedef std::map<size_type,face_bitset> map_t;
-    // #endif
+    using face_bitset = std::bitset<MAX_FACES_PER_CV+1>;
+    using map_t = std::map<size_type, face_bitset>;
 
   private:
 
-    typedef map_t::const_iterator const_iterator;
+    using const_iterator = map_t::const_iterator;
 
     struct impl {
       mutable map_t m;
       mutable omp_distribute<dal::bit_vector> index_;
+      mutable dal::bit_vector serial_index_;
     };
-
-    // #ifdef GETFEM_HAVE_BOOST
-    //need to use boost smart pointer, cause it's reference
-    //counting is thread safe
-    // boost::shared_ptr<impl> p;  /* the real region data */
-    // #else
     std::shared_ptr<impl> p;  /* the real region data */
-    // #endif
 
-    size_type id_;            /* used temporarily when the 
-                                mesh_region(size_type) constructor is used */
+    size_type id_;            /* used temporarily when the
+                                 mesh_region(size_type)
+                                 constructor is used */
 
     size_type type_; //optional type of the region
     omp_distribute<bool> partitioning_allowed; /** specifies that in
@@ -104,10 +80,19 @@ namespace getfem {
                           a mesh (to provide feedback) */
 
 
-    //cashing iterators for paritions
+    //caching iterators for partitions
     mutable omp_distribute<const_iterator> itbegin;
     mutable omp_distribute<const_iterator> itend;
+
+    //flags for all the caches
     mutable omp_distribute<bool> index_updated;
+    mutable omp_distribute<bool> partitions_updated;
+    mutable bool serial_index_updated;
+
+    void mark_region_changed() const;
+
+    void update_index() const;
+
     void update_partition_iterators() const;
 
     impl &wp() { return *p.get(); }
@@ -118,27 +103,27 @@ namespace getfem {
 
     /**when running while multithreaded, gives the iterator
     for the beginning of the region partition for the current thread*/
-    const_iterator partition_begin( ) const;
+    const_iterator partition_begin() const;
 
     /**when running while multithreaded, gives the iterator
     for the end of the region partition for the current thread*/
-    const_iterator partition_end  ( ) const;
+    const_iterator partition_end() const;
 
     /**begin iterator of the region depending if its partitioned or not*/
-    const_iterator begin( ) const;
+    const_iterator begin() const;
 
-    /**end iteratorof the region depending if its partitioned or not*/
-    const_iterator end  ( ) const;
+    /**end iterator of the region depending if its partitioned or not*/
+    const_iterator end() const;
 
-    /**number of region entries before partitining*/
-    size_type unpartitioned_size() const; 
+    /**number of region entries before partitioning*/
+    size_type unpartitioned_size() const;
 
   public:
     mesh_region(const mesh_region &other);
     mesh_region();
-    /** a mesh_region can be built from a integer parameter 
+    /** a mesh_region can be built from an integer parameter
     (a region number in a mesh),
-    but it won't be usable until 'from_mesh(m)' has been called 
+    but it won't be usable until 'from_mesh(m)' has been called
     Note that these regions are read-only, this constructor is
     mostly used for backward-compatibility.
     */
@@ -152,23 +137,23 @@ namespace getfem {
     /** provide a default value for the mesh_region parameters of assembly
     procedures etc. */
     static mesh_region all_convexes() {
-      return mesh_region(size_type(-1)); 
+      return mesh_region(size_type(-1));
     }
     /** return the intersection of two mesh regions */
-    static mesh_region intersection(const mesh_region& a, 
-                                    const mesh_region& b); 
+    static mesh_region intersection(const mesh_region& a,
+                                    const mesh_region& b);
     /** return the union of two mesh_regions */
-    static mesh_region merge(const mesh_region &a, 
+    static mesh_region merge(const mesh_region &a,
                              const mesh_region &b);
     /** subtract the second region from the first one */
-    static mesh_region subtract(const mesh_region &a, 
+    static mesh_region subtract(const mesh_region &a,
                                 const mesh_region &b);
     /** Test if the region is a boundary of a list of faces of elements of
         region `rg`. Return 0 if not, -1 if only partially, 1 if the region
         contains only some faces which are all faces of elements of `rg`. */
     int region_is_faces_of(const getfem::mesh& m1,
-                          const mesh_region &rg2,
-                          const getfem::mesh& m2) const;
+                           const mesh_region &rg2,
+                           const getfem::mesh& m2) const;
 
     size_type id() const { return id_; }
 
@@ -176,8 +161,8 @@ namespace getfem {
 
     void  set_type(size_type type)  { type_ = type; }
 
-    /** In multithreaded part of the program makes only a partition of the 
-    region visible in the index() and size() operations, as well as during 
+    /** In the multithreaded parts of the program, makes only a partition of the
+    region visible in the index() and size() operations, as well as during
     iterations with mr_visitor. This is a default behaviour*/
     void  allow_partitioning();
 
@@ -185,20 +170,19 @@ namespace getfem {
     void bounding_box(base_node& Pmin, base_node& Pmax) const;
 
     /** Disregard partitioning, which means being able to see the whole region
-    in multirheaded code. Can be used, for instance, for contact problems
+    in multithreaded code. Can be used, for instance, for contact problems
     where master region is partitioned, while the slave region is not*/
     void  prohibit_partitioning();
 
     bool is_partitioning_allowed() const;
 
-    /** Extract the next region number 
+    /** Extract the next region number
     that does not yet exists in the mesh*/
     static size_type free_region_id(const getfem::mesh& m);
 
-
     /** For regions which have been built with just a number 'id',
-    from_mesh(m) sets the current region to 'm.region(id)'.  
-    (works only once) 
+    from_mesh(m) sets the current region to 'm.region(id)'.
+    (works only once)
     */
     const mesh_region& from_mesh(const mesh &m) const;
 
@@ -208,7 +192,7 @@ namespace getfem {
 
     face_bitset operator[](size_t cv) const;
 
-    /** Index of the region convexes, or the convexes from the partition on the
+    /** Index of the region convexes, or the convexes from the partition on the
     current thread. */
     const dal::bit_vector& index() const;
     void add(const dal::bit_vector &bv);
@@ -226,7 +210,7 @@ namespace getfem {
 
     /** Number of convexes in the region, or on the partition on the current
     thread*/
-    size_type nb_convex() const { return  index().card();}  
+    size_type nb_convex() const { return  index().card();}
     bool is_empty() const;
     /** Return true if the region do contain only convex faces */
     bool is_only_faces() const;
@@ -242,8 +226,6 @@ namespace getfem {
     const mesh *get_parent_mesh(void) const { return parent_mesh; }
     void set_parent_mesh(mesh *pm) { parent_mesh = pm; }
 
-
-
     /** "iterator" class for regions. Usage similar to bv_visitor:
     for (mr_visitor i(region); !i.finished(); ++i) {
     ...
@@ -265,10 +247,10 @@ namespace getfem {
       void init(const mesh_region &s);
       void init(const dal::bit_vector &s);
 
-    public: 
+    public:
       visitor(const mesh_region &s);
       visitor(const mesh_region &s, const mesh &m,
-             bool intersect_with_mpi = false);
+              bool intersect_with_mpi = false);
       size_type cv() const { return cv_; }
       size_type is_face() const { return f_ != 0; }
       short_type f() const { return short_type(f_-1); }
@@ -278,10 +260,9 @@ namespace getfem {
 
       bool finished() const { return finished_; }
 
-      bool next_face() 
-      {
-       if (whole_mesh) return false;
-       if (c.none()) return false;
+      bool next_face(){
+        if (whole_mesh) return false;
+        if (c.none()) return false;
         do { ++f_; } while (!c.test(f_));
         c.set(f_,0);
         return true;
@@ -291,12 +272,9 @@ namespace getfem {
     friend std::ostream & operator <<(std::ostream &os, const mesh_region &w);
   };
 
-  typedef mesh_region::visitor mr_visitor;
+  using mr_visitor = mesh_region::visitor;
 
-  /* Dummy mesh_region for default parameter of functions. */
+  /** Dummy mesh_region for default parameter of functions. */
   const mesh_region &dummy_mesh_region();
 
-}
-
-
-#endif
+}  /* end of namespace getfem.                                             */
\ No newline at end of file
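The central idea of this branch (see the commit message) is that a region is always split into a fixed number of partitions, and each thread then loops over a block of them: with 8 partitions on 2 threads, each thread handles 4 partitions, so the partitioning never depends on the thread count. A sketch of such a partition-to-thread mapping (a hypothetical helper, not the actual `partition_master` interface):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Return the half-open range [begin, end) of partition indices that a given
// thread should iterate over, distributing any remainder one-per-thread
// starting from thread 0.
std::pair<std::size_t, std::size_t>
partition_range(std::size_t nb_partitions, std::size_t nb_threads,
                std::size_t this_thread) {
  std::size_t per_thread = nb_partitions / nb_threads;
  std::size_t remainder  = nb_partitions % nb_threads;
  std::size_t begin = this_thread * per_thread
                      + (this_thread < remainder ? this_thread : remainder);
  std::size_t end   = begin + per_thread + (this_thread < remainder ? 1 : 0);
  return {begin, end};
}
```

Because assembly results are accumulated per partition rather than per thread, running the same model on 2 or 8 threads visits the same partitions in the same grouping, which is what makes the parallel results reproducible across thread counts.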
diff --git a/src/getfem/getfem_omp.h b/src/getfem/getfem_omp.h
index d264c7c..6cd784a 100644
--- a/src/getfem/getfem_omp.h
+++ b/src/getfem/getfem_omp.h
@@ -37,56 +37,45 @@
 This is the kernel of getfem.
 */
 #pragma once
-#ifndef GETFEM_OMP
-#define GETFEM_OMP
 
+#include <atomic>
+#include <memory>
+#include <set>
 #include <vector>
-#include <algorithm>
-
-#ifdef _WIN32
-#include <mbctype.h>
-#endif
 
-#include <locale.h>
-#include <memory>
-#include "gmm/gmm_std.h"
 #include "bgeot_config.h"
-#ifdef GETFEM_HAVE_OPENMP
-#include <omp.h>
-#include <boost/thread.hpp>
-#include <boost/shared_ptr.hpp>
-#include <boost/make_shared.hpp>
-#endif
 
+#ifdef GETFEM_HAS_OPENMP
+  #include <mutex>
+  #include <boost/thread.hpp> /**TODO: get rid of this dependency as soon
+                                       as thread_local is widely supported*/
+#endif
 
 namespace getfem
 {
   using bgeot::size_type;
-  class mesh;
-  class mesh_region;
 
-
-#ifdef GETFEM_HAVE_OPENMP
+#ifdef GETFEM_HAS_OPENMP
   //declaring a thread lock, to protect multi-threaded accesses to
   //asserts, traces and warnings. Using a global mutex
-  class omp_guard: public boost::lock_guard<boost::recursive_mutex>
+  class omp_guard: public std::lock_guard<std::recursive_mutex>
   {
   public:
     omp_guard();
+
   private:
-    static boost::recursive_mutex boost_mutex;
+    static std::recursive_mutex mutex_;
   };
 
-  //like boost::lock_guard, but copyable
+  //like std::lock_guard, but copyable
   class local_guard
   {
   public:
-    local_guard(boost::recursive_mutex&);
-    local_guard(const local_guard&);
+    local_guard(std::recursive_mutex&);
 
   private:
-    boost::recursive_mutex& mutex_;
-    boost::shared_ptr<boost::lock_guard<boost::recursive_mutex>> plock_;
+    std::recursive_mutex& mutex_;
+    std::shared_ptr<std::lock_guard<std::recursive_mutex>> plock_;
   };
 
   //produces scoped lock on the
@@ -94,13 +83,12 @@ namespace getfem
   class lock_factory
   {
   public:
-    lock_factory();
 
-    //get a lock object with RAII acuire/release semantics
+    //get a lock object with RAII acquire/release semantics
     //on the mutex from this factory
     local_guard get_lock() const;
   private:
-    mutable boost::recursive_mutex mutex_;
+    mutable std::recursive_mutex mutex_;
   };
 
 
@@ -114,284 +102,400 @@ namespace getfem
   };
 #endif
 
-
-#ifdef GETFEM_HAVE_OPENMP
-  /**number of OpenMP threads*/
-  inline size_t num_threads(){return omp_get_max_threads();}
-
   /**set maximum number of OpenMP threads*/
-  inline void set_num_threads(int n){omp_set_num_threads(n);}
+  void set_num_threads(int n);
 
-  /**index of the current thread*/
-  inline size_type this_thread() {return omp_get_thread_num();}
   /**is the program running in the parallel section*/
-  inline bool me_is_multithreaded_now(){return static_cast<bool>(omp_in_parallel());}
+  bool me_is_multithreaded_now();
 
-#else
-  inline size_type num_threads(){return size_type(1);}
-  inline void set_num_threads(int /* n */) { }
-  inline size_type this_thread() {return size_type(0);}
-  inline bool me_is_multithreaded_now(){return false;}
-#endif
+  /**is the program running on a single thread*/
+  bool not_multithreaded();
 
+  /**Maximum number of threads that can run concurrently*/
+  size_type max_concurrency();
 
+  /**Thread policy, where partitioning is based on true threads*/
+  struct true_thread_policy{
+    static size_type this_thread();
+    static size_type num_threads();
+  };
 
-  /**use this template class for any object you want to
-  distribute to open_MP threads. The creation of this
-  object should happen in serial, while accessing the individual
-  thread local instances will take place in parallel. If
-  one needs creation of thread local object, use the macro
-  DEFINE_STATIC_THREAD_LOCAL
-  */
-  template <typename T> class omp_distribute {
-    std::vector<T> thread_values;
-    friend struct all_values_proxy;
-    struct all_values_proxy{
-      omp_distribute& distro;
-      all_values_proxy(omp_distribute& d): distro(d){}
-      void operator=(const T& x){
-        for(typename std::vector<T>::iterator it=distro.thread_values.begin();
-          it!=distro.thread_values.end();it++) *it=x;
-      }
+  /** Thread policy, regulated by partition_master
+     (can be true thread- or partition-based)*/
+  struct global_thread_policy{
+    static size_type this_thread();
+    static size_type num_threads();
+  };
+
+  //implementation classes for omp_distribute
+  namespace detail{
+
+    struct general_tag{};
+    struct vector_tag{};
+    struct bool_tag{};
+
+    template<typename T>
+    struct distribute_traits
+    {
+      using type = general_tag;
     };
-  public:
 
-    template <class... Args>
-    omp_distribute(Args&&... value){
-      for (size_type i = 0; i < num_threads(); ++i) thread_values.emplace_back(value...);
-    }
-    operator T& (){return thread_values[this_thread()];}
-    operator const T& () const {return thread_values[this_thread()];}
-    T& thrd_cast(){return thread_values[this_thread()];}
-    const T& thrd_cast() const {return thread_values[this_thread()];}
-    T& operator()(size_type i) {
-      return thread_values[i];
-    }
-    const T& operator()(size_type i) const {
-      return thread_values[i];
-    }
-    T& operator = (const T& x){
-      return (thread_values[this_thread()]=x);
-    }
+    template<typename T>
+    struct distribute_traits<std::vector<T>>
+    {
+      using type = vector_tag;
+    };
 
-    all_values_proxy all_threads(){return all_values_proxy(*this);}
+    template<>
+    struct distribute_traits<bool>
+    {
+      using type = bool_tag;
+    };
 
-    ~omp_distribute(){}
-  };
+    template<typename T, typename thread_policy, typename tag>
+    class omp_distribute_impl;
+
+    template <typename T, typename thread_policy>
+    class omp_distribute_impl<T, thread_policy, general_tag> {
+    private:
+      std::vector<T> thread_values;
+      friend struct all_values_proxy;
+
+      struct all_values_proxy{
+        omp_distribute_impl& distro;
+        all_values_proxy(omp_distribute_impl& d)
+          : distro(d)
+        {}
+
+        void operator = (const T& x){
+          for(auto it  = distro.thread_values.begin();
+                   it != distro.thread_values.end(); ++it){
+            *it=x;
+          }
+        }
+      };
+
+    public:
+
+      template <class... args>
+      explicit omp_distribute_impl(args&&... value){
+        thread_values.reserve(thread_policy::num_threads());
+        for (size_type i = 0; i != thread_policy::num_threads(); ++i){
+          thread_values.emplace_back(value...);
+        }
+      }
 
-  template <typename T> class omp_distribute<std::vector<T> > {
-    std::vector<std::vector<T> > thread_values;
-    friend struct all_values_proxy;
-    struct all_values_proxy{
-      omp_distribute& distro;
-      all_values_proxy(omp_distribute& d): distro(d){}
-      void operator=(const T& x){
-        for(typename std::vector<T>::iterator it=distro.thread_values.begin();
-          it!=distro.thread_values.end();it++) *it=x;
+      operator T& (){
+        return thread_values[thread_policy::this_thread()];
+      }
+
+      operator const T& () const {
+        return thread_values[thread_policy::this_thread()];
+      }
+
+      T& thrd_cast(){
+        return thread_values[thread_policy::this_thread()];
+      }
+
+      const T& thrd_cast() const {
+        return thread_values[thread_policy::this_thread()];
+      }
+
+      T& operator()(size_type i) {
+        return thread_values[i];
+      }
+
+      void on_thread_update() {
+        std::unique_ptr<omp_guard> p_guard = nullptr;
+        if (me_is_multithreaded_now()) p_guard = std::make_unique<omp_guard>();
+        if (thread_values.size() != thread_policy::num_threads()) {
+          thread_values.resize(thread_policy::num_threads());
+        }
+      }
+
+      size_type num_threads() const {
+        return thread_policy::num_threads();
+      }
+
+      size_type this_thread() const {
+        return thread_policy::this_thread();
+      }
+
+      const T& operator()(size_type i) const {
+        return thread_values[i];
+      }
+
+      T& operator = (const T& x){
+        if (me_is_multithreaded_now()){
+          thread_values[thread_policy::this_thread()] = x;
+        }
+        else all_threads() = x;
+
+        return thread_values[thread_policy::this_thread()];
+      }
+
+      all_values_proxy all_threads(){
+        return all_values_proxy(*this);
       }
     };
 
-  public:
-    typedef std::vector<T> VEC;
-    omp_distribute() : thread_values(num_threads()) {}
-    omp_distribute(size_t n, const T& value) :
-      thread_values(num_threads(), std::vector<T>(n,value)){}
-    operator VEC& (){return thread_values[this_thread()];}
-    operator const VEC& () const
-    {return thread_values[this_thread()];}
-    VEC& operator()(size_type i) {return thread_values[i];}
-    const VEC& operator()(size_type i) const {return thread_values[i];}
-    VEC& thrd_cast(){return thread_values[this_thread()];}
-    const VEC& thrd_cast() const
-    {return thread_values[this_thread()];}
-    T& operator[](size_type i)
-    {return thread_values[this_thread()][i];}
-    const T& operator[](size_type i) const
-    {return thread_values[this_thread()][i];}
-    T& operator = (const T& value) {
-      return (thread_values[this_thread()]=value);
-    }
-    all_values_proxy all_threads(){return all_values_proxy(*this);}
-    ~omp_distribute(){}
-  };
+    /**Specialization for std::vector<T>, adds vector indexing operator*/
+    template <typename T,
+              typename thread_policy>
+    class omp_distribute_impl<std::vector<T>, thread_policy, vector_tag>
+      : public omp_distribute_impl<std::vector<T>, thread_policy, general_tag>
+    {
+    public:
+      using base = omp_distribute_impl<std::vector<T>, thread_policy, general_tag>;
 
-  /**specialization for bool, to circumvent the shortcommings
-  of standards library specialization for std::vector<bool>*/
-  template <> class omp_distribute<bool> {
-    typedef int BOOL;
-    std::vector<BOOL> thread_values;
-    friend struct all_values_proxy;
-    struct all_values_proxy{
-      omp_distribute<bool>& distro;
-      all_values_proxy(omp_distribute& d): distro(d){}
-      void operator=(const bool& x);
+      template <class... Args>
+      explicit omp_distribute_impl(Args&&... value)
+        : base(std::forward<Args>(value)...)
+      {}
+
+      T& operator[](size_type i){
+        return base::thrd_cast()[i];
+      }
+      const T& operator[](size_type i) const{
+        return base::thrd_cast()[i];
+      }
+
+      std::vector<T>& operator = (const std::vector<T>& x){
+        return base::operator=(x);
+      }
     };
 
+    /**Specialization for bool, to circumvent the shortcomings
+    of the standard library's specialization for std::vector<bool>;
+    we use std::vector<int> instead*/
+    template <typename thread_policy>
+    class omp_distribute_impl<bool, thread_policy, bool_tag>
+      : public omp_distribute_impl<int, thread_policy, general_tag>
+    {
+    public:
+      using base = omp_distribute_impl<int, thread_policy, general_tag>;
+
+      template <class... Args>
+      explicit omp_distribute_impl(Args&&... value)
+        : base(std::forward<Args>(value)...)
+      {}
+
+      operator bool () const {
+        return base::operator const int&();
+      }
 
+      bool operator = (const bool& x){
+        return base::operator=(x);
+      }
+    };
+
+  } /* end of namespace detail.                                             */
+
+  template<typename T, typename thread_policy>
+  using od_base = detail::omp_distribute_impl<
+                    T, thread_policy, typename detail::distribute_traits<T>::type>;
+
+  /**
+    Use this template class for any object you want to
+    distribute to OpenMP threads. The creation of this
+    object should happen in serial, while access to the individual
+    thread-local instances takes place in parallel.
+    Use thread_policy to distribute the objects either between physical
+    threads or between a fixed number of partitions, independent of the
+    number of threads. If you change the default policy, remember to also
+    use this_thread() and num_threads() from the corresponding policy
+    when iterating over the thread-specific components.
+  */
+  template<typename T,
+           typename thread_policy = global_thread_policy>
+  class omp_distribute : public od_base<T, thread_policy>
+  {
   public:
+    using base = od_base<T, thread_policy>;
 
-    omp_distribute() : thread_values(num_threads()) {}
-    omp_distribute(const bool& value) :
-      thread_values(num_threads(),value) {}
-    operator BOOL& (){return thread_values[this_thread()];}
-    operator const BOOL& () const {return thread_values[this_thread()];}
-    BOOL& thrd_cast(){return thread_values[this_thread()];}
-    const BOOL& thrd_cast() const {return thread_values[this_thread()];}
-    BOOL& operator()(size_type i) {
-      return thread_values[i];
-    }
-    const BOOL& operator()(size_type i) const {
-      return thread_values[i];
-    }
-    BOOL& operator = (const BOOL& x){
-      return (thread_values[this_thread()]=x);
-    }
-    all_values_proxy all_threads(){return all_values_proxy(*this);}
-    ~omp_distribute(){}
-  private:
+    template <class... Args>
+    explicit omp_distribute(Args&&... value)
+      : base(std::forward<Args>(value)...)
+    {}
 
+    auto operator = (const T& x) -> decltype(std::declval<base>() = x){
+      return base::operator=(x);
+    }
   };
 
  /* Use these macros only in function local context to achieve
  the effect of thread local storage for any type of objects
  and their initialization (it's more general and portable
  than using __declspec(thread))*/
-#ifdef GETFEM_HAVE_OPENMP
-#define        DEFINE_STATIC_THREAD_LOCAL_INITIALIZED(Type,Var,initial) \
-  static boost::thread_specific_ptr<Type> ptr_##Var; \
-  if(!ptr_##Var.get()) {ptr_##Var.reset(new Type(initial));} \
-  Type& Var=*ptr_##Var;
-
-#define        DEFINE_STATIC_THREAD_LOCAL(Type,Var) \
-  static boost::thread_specific_ptr<Type> ptr_##Var; \
-  if(!ptr_##Var.get()) {ptr_##Var.reset(new Type());} \
-  Type& Var=*ptr_##Var;
-
-#define DEFINE_STATIC_THREAD_LOCAL_CONSTRUCTED(Type, Var, ...) \
-  static boost::thread_specific_ptr<Type> ptr_##Var; \
-  if(!ptr_##Var.get()) {ptr_##Var.reset(new Type(__VA_ARGS__));} \
-  Type& Var=*ptr_##Var;
+  #ifdef GETFEM_HAS_OPENMP
 
-#else
-#define        DEFINE_STATIC_THREAD_LOCAL_INITIALIZED(Type,Var,initial) \
-  static Type Var(initial);
+    #define DEFINE_STATIC_THREAD_LOCAL_INITIALIZED(Type,Var,initial) \
+      static boost::thread_specific_ptr<Type> ptr_##Var; \
+      if(!ptr_##Var.get()) {ptr_##Var.reset(new Type(initial));} \
+      Type& Var=*ptr_##Var;
 
-#define        DEFINE_STATIC_THREAD_LOCAL(Type,Var) \
-  static Type Var;
+    #define DEFINE_STATIC_THREAD_LOCAL(Type,Var) \
+      static boost::thread_specific_ptr<Type> ptr_##Var; \
+      if(!ptr_##Var.get()) {ptr_##Var.reset(new Type());} \
+      Type& Var=*ptr_##Var;
 
-#define DEFINE_STATIC_THREAD_LOCAL_CONSTRUCTED(Type, Var, ...) \
-  static Type Var(__VA_ARGS__);
+    #define DEFINE_STATIC_THREAD_LOCAL_CONSTRUCTED(Type, Var, ...) \
+      static boost::thread_specific_ptr<Type> ptr_##Var; \
+      if(!ptr_##Var.get()) {ptr_##Var.reset(new Type(__VA_ARGS__));} \
+      Type& Var=*ptr_##Var;
 
-#endif
+  #else
 
-  class open_mp_is_running_properly{
-    static omp_distribute<bool> answer;
+    #define DEFINE_STATIC_THREAD_LOCAL_INITIALIZED(Type,Var,initial) \
+      static Type Var(initial);
 
-  public:
-    open_mp_is_running_properly();
-    ~open_mp_is_running_properly();
-    static bool is_it();
-  };
+    #define DEFINE_STATIC_THREAD_LOCAL(Type,Var) \
+      static Type Var;
 
-#if defined _WIN32 && !defined (__GNUC__)
-  /**parallelization function for a for loop*/
-  template<class LOOP_BODY>
-  inline void open_mp_for(int begin, int end,
-    const LOOP_BODY& loop_body){
-      _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
-      gmm::standard_locale locale;
-      open_mp_is_running_properly check;
-#pragma omp parallel default(shared)
-      {
-        _setmbcp(_MB_CP_ANSI);
-#pragma omp for schedule(static)
-        for(int i=begin;i<end;i++) loop_body(i);
-      }
-      _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
-  }
-#else /*LINUX*/
-  /**parallelization function for a for loop*/
-  template<class LOOP_BODY>
-  inline void open_mp_for(
-    int begin, int end, const LOOP_BODY& loop_body){
-      gmm::standard_locale locale;
-      open_mp_is_running_properly check;
-#pragma omp parallel default(shared)
-      {
-#pragma omp for schedule(static)
-        for(int i=begin;i<end;i++) loop_body(i);
-      }
-  }
-#endif
+    #define DEFINE_STATIC_THREAD_LOCAL_CONSTRUCTED(Type, Var, ...) \
+      static Type Var(__VA_ARGS__);
 
-  /**parallelization macro of a for loop*/
-#define OPEN_MP_FOR(begin,end,loop_counter,loop_body) \
-  getfem::open_mp_for(begin,end,loop_body(loop_counter));
+  #endif
 
-  /**used to partition a mesh region so that
-  each partition can be used on a different thread. Thread safe*/
-  class region_partition {
-    mesh* pparent_mesh;
-    std::shared_ptr<mesh_region> original_region;
-    mutable std::vector<size_type> partitions;
+  class partition_master;
+
+  /**Iterator that runs over the partitions of the current
+     thread and sets the global (but thread-specific)
+     partition on each increment*/
+  class partition_iterator
+  {
   public:
-    region_partition(mesh* mmesh=0,size_type id=-1);
-    region_partition(const region_partition& rp);
-    void operator=(const region_partition& rp);
-    size_type thread_local_partition() const;
+
+    partition_iterator operator ++();
+    bool operator==(const partition_iterator&) const;
+    bool operator!=(const partition_iterator&) const;
+    size_type operator*() const;
+
+  private:
+
+    friend class partition_master;
+
+    /**Only partition_master can create one*/
+    partition_iterator(partition_master &master,
+                       std::set<size_type>::const_iterator it);
+
+    partition_master &master;
+    std::set<size_type>::const_iterator it;
   };
 
-  /** Allows to re-throw exceptions, generated in OpemMP parallel section.
-      Collects exceptions from all threads and on destruction re-throws
-      the first one, so that
-      it can be again caught in the master thread. */
-  class thread_exception {
+  enum class thread_behaviour {true_threads, partition_threads};
+
+  /**
+    A singleton that manages partitions on individual threads.
+  */
+  class partition_master
+  {
   public:
-    thread_exception();
-
-    /**re-throws the first captured exception*/
-    ~thread_exception();
-
-    /** Run function f in parallel part to capture it's exceptions.
-        Possible syntax can be:
-       thread_exception exception;
-       #pragma omp parallel...
-       {
-         exception.run([&]
-         {
-            your code that can throw exceptions
-         });
-       }
-       exception.rethrow();
-    */
-    template <typename Function, typename... Parameters>
-    void run(Function f, Parameters... params)
-    {
-#ifdef GETFEM_HAVE_OPENMP
-      try { f(params...); } catch (...) { captureException(); }
-#else
-      f(params...);
-#endif
-    }
 
-    /**vector of pointers to caught exceptions*/
-    std::vector<std::exception_ptr> caughtExceptions() const;
-    void rethrow();
+    static partition_master &get();
+
+    /**beginning of the partitions for the current thread*/
+    partition_iterator begin();
+
+    /**end of the partitions for the current thread*/
+    partition_iterator end();
+
+    /**Sets the behaviour for the full program: either partitioning parallel
+       loops according to the number of true threads, specified by the user,
+       or to a fixed number of partitions equal to the max concurrency of
+       the system. The latter makes the partitioning independent of the
+       number of threads set*/
+    void set_behaviour(thread_behaviour);
+
+    /**active partition on the thread. If the number of threads is equal to
+    the max concurrency of the system, then it is also the index of the
+    actual thread*/
+    size_type get_current_partition() const;
+
+    /**number of partitions or threads, depending on thread policy*/
+    size_type get_nb_partitions() const;
+
+    /**for thread_behaviour::partition_threads, sets the total number of
+      partitions. This call must be made before any omp_distribute-based
+      classes are created, otherwise they become invalid*/
+    void set_nb_partitions(size_type);
+
+    void check_threads();
 
   private:
-    void captureException();
 
-    std::vector<std::exception_ptr> exceptions_;
+    void rewind_partitions();
+
+    //Parallel execution of a lambda. Please use the macros below
+    friend void parallel_execution(std::function<void(void)> lambda,
+                                   bool iterate_over_partitions);
+
+    /**set the current partition, which will also be returned by
+       this_thread()*/
+    void set_current_partition(size_type);
+
+    friend partition_iterator;
+
+    partition_master();
+
+    void update_partitions();
+
+    omp_distribute<std::set<size_type>, true_thread_policy> partitions;
+    omp_distribute<size_type, true_thread_policy> current_partition;
+    std::atomic<size_type> nb_user_threads;
+    thread_behaviour behaviour = thread_behaviour::partition_threads;
+    std::atomic<bool> partitions_updated{false};
+    size_type nb_partitions;
+
+    static partition_master instance;
   };
 
-  void parallel_execution(std::function<void(void)> lambda);
+  class standard_locale;
+  class thread_exception;
 
-#ifdef GETFEM_HAVE_OPENMP
-  #define GETFEM_OMP_PARALLEL(body) parallel_execution([&](){body;});
-#else
-  #define GETFEM_OMP_PARALLEL(body) body
-#endif
+  /**Encapsulates OpenMP-related initialization and de-initialization*/
+  class parallel_boilerplate
+  {
+    std::unique_ptr<standard_locale> plocale;
+    std::unique_ptr<thread_exception> pexception;
+
+  public:
+    parallel_boilerplate();
+    void run_lamda(std::function<void(void)> lambda);
+    ~parallel_boilerplate();
+  };
+
+  #ifdef __GNUC__
+    #define pragma_op(arg) _Pragma(#arg)
+  #else
+    #define pragma_op(arg) __pragma(arg)
+  #endif
+
+  /**
+   Organizes a proper parallel omp section:
+   - iteration on thread independent partitions
+   - passing exceptions to the master thread
+   - thread-safe locale
+   */
+  #ifdef GETFEM_HAS_OPENMP
+    #define GETFEM_OMP_PARALLEL(body) getfem::parallel_execution([&](){body;}, true);
+
+    /**execute in parallel, but do not iterate over partitions*/
+    #define GETFEM_OMP_PARALLEL_NO_PARTITION(body) getfem::parallel_execution([&](){body;}, false);
+
+    /**execute a for loop in parallel; does not iterate over partitions*/
+    #define GETFEM_OMP_FOR(init, check, increment, body) {\
+      auto boilerplate = getfem::parallel_boilerplate{};  \
+      pragma_op(omp parallel for)                         \
+      for (init; check; increment){                       \
+        boilerplate.run_lamda([&](){body;});              \
+      }                                                   \
+    }
+
+  #else
+    #define GETFEM_OMP_PARALLEL(body) body
+    #define GETFEM_OMP_PARALLEL_NO_PARTITION(body) body;
+    #define GETFEM_OMP_FOR(init, check, increment, body)\
+      for (init; check; increment) {                    \
+        body                                            \
+      }
 
-}
+  #endif
 
-#endif //GETFEM_OMP
\ No newline at end of file
+}  /* end of namespace getfem.                                             */
\ No newline at end of file
diff --git a/src/getfem_assembling_tensors.cc b/src/getfem_assembling_tensors.cc
index 378863c..bf7548c 100644
--- a/src/getfem_assembling_tensors.cc
+++ b/src/getfem_assembling_tensors.cc
@@ -22,6 +22,7 @@
 #include "gmm/gmm_blas_interface.h"
 #include "getfem/getfem_arch_config.h"
 #include "getfem/getfem_assembling_tensors.h"
+#include "getfem/getfem_locale.h"
 #include "getfem/getfem_mat_elem.h"
 
 namespace getfem {
@@ -1140,7 +1141,7 @@ namespace getfem {
   }
 
   void asm_tokenizer::get_tok() {
-    gmm::standard_locale sl;
+    standard_locale sl;
     curr_tok_ival = -1;
     while (tok_pos < str.length() && isspace(str[tok_pos])) ++tok_pos;
     if (tok_pos == str.length()) {
diff --git a/src/getfem_generic_assembly_functions_and_operators.cc b/src/getfem_generic_assembly_functions_and_operators.cc
index 796e2c7..1b3a056 100644
--- a/src/getfem_generic_assembly_functions_and_operators.cc
+++ b/src/getfem_generic_assembly_functions_and_operators.cc
@@ -86,8 +86,8 @@ namespace getfem {
     }
     return false;
   }
-  
-  
+
+
   static scalar_type ga_Heaviside(scalar_type t) { return (t >= 0.) ? 1.: 0.; }
   static scalar_type ga_pos_part(scalar_type t) { return (t >= 0.) ? t : 0.; }
   static scalar_type ga_reg_pos_part(scalar_type t, scalar_type eps)
@@ -411,9 +411,9 @@ namespace getfem {
     : ftype_(1), dtype_(3), nbargs_(1), expr_(expr__),
       derivative1_(""), derivative2_(""), t(1, 0.), u(1, 0.), gis(nullptr) {}
 
-  
+
   ga_predef_function_tab::ga_predef_function_tab() {
-    
+
     ga_predef_function_tab &PREDEF_FUNCTIONS = *this;
 
     // Power functions and their derivatives
@@ -421,7 +421,7 @@ namespace getfem {
     PREDEF_FUNCTIONS["sqr"] = ga_predef_function(ga_sqr, 2, "2*t");
     PREDEF_FUNCTIONS["pow"] = ga_predef_function(pow, 1, "DER_PDFUNC1_POW",
                                                  "DER_PDFUNC2_POW");
-    
+
     PREDEF_FUNCTIONS["DER_PDFUNC_SQRT"] =
       ga_predef_function(ga_der_sqrt, 2, "-0.25/(t*sqrt(t))");
     PREDEF_FUNCTIONS["DER_PDFUNC1_POW"] =
@@ -529,7 +529,7 @@ namespace getfem {
       = ga_predef_function(ga_der_reg_pos_part, 1, "DER2_REG_POS_PART", "");
     PREDEF_FUNCTIONS["DER_REG_POS_PART"]
       = ga_predef_function(ga_der2_reg_pos_part);
-    
+
     PREDEF_FUNCTIONS["max"]
       = ga_predef_function(ga_max, 1, "DER_PDFUNC1_MAX", "DER_PDFUNC2_MAX");
     PREDEF_FUNCTIONS["min"]
@@ -545,7 +545,7 @@ namespace getfem {
   ga_spec_function_tab::ga_spec_function_tab() {
     // Predefined special functions
     ga_spec_function_tab &SPEC_FUNCTIONS = *this;
-    
+
     SPEC_FUNCTIONS.insert("pi");
     SPEC_FUNCTIONS.insert("meshdim");
     SPEC_FUNCTIONS.insert("timestep");
@@ -624,7 +624,7 @@ namespace getfem {
     PREDEF_FUNCTIONS[name] = ga_predef_function(expr);
     ga_predef_function &F = PREDEF_FUNCTIONS[name];
     F.gis = std::make_unique<instruction_set>();
-    for (size_type thread = 0; thread < num_threads(); ++thread)
+    for (size_type thread = 0; thread < F.workspace.num_threads(); ++thread)
     {
       F.workspace(thread).add_fixed_size_variable("t", gmm::sub_interval(0,1),
                                                   F.t(thread));
diff --git a/src/getfem_locale.cc b/src/getfem_locale.cc
new file mode 100644
index 0000000..ea15d14
--- /dev/null
+++ b/src/getfem_locale.cc
@@ -0,0 +1,58 @@
+/*===========================================================================
+
+ Copyright (C) 2012-2017 Andriy Andreykiv.
+
+ This file is a part of GetFEM++
+
+ GetFEM++  is  free software;  you  can  redistribute  it  and/or modify it
+ under  the  terms  of the  GNU  Lesser General Public License as published
+ by  the  Free Software Foundation;  either version 3 of the License,  or
+ (at your option) any later version along with the GCC Runtime Library
+ Exception either version 3.1 or (at your option) any later version.
+ This program  is  distributed  in  the  hope  that it will be useful,  but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or  FITNESS  FOR  A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License and GCC Runtime Library Exception for more details.
+ You  should  have received a copy of the GNU Lesser General Public License
+ along  with  this program;  if not, write to the Free Software Foundation,
+ Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+
+===========================================================================*/
+
+#include <iostream>
+
+#include <getfem/getfem_locale.h>
+#include <getfem/getfem_omp.h>
+
+namespace getfem{
+
+  #ifdef _WIN32
+
+    standard_locale::standard_locale()
+      : cinloc(std::cin.getloc()){
+      if (!me_is_multithreaded_now()){
+          cloc=setlocale(LC_NUMERIC, 0);
+          setlocale(LC_NUMERIC,"C");
+      }
+    }
+
+    standard_locale::~standard_locale() {
+      if (!me_is_multithreaded_now())
+        setlocale(LC_NUMERIC, cloc.c_str());
+    }
+
+  #else
+
+    standard_locale::standard_locale()
+      : cloc(setlocale(LC_NUMERIC, 0)), cinloc(std::cin.getloc()){
+      setlocale(LC_NUMERIC, "C");
+      std::cin.imbue(std::locale("C"));
+    }
+
+    standard_locale::~standard_locale(){
+      setlocale(LC_NUMERIC, cloc.c_str());
+      std::cin.imbue(cinloc);
+    }
+
+  #endif
+
+}  /* end of namespace getfem.                                             */
\ No newline at end of file
diff --git a/src/getfem_mesh_region.cc b/src/getfem_mesh_region.cc
index e558664..92df302 100644
--- a/src/getfem_mesh_region.cc
+++ b/src/getfem_mesh_region.cc
@@ -24,52 +24,55 @@
 #include "getfem/getfem_omp.h"
 
 namespace getfem {
-  typedef mesh_region::face_bitset face_bitset;
+
+  using face_bitset = mesh_region::face_bitset;
 
   mesh_region::mesh_region(const mesh_region &other)
-    : p(std::make_shared<impl>()), id_(size_type(-2)), parent_mesh(0),
-      index_updated(false)
-  {
+    : p(std::make_shared<impl>()), id_(size_type(-2)), parent_mesh(0) {
     this->operator=(other);
   }
 
-
-  mesh_region::mesh_region() : p(std::make_shared<impl>()), id_(size_type(-2)),
-                               type_(size_type(-1)),
-    partitioning_allowed(true), parent_mesh(0), index_updated(false)
-  {
+  mesh_region::mesh_region()
+    : p(std::make_shared<impl>()), id_(size_type(-2)), type_(size_type(-1)),
+    partitioning_allowed(true), parent_mesh(nullptr){
     if (me_is_multithreaded_now()) prohibit_partitioning();
+    mark_region_changed();
   }
 
   mesh_region::mesh_region(size_type id__) : id_(id__), type_(size_type(-1)),
-    partitioning_allowed(true), parent_mesh(0), index_updated(false)
-  { }
+    partitioning_allowed(true), parent_mesh(nullptr){
+    mark_region_changed();
+  }
 
   mesh_region::mesh_region(mesh& m, size_type id__, size_type type) :
-    p(std::make_shared<impl>()), id_(id__), type_(type), partitioning_allowed(true), parent_mesh(&m),
-    index_updated(false)
-  {
+    p(std::make_shared<impl>()), id_(id__), type_(type),
+    partitioning_allowed(true), parent_mesh(&m){
     if (me_is_multithreaded_now()) prohibit_partitioning();
+    mark_region_changed();
   }
 
-  mesh_region::mesh_region(const dal::bit_vector &bv) :
-    p(std::make_shared<impl>()), id_(size_type(-2)), type_(size_type(-1)),
-    partitioning_allowed(true), parent_mesh(0), index_updated(false)
-  {
+  mesh_region::mesh_region(const dal::bit_vector &bv)
+    : p(std::make_shared<impl>()), id_(size_type(-2)), type_(size_type(-1)),
+    partitioning_allowed(true), parent_mesh(nullptr){
     if (me_is_multithreaded_now()) prohibit_partitioning();
     add(bv);
+    mark_region_changed();
   }
 
-  void mesh_region::touch_parent_mesh()
-  {
+  void mesh_region::mark_region_changed() const{
+    index_updated.all_threads() = false;
+    partitions_updated.all_threads() = false;
+    serial_index_updated = false;
+  }
+
+  void mesh_region::touch_parent_mesh(){
     if (parent_mesh) parent_mesh->touch_from_region(id_);
   }
 
-  const mesh_region& mesh_region::from_mesh(const mesh &m) const
-  {
-    if (!p.get())
+  const mesh_region& mesh_region::from_mesh(const mesh &m) const{
+    if (!p)
     {
-      mesh_region *r = const_cast<mesh_region*>(this);
+      auto r = const_cast<mesh_region*>(this);
       if (id_ == size_type(-1))
       {
         r->p = std::make_shared<impl>();
@@ -80,146 +83,133 @@ namespace getfem {
         *r = m.region(id_);
       }
     }
-    index_updated = false;
+    mark_region_changed();
     return *this;
   }
 
-  mesh_region& mesh_region::operator=(const mesh_region &from)
-  {
-    if (!this->parent_mesh && !from.parent_mesh)
-    {
-      this->id_ = from.id_;
-      this->type_ = from.type_;
-      this->partitioning_allowed = from.partitioning_allowed;
-      if (from.p.get()) {
-        if (!this->p.get()) this->p = std::make_shared<impl>();
-        this->wp() = from.rp();
+  mesh_region& mesh_region::operator=(const mesh_region &from){
+    if (!parent_mesh && !from.parent_mesh){
+      id_ = from.id_;
+      type_ = from.type_;
+      partitioning_allowed = from.partitioning_allowed;
+      if (from.p) {
+        if (!p) p = std::make_shared<impl>();
+        wp() = from.rp();
       }
-      else
-        this->p.reset();
+      else p = nullptr;
     }
-    else if (!this->parent_mesh) {
-      this->p = from.p;
-      this->id_ = from.id_;
-      this->type_ = from.type_;
-      this->parent_mesh = from.parent_mesh;
-      this->partitioning_allowed = from.partitioning_allowed;
+    else if (!parent_mesh) {
+      p = from.p;
+      id_ = from.id_;
+      type_ = from.type_;
+      parent_mesh = from.parent_mesh;
+      partitioning_allowed = from.partitioning_allowed;
     }
     else {
-      if (from.p.get())
-      {
-        this->wp() = from.rp();
-        this->type_= from.get_type();
-        this->partitioning_allowed = from.partitioning_allowed;
+      if (from.p){
+        wp() = from.rp();
+        type_= from.get_type();
+        partitioning_allowed = from.partitioning_allowed;
       }
       else if (from.id_ == size_type(-1)) {
-        this->clear();
-        this->add(this->parent_mesh->convex_index());
-        this->type_ = size_type(-1);
-        this->partitioning_allowed = true;
+        clear();
+        add(parent_mesh->convex_index());
+        type_ = size_type(-1);
+        partitioning_allowed = true;
       }
       touch_parent_mesh();
     }
-    index_updated = false;
+    mark_region_changed();
     return *this;
   }
 
-  bool mesh_region::compare(const mesh &m1, const mesh_region &mr,
+  bool mesh_region::compare(const mesh &m1,
+                            const mesh_region &mr,
                             const mesh &m2) const {
     if (&m1 != &m2) return false;
-    if (!p.get() && !mr.p.get()) return (id_ == mr.id_);
-    this->from_mesh(m1);
+    if (!p && !mr.p) return (id_ == mr.id_);
+    from_mesh(m1);
     mr.from_mesh(m2);
-    if (this->p.get() && !(mr.p.get())) return false;
-    if (!(this->p.get()) && (mr.p.get())) return false;
-    if (this->p.get())
-      if (this->p.get()->m != mr.p.get()->m) return false;
+    if (p && !(mr.p)) return false;
+    if (!p && mr.p) return false;
+    if (p)
+      if (p->m != mr.p->m) return false;
     return true;
   }
 
-  face_bitset mesh_region::operator[](size_t cv) const
-  {
-    map_t::const_iterator it = rp().m.find(cv);
-    if (it != rp().m.end()) return (*it).second;
-    else return face_bitset();
+  face_bitset mesh_region::operator[](size_t cv) const{
+    auto it = rp().m.find(cv);
+    if (it != rp().m.end()) return it->second;
+    else return {};
   }
 
-  void mesh_region::update_partition_iterators() const
-  {
-    if (index_updated) return;
+  void mesh_region::update_partition_iterators() const{
+    if (partitions_updated) return;
     itbegin = partition_begin();
-    itend   = partition_end  ();
-    index_updated = true;
+    itend = partition_end  ();
+    partitions_updated = true;
   }
   mesh_region::const_iterator
-    mesh_region::partition_begin( ) const
-  {
-    size_type region_size = rp().m.size();
-    if (region_size < num_threads())
-    { //for small regions: put the whole region into zero thread
-      if (this_thread() == 0) return rp().m.begin();
+    mesh_region::partition_begin( ) const{
+    auto region_size = rp().m.size();
+    if (region_size < partitions_updated.num_threads()){
+      //for small regions: put the whole region into zero thread
+      if (partitions_updated.this_thread() == 0) return rp().m.begin();
       else return rp().m.end();
     }
-    size_type partition_size = static_cast<size_type>
+    auto partition_size = static_cast<size_type>
       (std::ceil(static_cast<scalar_type>(region_size)/
-       static_cast<scalar_type >(num_threads())));
-    size_type index_begin = partition_size * this_thread();
-    if (index_begin >= region_size ) return  rp().m.end();
+       static_cast<scalar_type >(partitions_updated.num_threads())));
+    auto index_begin = partition_size * partitions_updated.this_thread();
+    if (index_begin >= region_size ) return rp().m.end();
 
-    const_iterator it = rp().m.begin();
-    for (size_type i=0;i<index_begin;++i) ++it;
+    auto it = rp().m.begin();
+    for (size_type i = 0; i != index_begin; ++i) ++it;
     return it;
   }
 
   mesh_region::const_iterator
-    mesh_region::partition_end( ) const
-  {
-    size_type region_size = rp().m.size();
-    if (region_size < num_threads()) return rp().m.end();
+    mesh_region::partition_end( ) const{
+    auto region_size = rp().m.size();
+    if (region_size < partitions_updated.num_threads()) return rp().m.end();
 
-    size_type partition_size = static_cast<size_type>
+    auto partition_size = static_cast<size_type>
       (std::ceil(static_cast<scalar_type>(region_size)/
-       static_cast<scalar_type >(num_threads())));
-    size_type index_end = partition_size * (this_thread() + 1);
+       static_cast<scalar_type >(partitions_updated.num_threads())));
+    auto index_end = partition_size * (partitions_updated.this_thread() + 1);
     if (index_end >= region_size ) return  rp().m.end();
 
-    const_iterator it = rp().m.begin();
-    for (size_type i=0;i<index_end && it!=rp().m.end();++i) ++it;
+    auto it = rp().m.begin();
+    for (size_type i = 0; i < index_end && it != rp().m.end(); ++i) ++it;
     return it;
   }
 
-  mesh_region::const_iterator mesh_region::begin( ) const
-  {
+  mesh_region::const_iterator mesh_region::begin() const{
     GMM_ASSERT1(p != 0, "Internal error");
-    if (me_is_multithreaded_now() && partitioning_allowed)
-    {
+    if (me_is_multithreaded_now() && partitioning_allowed){
       update_partition_iterators();
       return itbegin;
     }
-    else { return rp().m.begin(); }
+    else return rp().m.begin();
   }
 
-  mesh_region::const_iterator mesh_region::end  ( ) const
-  {
-    if (me_is_multithreaded_now() && partitioning_allowed)
-    {
+  mesh_region::const_iterator mesh_region::end() const{
+    if (me_is_multithreaded_now() && partitioning_allowed){
       update_partition_iterators();
       return itend;
     }
     else return rp().m.end();
   }
 
-  void  mesh_region::allow_partitioning()
-  {
-    if (me_is_multithreaded_now()) partitioning_allowed = true;
-    else partitioning_allowed.all_threads() = true;
+  void mesh_region::allow_partitioning(){
+    partitioning_allowed = true;
   }
 
-  void mesh_region::bounding_box(base_node& Pmin, base_node& Pmax) const {
-    auto &mesh = *this->get_parent_mesh();
-    for (auto cv : dal::bv_iterable_c(index())) {
+  void mesh_region::bounding_box(base_node& Pmin, base_node& Pmax) const{
+    auto &mesh = *get_parent_mesh();
+    for (auto &&cv : dal::bv_iterable_c{index()}) {
       for (const auto &pt : mesh.points_of_convex(cv)) {
-        for (auto j = 0; j < Pmin.size(); j++){
+        for (auto j = 0; j != Pmin.size(); ++j){
           Pmin[j] = std::min(Pmin[j], pt[j]);
           Pmax[j] = std::max(Pmax[j], pt[j]);
         }
@@ -227,92 +217,95 @@ namespace getfem {
     }
   }
 
-  void  mesh_region::prohibit_partitioning()
-  {
-    if (me_is_multithreaded_now()) partitioning_allowed = false;
-    else partitioning_allowed.all_threads() = false;
+  void  mesh_region::prohibit_partitioning(){
+    partitioning_allowed = false;
   }
 
-  bool mesh_region::is_partitioning_allowed() const
-  {
+  bool mesh_region::is_partitioning_allowed() const{
     return partitioning_allowed;
   }
 
-  /* may be optimized .. */
-  const dal::bit_vector&  mesh_region::index() const
-  {
-    GMM_ASSERT1(p.get(), "Use from_mesh on that region before");
-    dal::bit_vector& convex_index = rp().index_.thrd_cast();
-    convex_index.clear();
-    for (const_iterator it = begin(); it != end(); ++it)
-    {
-      if (it->second.any()) convex_index.add(it->first);
+  void mesh_region::update_index() const{
+    auto& convex_index = me_is_multithreaded_now() ?
+                           rp().index_.thrd_cast() : rp().serial_index_;
+    if (convex_index.card() != 0) convex_index.clear();
+    for (auto &&pair : *this){
+      if (pair.second.any()) convex_index.add(pair.first);
     }
-    return convex_index;
   }
 
-  void mesh_region::add(const dal::bit_vector &bv)
-  {
-    for (dal::bv_visitor i(bv); !i.finished(); ++i)
-    {
-      wp().m[i].set(0,1);
+  const dal::bit_vector&  mesh_region::index() const{
+    GMM_ASSERT1(p, "Use from_mesh on that region before");
+    if (me_is_multithreaded_now()) {
+      if (!index_updated){
+        update_index();
+        index_updated = true;
+      }
+      return rp().index_;
+    }
+    else {
+      if (!serial_index_updated){
+        update_index();
+        serial_index_updated = true;
+      }
+      return rp().serial_index_;
+    }
+  }
+
+  void mesh_region::add(const dal::bit_vector &bv){
+    for (dal::bv_visitor i(bv); !i.finished(); ++i){
+      wp().m[i].set(0, 1);
     }
     touch_parent_mesh();
-    index_updated = false;
+    mark_region_changed();
   }
 
-  void mesh_region::add(size_type cv, short_type f)
-  {
-    wp().m[cv].set(short_type(f+1),1);
+  void mesh_region::add(size_type cv, short_type f){
+    wp().m[cv].set(short_type(f + 1), 1);
     touch_parent_mesh();
-    index_updated = false;
+    mark_region_changed();
   }
 
-  void mesh_region::sup_all(size_type cv)
-  {
-    map_t::iterator it = wp().m.find(cv);
-    if (it != wp().m.end()) {
+  void mesh_region::sup_all(size_type cv){
+    auto it = wp().m.find(cv);
+    if (it != wp().m.end()){
       wp().m.erase(it);
       touch_parent_mesh();
+      mark_region_changed();
     }
-    index_updated = false;
   }
 
-  void mesh_region::sup(size_type cv, short_type f)
-  {
-    map_t::iterator it = wp().m.find(cv);
+  void mesh_region::sup(size_type cv, short_type f){
+    auto it = wp().m.find(cv);
     if (it != wp().m.end()) {
-      (*it).second.set(short_type(f+1),0);
-      if ((*it).second.none()) wp().m.erase(it);
+      it->second.set(short_type(f + 1), 0);
+      if (it->second.none()) wp().m.erase(it);
       touch_parent_mesh();
+      mark_region_changed();
     }
-    index_updated = false;
   }
 
-  void mesh_region::clear()
-  {
-    wp().m.clear(); touch_parent_mesh();
-    index_updated = false;
+  void mesh_region::clear(){
+    wp().m.clear();
+    touch_parent_mesh();
+    mark_region_changed();
   }
 
-  void mesh_region::clean()
-  {
+  void mesh_region::clean(){
     for (map_t::iterator it = wp().m.begin(), itn;
-      it != wp().m.end(); it = itn)
-    {
+         it != wp().m.end(); it = itn) {
       itn = it;
       ++itn;
-      if ( !(*it).second.any() ) wp().m.erase(it);
+      if (!(*it).second.any()) wp().m.erase(it);
     }
     touch_parent_mesh();
-    index_updated = false;
+    mark_region_changed();
   }
 
-
-  void mesh_region::swap_convex(size_type cv1, size_type cv2)
-  {
-    map_t::iterator it1 = wp().m.find(cv1), it2 = wp().m.find(cv2),
-      ite = wp().m.end();
+  void mesh_region::swap_convex(size_type cv1, size_type cv2){
+    auto it1 = wp().m.find(cv1);
+    auto it2 = wp().m.find(cv2);
+    auto ite = wp().m.end();
     face_bitset f1, f2;
 
     if (it1 != ite) f1 = it1->second;
@@ -322,96 +315,81 @@ namespace getfem {
     if (!f2.none()) wp().m[cv1] = f2;
     else if (it1 != ite) wp().m.erase(it1);
     touch_parent_mesh();
-    index_updated = false;
+    mark_region_changed();
   }
 
-  bool mesh_region::is_in(size_type cv, short_type f) const
-  {
-    GMM_ASSERT1(p.get(), "Use from mesh on that region before");
-    map_t::const_iterator it = rp().m.find(cv);
+  bool mesh_region::is_in(size_type cv, short_type f) const{
+    GMM_ASSERT1(p, "Use from_mesh on that region before");
+    auto it = rp().m.find(cv);
    if (it == rp().m.end() || short_type(f+1) >= MAX_FACES_PER_CV) return false;
     return ((*it).second)[short_type(f+1)];
   }
 
-  bool mesh_region::is_in(size_type cv, short_type f, const mesh &m) const
-  {
-    if (p.get()) {
-      map_t::const_iterator it = rp().m.find(cv);
+  bool mesh_region::is_in(size_type cv, short_type f, const mesh &m) const{
+    if (p) {
+      auto it = rp().m.find(cv);
       if (it == rp().m.end() || short_type(f+1) >= MAX_FACES_PER_CV)
         return false;
       return ((*it).second)[short_type(f+1)];
     }
-    else
-    {
+    else{
       if (id() == size_type(-1)) return true;
       else return m.region(id()).is_in(cv, f);
     }
   }
 
-
-
-  bool mesh_region::is_empty() const
-  {
+  bool mesh_region::is_empty() const{
     return rp().m.empty();
   }
 
-  bool mesh_region::is_only_convexes() const
-  {
+  bool mesh_region::is_only_convexes() const{
     return is_empty() ||
       (or_mask()[0] == true && or_mask().count() == 1);
   }
 
-  bool mesh_region::is_only_faces() const
-  {
+  bool mesh_region::is_only_faces() const{
     return is_empty() || (and_mask()[0] == false);
   }
 
-  face_bitset mesh_region::faces_of_convex(size_type cv) const
-  {
-    map_t::const_iterator it = rp().m.find(cv);
+  face_bitset mesh_region::faces_of_convex(size_type cv) const{
+    auto it = rp().m.find(cv);
     if (it != rp().m.end()) return ((*it).second) >> 1;
     else return face_bitset();
   }
 
-  face_bitset mesh_region::and_mask() const
-  {
+  face_bitset mesh_region::and_mask() const{
     face_bitset bs;
     if (rp().m.empty()) return bs;
     bs.set();
-    for (map_t::const_iterator it = rp().m.begin(); it != rp().m.end(); ++it)
+    for (auto it = rp().m.begin(); it != rp().m.end(); ++it)
       if ( (*it).second.any() )  bs &= (*it).second;
     return bs;
   }
 
-  face_bitset mesh_region::or_mask() const
-  {
+  face_bitset mesh_region::or_mask() const{
     face_bitset bs;
     if (rp().m.empty()) return bs;
-    for (map_t::const_iterator it = rp().m.begin(); it != rp().m.end(); ++it)
+    for (auto it = rp().m.begin(); it != rp().m.end(); ++it)
       if ( (*it).second.any() )  bs |= (*it).second;
     return bs;
   }
 
-  size_type mesh_region::size() const
-  {
-    size_type sz=0;
-    for (map_t::const_iterator it = begin(); it != end(); ++it)
+  size_type mesh_region::size() const{
+    size_type sz = 0;
+    for (auto it = begin(); it != end(); ++it)
       sz += (*it).second.count();
     return sz;
   }
 
-  size_type mesh_region::unpartitioned_size() const
-  {
-    size_type sz=0;
-    for (map_t::const_iterator it = rp().m.begin(); it != rp().m.end(); ++it)
+  size_type mesh_region::unpartitioned_size() const{
+    size_type sz = 0;
+    for (auto it = rp().m.begin(); it != rp().m.end(); ++it)
       sz += (*it).second.count();
     return sz;
   }
 
-
   mesh_region mesh_region::intersection(const mesh_region &a,
-                                        const mesh_region &b)
-  {
+                                        const mesh_region &b){
     GMM_TRACE4("intersection of "<<a.id()<<" and "<<b.id());
     mesh_region r;
     /* we do not allow the "all_convexes" kind of regions
@@ -421,20 +399,17 @@ namespace getfem {
     GMM_ASSERT1(a.id() !=  size_type(-1)||
                 b.id() != size_type(-1), "the 'all_convexes' regions "
                 "are not supported for set operations");
-    if (a.id() == size_type(-1))
-    {
+    if (a.id() == size_type(-1)){
      for (const_iterator it = b.begin();it != b.end(); ++it) r.wp().m.insert(*it);
       return r;
     }
-    else if (b.id() == size_type(-1))
-    {
+    else if (b.id() == size_type(-1)){
      for (const_iterator it = a.begin();it != a.end(); ++it) r.wp().m.insert(*it);
       return r;
     }
 
-    map_t::const_iterator
-      ita = a.begin(), enda = a.end(),
-      itb = b.begin(), endb = b.end();
+    auto ita = a.begin(), enda = a.end(),
+         itb = b.begin(), endb = b.end();
 
     while (ita != enda && itb != endb) {
       if (ita->first < itb->first) ++ita;
@@ -452,19 +427,16 @@ namespace getfem {
   }
 
   mesh_region mesh_region::merge(const mesh_region &a,
-                                 const mesh_region &b)
-  {
+                                 const mesh_region &b){
     GMM_TRACE4("Merger of " << a.id() << " and " << b.id());
     mesh_region r;
     GMM_ASSERT1(a.id() != size_type(-1) &&
       b.id() != size_type(-1), "the 'all_convexes' regions "
       "are not supported for set operations");
-    for (const_iterator it = a.begin();it != a.end(); ++it)
-    {
+    for (auto it = a.begin();it != a.end(); ++it){
       r.wp().m.insert(*it);
     }
-    for (const_iterator it = b.begin();it != b.end(); ++it)
-    {
+    for (auto it = b.begin();it != b.end(); ++it){
       r.wp().m[it->first] |= it->second;
     }
     return r;
@@ -472,20 +444,18 @@ namespace getfem {
 
 
   mesh_region mesh_region::subtract(const mesh_region &a,
-                                    const mesh_region &b)
-  {
+                                    const mesh_region &b){
     GMM_TRACE4("subtraction of "<<a.id()<<" and "<<b.id());
     mesh_region r;
     GMM_ASSERT1(a.id() != size_type(-1) &&
       b.id() != size_type(-1), "the 'all_convexes' regions "
       "are not supported for set operations");
-    for (const_iterator ita = a.begin();ita != a.end(); ++ita)
+    for (auto ita = a.begin();ita != a.end(); ++ita)
       r.wp().m.insert(*ita);
 
-    for (const_iterator itb = b.begin();itb != b.end(); ++itb)
-    {
-      size_type cv = itb->first;
-      map_t::iterator it = r.wp().m.find(cv);
+    for (auto itb = b.begin();itb != b.end(); ++itb){
+      auto cv = itb->first;
+      auto it = r.wp().m.find(cv);
       if (it != r.wp().m.end()){
                 it->second &= ~(itb->second);
                 if (it->second.none()) r.wp().m.erase(it);
@@ -496,7 +466,7 @@ namespace getfem {
 
   int mesh_region::region_is_faces_of(const getfem::mesh& m1,
                                       const mesh_region &rg2,
-                                      const getfem::mesh& m2) const {
+                                      const getfem::mesh& m2) const{
     if (&m1 != &m2) return 0;
     int r = 1, partially = 0;
     for (mr_visitor cv(*this, m1); !cv.finished(); cv.next())
@@ -504,34 +474,27 @@ namespace getfem {
         partially = -1;
       else
         r = 0;
-    if (r == 1) return 1; else return partially;
+    return r == 1 ? 1 : partially;
   }
 
-  size_type mesh_region::free_region_id(const getfem::mesh& m)
-  {
+  size_type mesh_region::free_region_id(const getfem::mesh& m){
     return m.regions_index().last_true()+1;
   }
 
-
-  void mesh_region::error_if_not_faces() const
-  {
+  void mesh_region::error_if_not_faces() const{
     GMM_ASSERT1(is_only_faces(), "Expecting a set of faces, not convexes");
   }
 
-  void mesh_region::error_if_not_convexes() const
-  {
+  void mesh_region::error_if_not_convexes() const{
     GMM_ASSERT1(is_only_convexes(), "Expecting a set of convexes, not faces");
   }
 
-  void mesh_region::error_if_not_homogeneous() const
-  {
+  void mesh_region::error_if_not_homogeneous() const{
     GMM_ASSERT1(is_only_faces() || is_only_convexes(), "Expecting a set "
                 "of convexes or a set of faces, but not a mixed set");
   }
 
 
-
-
 #if GETFEM_PARA_LEVEL > 1
 
   mesh_region::visitor::visitor(const mesh_region &s, const mesh &m,
@@ -568,12 +531,12 @@ namespace getfem {
 #else
 
   mesh_region::visitor::visitor(const mesh_region &s, const mesh &m, bool)
-    :cv_(size_type(-1)), f_(short_type(-1)), finished_(false)
-  {
+    :cv_(size_type(-1)), f_(short_type(-1)), finished_(false){
     if ((me_is_multithreaded_now() && s.partitioning_allowed)) {
       s.from_mesh(m);
       init(s);
-    } else {
+    }
+    else {
       if (s.id() == size_type(-1)) {
         init(m.convex_index());
       } else if (s.id() == size_type(-2) && s.p.get()) {
@@ -587,77 +550,72 @@ namespace getfem {
 
 #endif
 
-
-  bool mesh_region::visitor::next()
-  {
+  bool mesh_region::visitor::next(){
     if (whole_mesh) {
-      if (itb == iteb) { finished_ = true; return false; }
+      if (itb == iteb) {
+        finished_ = true;
+        return false;
+      }
       cv_ = itb.index();
       c = 0;
       f_ = 0;
-      ++itb; while (itb != iteb && !(*itb)) ++itb;
+      ++itb;
+      while (itb != iteb && !(*itb)) ++itb;
       return true;
     }
-    while (c.none())
-      {
-        if (it == ite) { finished_=true; return false; }
-        cv_ = it->first;
-        c   = it->second;
-        f_ = short_type(-1);
-        ++it;
-        if (c.none()) continue;
-      }
+    while (c.none()){
+      if (it == ite) { finished_=true; return false; }
+      cv_ = it->first;
+      c   = it->second;
+      f_ = short_type(-1);
+      ++it;
+      if (c.none()) continue;
+    }
     next_face();
     return true;
   }
 
   mesh_region::visitor::visitor(const mesh_region &s) :
-    cv_(size_type(-1)), f_(short_type(-1)), finished_(false)
-  {
+    cv_(size_type(-1)), f_(short_type(-1)), finished_(false){
     init(s);
   }
 
-  void mesh_region::visitor::init(const dal::bit_vector &bv)
-  {
+  void mesh_region::visitor::init(const dal::bit_vector &bv){
     whole_mesh = true;
     itb = bv.begin(); iteb = bv.end();
     while (itb != iteb && !(*itb)) ++itb;
     next();
   }
 
-  void mesh_region::visitor::init(const mesh_region &s)
-  {
+  void mesh_region::visitor::init(const mesh_region &s){
     whole_mesh = false;
     it  = s.begin();
     ite = s.end();
     next();
   }
 
-  std::ostream & operator <<(std::ostream &os, const mesh_region &w)
-  {
+  std::ostream & operator <<(std::ostream &os, const mesh_region &w){
     if (w.id() == size_type(-1))
       os << " ALL_CONVEXES";
-    else if (w.p.get())
-    {
-      for (mr_visitor cv(w); !cv.finished(); cv.next())
-        {
-          os << cv.cv();
-          if (cv.is_face()) os << "/" << cv.f();
-          os << " ";
-        }
+    else if (w.p){
+      for (mr_visitor cv(w); !cv.finished(); cv.next()){
+        os << cv.cv();
+        if (cv.is_face()) os << "/" << cv.f();
+        os << " ";
+      }
     }
-    else
-    {
+    else{
       os << " region " << w.id();
     }
     return os;
   }
 
-  struct dummy_mesh_region_ {
+  struct dummy_mesh_region_{
     mesh_region mr;
-    dummy_mesh_region_() : mr() {}
   };
 
-  const mesh_region &dummy_mesh_region()
-  { return dal::singleton<dummy_mesh_region_>::instance().mr; }
-}
+  const mesh_region &dummy_mesh_region() {
+    return dal::singleton<dummy_mesh_region_>::instance().mr;
+  }
+
+}  /* end of namespace getfem.                                             */
\ No newline at end of file
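The refactored `mesh_region::index()` above caches the computed index and only rebuilds it after a mutator has called `mark_region_changed()`. A minimal sketch of that lazy-cache pattern (hypothetical `region_sketch` class with a single cache, whereas getfem keeps a per-thread cache plus a serial one):

```cpp
#include <map>
#include <set>

// Hypothetical sketch of the lazy index caching introduced in mesh_region:
// index() rebuilds the cached set only if a mutator has invalidated it via
// mark_region_changed() since the last rebuild.
class region_sketch {
  std::map<int, bool> convexes;          // convex id -> present
  mutable std::set<int> index_cache;     // cached index of present convexes
  mutable bool index_updated = false;    // dirty flag, cleared by mutators

  void update_index() const {
    index_cache.clear();
    for (auto &&pair : convexes)
      if (pair.second) index_cache.insert(pair.first);
  }

public:
  void add(int cv) { convexes[cv] = true; mark_region_changed(); }
  void sup(int cv) { convexes.erase(cv); mark_region_changed(); }
  void mark_region_changed() { index_updated = false; }

  const std::set<int>& index() const {
    if (!index_updated) { update_index(); index_updated = true; }
    return index_cache;
  }
};
```

Repeated `index()` calls between mutations then cost only the dirty-flag check.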
diff --git a/src/getfem_models.cc b/src/getfem_models.cc
index 288569e..b3a7831 100644
--- a/src/getfem_models.cc
+++ b/src/getfem_models.cc
@@ -24,69 +24,14 @@
 #include "gmm/gmm_solver_cg.h"
 #include "gmm/gmm_condition_number.h"
 #include "getfem/getfem_models.h"
+#include "getfem/getfem_accumulated_distro.h"
 #include "getfem/getfem_assembling.h"
 #include "getfem/getfem_derivatives.h"
 #include "getfem/getfem_interpolation.h"
 #include "getfem/getfem_generic_assembly.h"
 
-
 namespace getfem {
 
-
-/** multi-threaded distribution of a single vector or a matrix. Uses RAII semantics
-  (constructor/destructor)  */
-  template <class CONTAINER> class distro
-  {
-    CONTAINER& original;
-    omp_distribute<CONTAINER> distributed;
-
-    void build_distro(gmm::abstract_matrix)
-    {
-      for(size_type thread = 1; thread < num_threads(); thread++)
-      {
-        gmm::resize(distributed(thread), gmm::mat_nrows(original),gmm::mat_ncols(original));
-      }
-    }
-
-    void build_distro(gmm::abstract_vector)
-    {
-      //.. skipping thread 0 ..
-      for(size_type thread = 1; thread < num_threads(); thread++)
-      {
-        gmm::resize(distributed(thread), gmm::vect_size(original));
-      }
-    }
-
-    bool not_multithreaded() const { return num_threads() == 1; }
-
-  public:
-
-    distro(CONTAINER& c) : original(c)
-    {
-      if (not_multithreaded()) return;
-      build_distro(typename gmm::linalg_traits<CONTAINER>::linalg_type());
-    }
-
-    operator CONTAINER&()
-    {
-      if (not_multithreaded() || this_thread() == 0) return original;
-      else return distributed(this_thread());
-    }
-
-    ~distro()
-    {
-      if (not_multithreaded()) return;
-
-      GMM_ASSERT1(!me_is_multithreaded_now(),
-                  "List accumulation should not run in parallel");
-
-      for(size_type thread = 1; thread < num_threads(); thread++)
-      {
-        gmm::add(distributed(thread), original);
-      }
-    }
-  };
-
   model::model(bool comp_version) {
     init(); complex_version = comp_version;
     is_linear_ = is_symmetric_ = is_coercive_ = true;
@@ -515,7 +460,7 @@ namespace getfem {
                   GMM_TRACE2("Generic assembly for actualize sizes");
                   {
                     gmm::clear(rTM);
-                    distro<decltype(rTM)>  distro_rTM(rTM);
+                    accumulated_distro<decltype(rTM)>  distro_rTM(rTM);
                     GETFEM_OMP_PARALLEL(
                         ga_workspace workspace(*this);
                         for (const auto &ge : generic_expressions)
@@ -524,7 +469,7 @@ namespace getfem {
                         workspace.set_assembled_matrix(distro_rTM);
                         workspace.assembly(2);
                     );
-                  } //distro scope
+                  } //accumulated_distro scope
                   gmm::add
                     (gmm::sub_matrix(rTM, vdescr.I, multdescr.I), MM);
                   gmm::add(gmm::transposed
@@ -1752,101 +1697,6 @@ namespace getfem {
 
   void model::post_to_variables_step(){}
 
-  /**takes a list (more often it's a std::vector) of matrices
-  or vectors, creates an empty copy on each thread. When the
-  thread computations are done (in the destructor), accumulates
-  the assembled copies into the original. Note: the matrices or
-  vectors in the list are gmm::cleared, deleting the content
-  in the constructor*/
-  template <class CONTAINER_LIST> class list_distro
-  {
-    CONTAINER_LIST& original_list;
-    omp_distribute<CONTAINER_LIST> distributed_list;
-    typedef typename CONTAINER_LIST::value_type value_type;
-
-    void build_distro(gmm::abstract_matrix)
-    {
-      //intentionally skipping thread 0, as list_distro will
-      //use original_list for it
-      for(size_type thread = 1; thread < num_threads(); thread++)
-      {
-        auto it_original = original_list.begin();
-        auto it_distributed = distributed_list(thread).begin();
-        for(;it_original != original_list.end(); ++it_original, ++it_distributed)
-        {
-          gmm::resize(*it_distributed, gmm::mat_nrows(*it_original),gmm::mat_ncols(*it_original));
-        }
-      }
-    }
-
-    void build_distro(gmm::abstract_vector)
-    {
-      //.. skipping thread 0 ..
-      for(size_type thread = 1; thread < num_threads(); thread++)
-      {
-        auto it_original = original_list.begin();
-        auto it_distributed = distributed_list(thread).begin();
-        for(;it_original != original_list.end(); ++it_original, ++it_distributed)
-        {
-          gmm::resize(*it_distributed, gmm::vect_size(*it_original));
-        }
-      }
-    }
-
-    bool not_multithreaded() const { return num_threads() == 1; }
-
-  public:
-
-    list_distro(CONTAINER_LIST& l) : original_list(l)
-    {
-      if (not_multithreaded()) return;
-
-      for(size_type thread=1; thread<num_threads(); thread++)
-          distributed_list(thread).resize(original_list.size());
-
-      build_distro(typename gmm::linalg_traits<value_type>::linalg_type());
-    }
-
-    operator CONTAINER_LIST&()
-    {
-      if (not_multithreaded() || this_thread() == 0) return original_list;
-      else return distributed_list(this_thread());
-    }
-
-    ~list_distro()
-    {
-      if (not_multithreaded()) return;
-
-      GMM_ASSERT1(!me_is_multithreaded_now(),
-                  "List accumulation should not run in parallel");
-
-      using namespace std;
-      auto to_add = vector<CONTAINER_LIST*>{};
-      to_add.push_back(&original_list);
-      for (size_type thread = 1; thread < num_threads(); ++thread)
-        to_add.push_back(&distributed_list(thread));
-
-      //List accumulation in parallel.
-      //Adding, for instance, elements 1 to 0, 2 to 3, 5 to 4 and 7 to 6
-      //on separate 4 threads in case of parallelization of the assembly
-      //on 8 threads.
-      while (to_add.size() > 1)
-      {
-        #pragma omp parallel default(shared)
-        {
-          auto i = this_thread() * 2;
-          if (i + 1 < to_add.size()){
-            auto &target = *to_add[i];
-            auto &source = *to_add[i + 1];
-            for (size_type j = 0; j < source.size(); ++j) gmm::add(source[j], target[j]);
-          }
-        }
-        //erase every second item , as it was already added
-      for (auto it = begin(to_add); next(it) < end(to_add); it = to_add.erase(next(it)));
-      }
-    }
-  };
-
   void model::brick_call(size_type ib, build_version version,
                          size_type rhs_ind) const
   {
@@ -1865,29 +1715,21 @@ namespace getfem {
                                                 brick.region, version);
 
      /*distributing the resulting vectors and matrices for individual threads.*/
-      { //brackets are needed because list_distro has constructor/destructor
+      { //brackets are needed because accumulated_distro has constructor/destructor
         //semantics (as in RAII)
-        list_distro<complex_matlist> cmatlist(brick.cmatlist);
-        list_distro<complex_veclist> cveclist(brick.cveclist[rhs_ind]);
-        list_distro<complex_veclist> cveclist_sym(brick.cveclist_sym[rhs_ind]);
+        accumulated_distro<complex_matlist> cmatlist(brick.cmatlist);
+        accumulated_distro<complex_veclist> cveclist(brick.cveclist[rhs_ind]);
+        accumulated_distro<complex_veclist> cveclist_sym(brick.cveclist_sym[rhs_ind]);
 
         /*running the assembly in parallel*/
-        gmm::standard_locale locale;
-        open_mp_is_running_properly check;
-        thread_exception exception;
-        #pragma omp parallel default(shared)
-        {
-          exception.run([&]
-          {
-            brick.pbr->asm_complex_tangent_terms(*this, ib, brick.vlist,
-                                                 brick.dlist, brick.mims,
-                                                 cmatlist,
-                                                 cveclist,
-                                                 cveclist_sym,
-                                                 brick.region, version);
-          });
-        }
-        exception.rethrow();
+        GETFEM_OMP_PARALLEL(
+          brick.pbr->asm_complex_tangent_terms(*this, ib, brick.vlist,
+                                                brick.dlist, brick.mims,
+                                                cmatlist,
+                                                cveclist,
+                                                cveclist_sym,
+                                                brick.region, version);
+        )
       }
       brick.pbr->complex_post_assembly_in_serial(*this, ib, brick.vlist,
                                                  brick.dlist, brick.mims,
@@ -1926,9 +1768,9 @@ namespace getfem {
                                              brick.region, version);
       {
        /*distributing the resulting vectors and matrices for individual threads.*/
-        list_distro<real_matlist> rmatlist(brick.rmatlist);
-        list_distro<real_veclist> rveclist(brick.rveclist[rhs_ind]);
-        list_distro<real_veclist> rveclist_sym(brick.rveclist_sym[rhs_ind]);
+        accumulated_distro<real_matlist> rmatlist(brick.rmatlist);
+        accumulated_distro<real_veclist> rveclist(brick.rveclist[rhs_ind]);
+        accumulated_distro<real_veclist> rveclist_sym(brick.rveclist_sym[rhs_ind]);
 
         /*running the assembly in parallel*/
         GETFEM_OMP_PARALLEL(
@@ -2667,17 +2509,11 @@ namespace getfem {
       if (version & BUILD_RHS) gmm::resize(residual, gmm::vect_size(rrhs));
 
       { //need parentheses for constructor/destructor semantics of distro
-        distro<decltype(rrhs)> residual_distributed(residual);
-        distro<decltype(rTM)>  tangent_matrix_distributed(rTM);
+        accumulated_distro<decltype(rrhs)> residual_distributed(residual);
+        accumulated_distro<decltype(rTM)>  tangent_matrix_distributed(rTM);
 
         /*running the assembly in parallel*/
-        gmm::standard_locale locale;
-        open_mp_is_running_properly check;
-        thread_exception exception;
-        #pragma omp parallel default(shared)
-        {
-          exception.run([&]
-          {
+        GETFEM_OMP_PARALLEL(
             if (version & BUILD_RHS)
               GMM_TRACE2("Global generic assembly RHS");
             if (version & BUILD_MATRIX)
@@ -2691,7 +2527,7 @@ namespace getfem {
 
             for (const auto &ge : generic_expressions)
               workspace.add_expression(ge.expr, ge.mim, ge.region,
-                                      2, ge.secondary_domain);
+                                       2, ge.secondary_domain);
 
             if (version & BUILD_RHS) {
               if (is_complex()) {
@@ -2710,9 +2546,7 @@ namespace getfem {
                 workspace.assembly(2);
               }
             }
-          });//exception.run(
-        } //#pragma omp parallel
-        exception.rethrow();
+        )
       } //end of distro scope
 
      if (version & BUILD_RHS) gmm::add(gmm::scaled(residual, scalar_type(-1)), rrhs);
diff --git a/src/getfem_omp.cc b/src/getfem_omp.cc
index 1c0c084..c954ddc 100644
--- a/src/getfem_omp.cc
+++ b/src/getfem_omp.cc
@@ -19,156 +19,318 @@
 
 ===========================================================================*/
 
+#include "getfem/dal_singleton.h"
+#include "getfem/getfem_locale.h"
 #include "getfem/getfem_omp.h"
-#include "getfem/getfem_mesh.h"
+
+#ifdef GETFEM_HAS_OPENMP
+  #include <thread>
+  #include <omp.h>
+#endif
+
+using bgeot::scalar_type;
 
 namespace getfem{
 
-#ifdef GETFEM_HAVE_OPENMP
+#ifdef GETFEM_HAS_OPENMP
 
-  boost::recursive_mutex omp_guard::boost_mutex;
+  std::recursive_mutex omp_guard::mutex_;
 
   omp_guard::omp_guard()
-    : boost::lock_guard<boost::recursive_mutex>(boost_mutex)
+    : std::lock_guard<std::recursive_mutex>(mutex_)
   {}
 
-  local_guard::local_guard(boost::recursive_mutex& m) :
+  local_guard::local_guard(std::recursive_mutex& m) :
     mutex_(m),
-    plock_(new boost::lock_guard<boost::recursive_mutex>(m))
-  { }
-
-  local_guard::local_guard(const local_guard& guard)
-    : mutex_(guard.mutex_), plock_(guard.plock_)
-  { }
+    plock_(std::make_shared<std::lock_guard<std::recursive_mutex>>(m))
+  {}
 
-  lock_factory::lock_factory() : mutex_() {}
-  local_guard lock_factory::get_lock() const
-  {
+  local_guard lock_factory::get_lock() const{
     return local_guard(mutex_);
   }
+
+  size_type global_thread_policy::this_thread() {
+    return partition_master::get().get_current_partition();
+  }
+
+  size_type global_thread_policy::num_threads(){
+    return partition_master::get().get_nb_partitions();
+  }
+
+  size_type true_thread_policy::this_thread() {
+    return omp_get_thread_num();
+  }
+
+  size_type true_thread_policy::num_threads(){
+    return omp_get_max_threads();
+  }
+
+  void set_num_threads(int n){
+    omp_set_num_threads(n);
+    partition_master::get().check_threads();
+  }
+
+  bool me_is_multithreaded_now(){
+    // serial region
+    if(omp_get_num_threads() == 1 && omp_get_level() == 0) return false;
+    // parallel region with one thread
+    if(omp_get_num_threads() == 1 && omp_get_level() == 1) return true;
+    // parallel region with more than one thread
+    if(omp_in_parallel() == 1) return true;
+
+    return false;
+  }
+
+  bool not_multithreaded(){
+    return omp_get_max_threads() == 1;
+  }
+
+  size_type max_concurrency() {
+    return std::thread::hardware_concurrency();
+  }
+
+#else
+
+  size_type global_thread_policy::this_thread() {return 0;}
+
+  size_type global_thread_policy::num_threads(){return 1;}
+
+  size_type true_thread_policy::this_thread() {return 0;}
+
+  size_type true_thread_policy::num_threads(){return 1;}
+
+  bool me_is_multithreaded_now(){return false;}
+
+  void set_num_threads(int /*n*/){}
+
+  bool not_multithreaded(){return true;}
+
+  size_type max_concurrency() {return 1;}
+
 #endif
 
-  omp_distribute<bool> open_mp_is_running_properly::answer = false;
-  open_mp_is_running_properly::open_mp_is_running_properly()
-  {answer.all_threads()=true;}
-  open_mp_is_running_properly::~open_mp_is_running_properly()
-  {answer.all_threads()=false;}
-  bool open_mp_is_running_properly::is_it(){return answer;}
-
-  region_partition::region_partition(const region_partition& rp) :
-    pparent_mesh(rp.pparent_mesh),
-    original_region(rp.original_region),
-    partitions(rp.partitions)  {   }
-
-  void region_partition::operator=(const region_partition& rp)
-  {
-    partitions.clear();
-
-    if (!rp.pparent_mesh) return;
-    pparent_mesh->copy_from(*rp.pparent_mesh);
-    original_region = rp.original_region;
-    partitions.resize(rp.partitions.size());
-    gmm::copy(rp.partitions,partitions);
-  }
-
-
-  region_partition::region_partition(mesh* pm, size_type id) :
-    pparent_mesh(pm),original_region(0),
-    partitions(num_threads())
-  {
-    scalar_type time = gmm::uclock_sec();
-    // in case of serial Getfem nothing to partition
-    if (num_threads()==1) {partitions[0]=id; return;}
-
-    //in case mesh is not provided, also don't do anything
-    if (!pm) return;
-
-    if (id == size_type(-1)) {
-      original_region.reset(new mesh_region(pm->convex_index()));
-      original_region->set_parent_mesh(pm);
-    } else{
-      GMM_ASSERT1(pm->has_region(id),"Improper region number");
-      original_region.reset(new mesh_region(pm->region(id)));
+  /** Allows re-throwing exceptions generated in an OpenMP parallel
+      section. Collects exceptions from all threads; rethrow() re-throws
+      the first captured one, so that it can be caught again in the
+      master thread. */
+  class thread_exception {
+    std::vector<std::exception_ptr> exceptions;
+
+    void captureException(){
+      exceptions[true_thread_policy::this_thread()] = std::current_exception();
     }
-    if (me_is_multithreaded_now())
-      GMM_WARNING0("building partitions inside parallel region");
-
-    omp_guard scoped_lock;
-    GMM_NOPERATION(scoped_lock);
-    size_type Nelems = original_region->size();
-    size_type psize = static_cast<size_type>
-      (std::ceil(static_cast<scalar_type >(Nelems)/
-       static_cast<scalar_type >(num_threads())));
-    mr_visitor mr(*original_region);
-    for(size_type thread = 0; thread<num_threads();thread++)
-    {
-      partitions[thread] =
-        getfem::mesh_region::free_region_id(*(original_region->get_parent_mesh()));
-      mesh_region partition;
-      for(size_type i=thread*psize;i<(thread+1)*psize && !mr.finished();i++,++mr)
-      {
-        if (mr.is_face()) partition.add(mr.cv(),mr.f());
-        else partition.add(mr.cv());
+
+  public:
+    thread_exception()
+      : exceptions(true_thread_policy::num_threads(), nullptr)
+    {}
+
+    template <typename function, typename... parameters>
+    void run(function f, parameters... params){
+        try {f(params...);} catch (...) {captureException();}
+    }
+
+    std::vector<std::exception_ptr> caughtExceptions() const{
+      std::vector<std::exception_ptr> non_empty_exceptions;
+      for (auto &&pException : exceptions){
+        if (pException != nullptr) non_empty_exceptions.push_back(pException);
       }
-      pparent_mesh->region(partitions[thread]) = partition;
+      return non_empty_exceptions;
     }
-    GMM_TRACE2("Partitioning time: "<<gmm::uclock_sec()-time<<" s.");
-  }
 
-  size_type region_partition::
-    thread_local_partition() const {
-      if (pparent_mesh==0 && num_threads() >1 ){
-        GMM_WARNING1("partition is empty and cannot be used \
-                     this means that the brick that created it should 
partition \
-                     its domain by himself");
-        return -10;
+    void rethrow(){
+      for (auto &&pException : exceptions){
+        if (pException != nullptr) std::rethrow_exception(pException);
       }
-      return partitions[this_thread()];
+    }
+  };
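The `thread_exception` class above transports exceptions out of a parallel section via `std::exception_ptr`. A minimal sketch of the same pattern, with workers invoked sequentially to keep it self-contained (in getfem they run under `GETFEM_OMP_PARALLEL`):

```cpp
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Sketch of the exception-transport pattern: each worker stores
// std::current_exception() into its own slot, and the master re-throws
// the first captured exception afterwards.
std::string run_and_capture() {
  std::vector<std::exception_ptr> slots(3, nullptr);

  auto worker = [&](std::size_t id) {
    try {
      if (id == 1) throw std::runtime_error("failure in worker 1");
    } catch (...) {
      slots[id] = std::current_exception();   // capture, do not propagate
    }
  };
  for (std::size_t id = 0; id < slots.size(); ++id) worker(id);

  try {
    // re-throw the first captured exception in the "master" context
    for (auto&& p : slots)
      if (p) std::rethrow_exception(p);
  } catch (const std::runtime_error& e) {
    return e.what();
  }
  return "no exception";
}
```

Capturing inside each worker keeps exceptions from escaping the parallel region, which would otherwise terminate the program under OpenMP.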
+
+  partition_iterator::partition_iterator(
+    partition_master &m, std::set<size_type>::const_iterator it_from_set)
+    : master{m}, it{it_from_set}
+  {}
+
+  partition_iterator partition_iterator::operator++(){
+    ++it;
+    if (*this != master.end()) master.set_current_partition(*it);
+    return *this;
+  }
+
+  bool partition_iterator::operator==(const partition_iterator &it1) const {
+    return it == it1.it;
   }
 
-  void omp_distribute<bool>::all_values_proxy::operator=(const bool& x)
-  {
-    for(std::vector<BOOL>::iterator it=distro.thread_values.begin();
-      it!=distro.thread_values.end();it++) *it=x;
+  bool partition_iterator::operator!=(const partition_iterator &it1) const {
+    return !(*this == it1);
+  }
 
+  size_type partition_iterator::operator*() const{
+    return *it;
   }
 
-  thread_exception::thread_exception(): exceptions_(num_threads(), nullptr)
-  { }
+  partition_master partition_master::instance;
 
-  thread_exception::~thread_exception() { }
+  partition_master& partition_master::get(){
+    return instance;
+  }
 
-  std::vector<std::exception_ptr> thread_exception::caughtExceptions() const
-  {
-    std::vector<std::exception_ptr> exceptions;
-    for (auto &&pException : exceptions_)
-    {
-      if (pException != nullptr) exceptions.push_back(pException);
+  void partition_master::check_threads(){
+    if (nb_user_threads != true_thread_policy::num_threads()){
+      update_partitions();
+      nb_user_threads = true_thread_policy::num_threads();
     }
-    return exceptions;
   }
 
-  void thread_exception::rethrow()
-  {
-    for (auto &&pException : exceptions_)
-    {
-      if (pException != nullptr) std::rethrow_exception(pException);
+  void partition_master::set_nb_partitions(size_type n){
+    if (n > nb_partitions){
+      nb_partitions = n;
+      nb_user_threads = true_thread_policy::num_threads();
+      update_partitions();
+      dal::singletons_manager::on_partitions_change();
+    }
+    else{
+      GMM_WARNING1("Not reducing number of partitions from "
+                   << nb_partitions <<" to " << n <<
+                   " as it might invalidate global storage");
+    }
+  }
+
+  partition_iterator partition_master::begin(){
+    check_threads();
+    current_partition = *(std::begin(partitions.thrd_cast()));
+    return partition_iterator{*this, std::begin(partitions.thrd_cast())};
+  }
+
+  partition_iterator partition_master::end(){
+    check_threads();
+    return partition_iterator{*this, std::end(partitions.thrd_cast())};
+  }
+
+  void partition_master::set_behaviour(thread_behaviour b){
+    if (b != behaviour){
+      GMM_ASSERT1(!me_is_multithreaded_now(),
+                  "Cannot change thread policy in parallel section");
+      behaviour = b;
+      check_threads();
+    }
+  }
+
+  partition_master::partition_master()
+    : nb_user_threads{true_thread_policy::num_threads()},
+    nb_partitions{max_concurrency()} {
+    update_partitions();
+  }
+
+  size_type partition_master::get_current_partition() const{
+    return behaviour == thread_behaviour::partition_threads ?
+           current_partition : true_thread_policy::this_thread();
+  }
+
+  size_type partition_master::get_nb_partitions() const{
+    return behaviour == thread_behaviour::partition_threads ?
+           nb_partitions : true_thread_policy::num_threads();
+  }
+
+  void partition_master::set_current_partition(size_type p){
+    if (behaviour == thread_behaviour::partition_threads){
+      GMM_ASSERT2(partitions.thrd_cast().count(p) != 0, "Internal error: "
+                  << p << " is not a valid partition for thread "
+                  << true_thread_policy::this_thread());
+      current_partition = p;
+    }
+  }
+
+  void partition_master::rewind_partitions(){
+    if (me_is_multithreaded_now()){
+      current_partition = *(std::begin(partitions.thrd_cast()));
+    }
+    else{
+      for (size_type t = 0; t != partitions.num_threads(); ++t){
+        current_partition(t) = *(std::begin(partitions(t)));
+      }
     }
   }
 
-  void thread_exception::captureException()
-  {
-    exceptions_[this_thread()] = std::current_exception();
+  void partition_master::update_partitions(){
+    partitions_updated = false;
+
+    auto guard = omp_guard{};
+    GMM_NOPERATION(guard);
+
+    if (partitions_updated) return;
+
+    partitions = decltype(partitions){};
+    current_partition = decltype(current_partition){};
+
+    auto n_threads = true_thread_policy::num_threads();
+    if (n_threads > nb_partitions){
+      GMM_WARNING0("Using " << n_threads <<
+                   " threads which is above the maximum number of partitions: " <<
+                   nb_partitions);
+    }
+    if (behaviour == thread_behaviour::partition_threads){
+      for (size_type t = 0; t != n_threads; ++t){
+        auto partition_size = static_cast<size_type>
+                                (std::ceil(static_cast<scalar_type>(nb_partitions) /
+                                           static_cast<scalar_type>(n_threads)));
+        auto partition_begin = partition_size * t;
+        if (partition_begin >= nb_partitions) break;
+        auto partition_end = std::min(partition_size * (t + 1), nb_partitions);
+        auto hint_it = std::begin(partitions(t));
+        for (size_type i = partition_begin; i != partition_end; ++i){
+          hint_it = partitions(t).insert(hint_it, i);
+        }
+        current_partition(t) = partition_begin;
+      }
+    }
+    else{
+      for (size_type t = 0; t != n_threads; ++t){
+        partitions(t).insert(t);
+        current_partition(t) = t;
+      }
+    }
+
+    partitions_updated = true;
+  }
+
+  #if defined _WIN32 && !defined (__GNUC__)
+    #define GETFEM_ON_WIN
+  #endif
+
+  parallel_boilerplate::
+  parallel_boilerplate()
+  : plocale{std::make_unique<standard_locale>()},
+    pexception{std::make_unique<thread_exception>()} {
+    #ifdef GETFEM_ON_WIN
+      _configthreadlocale(_ENABLE_PER_THREAD_LOCALE);
+    #endif
   }
 
-  void parallel_execution(std::function<void(void)> lambda){
-    gmm::standard_locale locale;
-    thread_exception exception;
+  void parallel_boilerplate::run_lamda(std::function<void(void)> lambda){
+    pexception->run(lambda);
+  }
+
+  parallel_boilerplate::~parallel_boilerplate(){
+    #ifdef GETFEM_ON_WIN
+    _configthreadlocale(_DISABLE_PER_THREAD_LOCALE);
+    #endif
+    pexception->rethrow();
+  }
+
+  void parallel_execution(std::function<void(void)> lambda, bool iterate_over_partitions){
+    auto boilerplate = parallel_boilerplate{};
     #pragma omp parallel default(shared)
     {
-      exception.run([&]{lambda();});
+      if (iterate_over_partitions) {
+        for (auto &&partitions : partition_master::get()){
+          boilerplate.run_lamda(lambda);
+        }
+      }
+      else {
+        boilerplate.run_lamda(lambda);
+      }
     }
-    exception.rethrow();
+    if (iterate_over_partitions) partition_master::get().rewind_partitions();
   }
 
-}
\ No newline at end of file
+}  /* end of namespace getfem.                                             */
\ No newline at end of file
diff --git a/src/gmm/gmm_std.h b/src/gmm/gmm_std.h
index 40d7d19..546cbfc 100644
--- a/src/gmm/gmm_std.h
+++ b/src/gmm/gmm_std.h
@@ -56,8 +56,8 @@
 # define SECURE_NONCHAR_FSCANF fscanf_s
 # define SECURE_STRNCPY(a, la, b, lb) strncpy_s(a, la, b, lb)
 # define SECURE_FOPEN(F, filename, mode) (*(F) = 0,  fopen_s(F, filename, mode))
-# define SECURE_SPRINTF1(S, l, st, p1) sprintf_s(S, l, st, p1) 
-# define SECURE_SPRINTF2(S, l, st, p1, p2) sprintf_s(S, l, st, p1, p2) 
+# define SECURE_SPRINTF1(S, l, st, p1) sprintf_s(S, l, st, p1)
+# define SECURE_SPRINTF2(S, l, st, p1, p2) sprintf_s(S, l, st, p1, p2)
 # define SECURE_SPRINTF4(S, l, st, p1, p2, p3, p4) sprintf_s(S, l, st, p1, p2, p3, p4)
 # define SECURE_STRDUP(s) _strdup(s)
 # ifndef _SCL_SECURE_NO_DEPRECATE
@@ -70,7 +70,7 @@
 # define SECURE_FOPEN(F, filename, mode) ((*(F)) = fopen(filename, mode))
 # define SECURE_SPRINTF1(S, l, st, p1) sprintf(S, st, p1)
 # define SECURE_SPRINTF2(S, l, st, p1, p2) sprintf(S, st, p1, p2)
-# define SECURE_SPRINTF4(S, l, st, p1, p2, p3, p4) sprintf(S, st, p1, p2, p3, p4) 
+# define SECURE_SPRINTF4(S, l, st, p1, p2, p3, p4) sprintf(S, st, p1, p2, p3, p4)
 # define SECURE_STRDUP(s) strdup(s)
 #endif
 
@@ -166,75 +166,16 @@ namespace std {
     T& operator*() const { return shared_ptr<T>::operator*(); }
     T* operator->() const { return shared_ptr<T>::operator->(); }
   };
-  
+
   template <typename T> shared_array_ptr<T> make_shared_array(size_t num)
   { return shared_array_ptr<T>(new T[num]); }
 }
 
-
-
-
-#ifdef GMM_HAVE_OPENMP
-
-#include <omp.h>
-       /**number of OpenMP threads*/
-       inline size_t num_threads(){return omp_get_max_threads();}
-       /**index of the current thread*/
-       inline size_t this_thread() {return omp_get_thread_num();}
-       /**is the program running in the parallel section*/
-       inline bool me_is_multithreaded_now(){return static_cast<bool>(omp_in_parallel());}
-#else
-       inline size_t num_threads(){return size_t(1);}
-       inline size_t this_thread() {return size_t(0);}
-       inline bool me_is_multithreaded_now(){return false;}
-#endif
-
 namespace gmm {
 
-       using std::endl; using std::cout; using std::cerr;
+  using std::endl; using std::cout; using std::cerr;
         using std::ends; using std::cin; using std::isnan;
 
-#ifdef _WIN32
-
-       class standard_locale {
-               std::string cloc;
-               std::locale cinloc;
-       public :
-               inline standard_locale(void) : cinloc(cin.getloc())
-               {
-                       if (!me_is_multithreaded_now()){
-                                cloc=setlocale(LC_NUMERIC, 0);
-                                setlocale(LC_NUMERIC,"C");
-                       }
-               }
-
-               inline ~standard_locale() {
-                       if (!me_is_multithreaded_now())
-                                       setlocale(LC_NUMERIC, cloc.c_str());
-
-               }
-       };
-#else
-       /**this is the above solutions for linux, but it still needs to be tested.*/
-       //class standard_locale {
-       //      locale_t oldloc;
-       //      locale_t temploc;
-
-       //public :
-       //      inline standard_locale(void) : oldloc(uselocale((locale_t)0))
-       //      {
-       //                      temploc = newlocale(LC_NUMERIC, "C", NULL);
-    //              uselocale(temploc);
-       //      }
-
-       //      inline ~standard_locale()
-       //      {
-       //                  uselocale(oldloc);
-       //                      freelocale(temploc);
-       //      }
-       //};
-
-
   class standard_locale {
     std::string cloc;
     std::locale cinloc;
@@ -247,9 +188,6 @@ namespace gmm {
     { setlocale(LC_NUMERIC, cloc.c_str()); cin.imbue(cinloc); }
   };
 
-
-#endif
-
   class stream_standard_locale {
     std::locale cloc;
     std::ios &io;
@@ -260,9 +198,6 @@ namespace gmm {
     inline ~stream_standard_locale() { io.imbue(cloc); }
   };
 
-
-
-
   /* ******************************************************************* */
   /*       Clock functions.                                              */
   /* ******************************************************************* */


