Dear Axel,
Thank you very much indeed for the detailed and very useful explanation and
recommendations.
The simulation on the cluster with a single core runs without any error
messages. Strangely enough, it works with 3 cores as well. I am going
to increase the number of cores on a single node up to 12 again.
I checked the kinetic energy and it is in the proper range (the outputs of
the 6-core desktop run and the single-core cluster run are in agreement).
Actually, I have used the individual force cap for warmup in this simulation
and I set the cap radius for each interaction. I have tried a very
long warmup as well, around 1 million steps, but I still get
the same error.
I will post the latest updates as soon as I have finished the test simulations.
Thank you very much,
Best regards,
Arash
Arash Azari
________________________________
From: Axel Arnold <address@hidden>
To: Arash Azari <address@hidden>
Cc: "address@hidden" <address@hidden>
Sent: Wednesday, August 7, 2013 11:01 PM
Subject: Re: [ESPResSo-users] different simulation scenario on desktop or
cluster
Hi!
Bonded interactions are only computed for particles that are
located on the same CPU. If you increase the number of cores, the
range over which a bond can be computed gets shorter. However,
any reasonable bond is much shorter than your box dimensions, so
that bond broken errors definitely point to excessive forces. That
it "works" on your desktop probably simply means that on that
machine, the long bonds can still be accommodated, but it is very
likely that your simulation is still unphysical. In particular, I
doubt that the problem is due to MPI or the machine, but rather a
problem of your setup. An easy check would be to run just with 6
cores or even one core on the cluster, just as on your desktop.
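To illustrate why more cores leave less room for a long bond, here is a minimal sketch in plain Python (not ESPResSo code). It assumes a simple domain decomposition in which the box is split evenly along each axis and a bond can only be handled if it is shorter than the smallest subdomain edge. The real criterion in ESPResSo may differ in detail, and the function names, box size, and node grids are all hypothetical.

```python
def local_box(box_l, node_grid):
    """Edge lengths of one subdomain when the box is split evenly per axis."""
    return tuple(l / n for l, n in zip(box_l, node_grid))


def bond_fits(bond_length, box_l, node_grid):
    """Simplified assumption: a bond is computable only if it is shorter
    than the smallest subdomain edge."""
    return bond_length < min(local_box(box_l, node_grid))


box = (30.0, 30.0, 30.0)      # hypothetical box dimensions
stretched_bond = 12.0         # hypothetical, unphysically long bond

print(local_box(box, (1, 1, 1)))                   # one core
print(local_box(box, (3, 2, 2)))                   # 12 cores
print(bond_fits(stretched_bond, box, (1, 1, 1)))   # bond is tolerated
print(bond_fits(stretched_bond, box, (3, 2, 2)))   # bond is reported broken
```

In this picture the same overstretched bond goes unnoticed on a coarse node grid and triggers an error on a finer one, although the underlying problem (excessive forces) is identical.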
To check the physics of your simulation, just write out the
energies, and check that in particular the kinetic energy
fluctuates around 1/2 N k_BT, where N is the number of degrees of
freedom. Also, there should be no strong drift in the other energies.
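The kinetic energy check can be sketched in a few lines of plain Python (not ESPResSo code). It assumes you have already written the kinetic energy samples to a list. The function name, the 5% tolerance, and the sample values are hypothetical.

```python
from statistics import mean


def kinetic_energy_ok(e_kin_samples, n_dof, kT, rel_tol=0.05):
    """Return True if the mean kinetic energy lies within rel_tol
    of the equipartition value n_dof * kT / 2."""
    expected = 0.5 * n_dof * kT
    return abs(mean(e_kin_samples) - expected) <= rel_tol * expected


# Hypothetical example: 1000 particles in 3D, i.e. 3000 degrees of freedom.
samples = [1490.0, 1512.0, 1498.0, 1505.0]
print(kinetic_energy_ok(samples, n_dof=3000, kT=1.0))
```

A mean that sits far above the expected value, or that keeps growing over the run, would point to excessive forces rather than to the machine.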
Under most circumstances, it is very likely that you actually need
a warmup phase. However, when combined with walls, capping the
wall forces is usually not a good idea, since particles are then
not hindered from penetrating the hard core of the wall
constraint. Therefore, you should use the individual force cap
feature, and only set a cap radius for the particle-particle
interactions, see the User's guide for details on "inter forcecap
individual".
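To make the idea of a cap radius concrete, here is a minimal sketch in plain Python (not ESPResSo code) for a Lennard-Jones pair force. It assumes that capping means holding the force at its value at the cap radius for all shorter distances. The function names and parameter values are hypothetical; the User's guide is the authority on what ESPResSo actually does.

```python
def lj_force(r, eps=1.0, sigma=1.0):
    """Magnitude of the Lennard-Jones force at distance r (positive = repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r


def lj_force_capped(r, r_cap, eps=1.0, sigma=1.0):
    """Assumed capping rule: below r_cap, use the force at r_cap instead."""
    return lj_force(max(r, r_cap), eps, sigma)


r_cap = 0.9  # hypothetical cap radius for particle-particle pairs
for r in (0.5, 0.8, 0.9, 1.0):
    print(r, lj_force(r), lj_force_capped(r, r_cap))
```

The uncapped force grows very steeply for overlapping particles, while the capped force stays bounded, so overlaps can relax during the warmup. If the wall interaction is left uncapped, the wall keeps its full repulsion.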
Cheers,
Axel
On 07.08.13 12:36, Arash Azari wrote:
Hello everyone,
I have a very strange situation and I cannot find a proper solution for it; I
would highly appreciate any recommendations.
Here is the problem:
I have a simulation system (polymer and ions with repulsive wall) and when I
run this simulation on my desktop everything is fine (run on single CPU with 6
cores) regardless of skin parameter (0.4) and the warm up steps; it works even
with a very short warm up.
When I try to run this simulation on a cluster, a few steps after warm up it
crashes, sometimes with a bond broken error and sometimes with a wall constraint
violation error. I tried different combinations of nodes and cores, and even on a
single node with 2 CPUs (run on 12 cores) it crashes. I have increased the skin
parameter up to 2.0 and tried a very long warm up, and I still get the same error
messages a few steps after the warm up.
I should mention that I did not cap the interactions between the particles
(polymer or ion) and the walls during the warm up.
I am not sure whether it is because of the MPI settings on the cluster or not,
but the cluster administrators are not at all helpful when asked about the
system settings and configuration.
I attached the config.log file if it helps.
Thank you very much,
Best regards,
Arash
Arash Azari