Re: [ESPResSo-devel] Cluster Hardware Requirements
From: Axel Arnold
Subject: Re: [ESPResSo-devel] Cluster Hardware Requirements
Date: Wed, 11 Jul 2012 17:26:46 +0200
User-agent: KMail/4.7.2 (Linux/3.1.10-1.9-default; KDE/4.7.2; x86_64; ; )
On Tuesday 10 July 2012 17:11:22 Mingyang Hu wrote:
> Dear all,
>
> I hope to get some suggestions from those of you who have experience in
> running Espresso on different types of hardware.
>
> The new AMD Opterons (12 or 16 cores/chip) seem to have a very good balance
> between the number of cores and the amount of cache. Has anyone using this
> CPU run into problems when running Espresso or other MD programs? How well
> does Espresso scale on ~50 cores?
Hi!
If the interconnect is reasonably good, Espresso weak-scales quite well at
about 2000 particles/core. On a BlueGene/P, for example, with a very good
interconnect but slow processors, simulations could go down to 500
particles/core. On a Cray XE6, someone at our institute was running simple
polymer melts on 128 processors with 1000 particles/core.
> Also, I have a more general question regarding the cache as a guidance for
> future simulations. As far as I understand, one advantage of doing MD
> simulation is that during calculation, the amount of information stored on
> a local processor is moderate, so that one can possibly fit the data into
> the cache of the processors. So I wonder how much cache we normally need
> for typical situations where we simulate 1-100 thousand particles (e.g.
> to hold the r, v, f lists and so on). How many particles can 1 MB/core of
> L2+L3 cache support?
It depends a bit on how many features are switched on; rotation, for example,
costs 32 bytes just for the quaternions. With just a few standard features, a
particle takes around 140 bytes (p+v+f = 9*8 = 72, plus old position, type,
id, charge, ...). That means 1 MB of cache is good for about 5000 particles,
since the cell structures and other infrastructure should also be cached. On
the other hand, the loops are organized such that a particle typically needs
to be loaded at most once per time step, so there is no dramatic performance
drop even when you have more particles than fit into cache.
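To make the arithmetic above concrete, here is a back-of-the-envelope sketch. The byte counts follow the figures quoted in this mail; the breakdown into "old position", "misc bookkeeping", and the usable-cache fraction are illustrative assumptions, not the actual ESPResSo particle struct layout:

```python
# Back-of-the-envelope estimate of how many particles fit in cache.
# The per-field byte counts mirror the numbers in the mail above;
# the actual ESPResSo particle struct is feature-dependent.

BYTES_PER_DOUBLE = 8

def particle_bytes(rotation=False):
    """Rough per-particle memory footprint in bytes."""
    n = 9 * BYTES_PER_DOUBLE       # position + velocity + force (3 doubles each) = 72
    n += 3 * BYTES_PER_DOUBLE      # old position (assumed: 3 more doubles)
    n += 2 * 4 + BYTES_PER_DOUBLE  # type, id (ints) + charge (double)
    n += 28                        # misc bookkeeping (assumed) -> ~140 total
    if rotation:
        n += 4 * BYTES_PER_DOUBLE  # quaternion: 32 extra bytes
    return n

def particles_per_cache(cache_bytes, usable_fraction=0.7, rotation=False):
    """Particles fitting in cache, leaving room for cell structures etc."""
    return int(cache_bytes * usable_fraction / particle_bytes(rotation))

print(particle_bytes())              # ~140 bytes per particle
print(particles_per_cache(1 << 20))  # roughly 5000 particles per 1 MB
```

With a 70% usable fraction this reproduces the "1 MB is good for about 5000 particles" estimate; enabling rotation drops that by roughly a fifth.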
The bigger concern is actually that if there are too few particles on a core,
the ghost layers become quite large relative to the local domain, causing a
lot of communication overhead. That is why having too few particles per core
is more problematic than having too many.
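The surface-to-volume argument behind this can be sketched numerically. Assuming a cubic domain per core and a ghost shell one interaction range thick (the density and range values below are made-up illustrative numbers, not ESPResSo defaults):

```python
# Rough estimate of ghost-particle overhead vs. particles per core.
# Model: each core owns a cubic domain; ghost particles fill a shell
# of thickness r_ghost around it. Parameter values are illustrative.

def ghost_fraction(particles_per_core, density=0.8, r_ghost=1.0):
    """Ratio of ghost particles to locally owned particles."""
    L = (particles_per_core / density) ** (1.0 / 3.0)   # domain edge length
    shell_volume = (L + 2 * r_ghost) ** 3 - L ** 3      # ghost shell volume
    return shell_volume / L ** 3                        # ghosts / locals

for n in (500, 2000, 10000):
    print(f"{n:6d} particles/core -> ghost fraction {ghost_fraction(n):.2f}")
```

The fraction shrinks as particles/core grows (roughly as N^(-1/3)), so halving the particles per core makes the relative communication volume noticeably worse, matching the observation above.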
Axel
--
JP Dr. Axel Arnold
ICP, Universität Stuttgart
Pfaffenwaldring 27
70569 Stuttgart, Germany
Email: address@hidden
Tel: +49 711 685 67609