

Re: [h5md-user] Particle tracking

From: Felix Höfling
Subject: Re: [h5md-user] Particle tracking
Date: Mon, 02 Sep 2013 10:10:26 +0200
User-agent: Opera Mail/12.15 (Linux)

On 01.09.2013, 22:23, Pierre de Buyl <address@hidden> wrote:

Peter Colberg <address@hidden> wrote:
On Sun, Sep 01, 2013 at 02:15:41PM -0400, Pierre de Buyl wrote:
Did you consider variable-length datatypes? They would allow keeping
time as the first dimension of the datasets.

The same applies as for array datatypes: they do not allow slicing.


Is it worth losing slicing?

As time-dependent data requires chunking anyway, what about using
H5S_UNLIMITED for the N dimension (where N is the number of particles)
of the datasets in the particles group? This would allow for the
storage of a (potentially) growing number of particles, without
(i) having to know N_max and (ii) storing unused

Coming back to my previous message and the use of "id", one could
set id=-1 where there is no particle. If the fill value [1] of "id"
is set to -1, this is achieved at no cost for empty slots and it
would also not impose any specific memory arrangement for the empty

Counting the number of >=0 items in "id" would give the number of
particles at a given time.

[1] http://www.hdfgroup.org/HDF5/doc/UG/10_Datasets.html#Allocation

Just thinking out loud :-)
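The proposed layout can be sketched as a small in-memory model. This is plain Python, not HDF5: the class `ExtendibleIdDataset` and the fill value -1 are illustrative assumptions that mimic an "id" dataset whose time and particle dimensions are both extendible. With HDF5 itself, the writer would grow the dataset explicitly (e.g. with `H5Dset_extent`), and, with a fill value defined, reads of never-written elements return that fill value.

```python
FILL_ID = -1  # assumed fill value of "id", as proposed above


class ExtendibleIdDataset:
    """Hypothetical in-memory stand-in for a 2-D "id" dataset of shape
    (time, N) whose two dimensions are both extendible."""

    def __init__(self, fill=FILL_ID):
        self.fill = fill
        self.n_slots = 0   # current extent of the particle dimension
        self._rows = []    # only the values that were actually written

    def append_sample(self, ids):
        # Extend the time dimension by one row and, if this sample has more
        # particles than any earlier one, the particle dimension as well.
        # Earlier rows are not rewritten.
        self.n_slots = max(self.n_slots, len(ids))
        self._rows.append(list(ids))

    def read_sample(self, step):
        # Slots that were never written read back as the fill value.
        row = self._rows[step]
        return row + [self.fill] * (self.n_slots - len(row))


def count_particles(ids):
    # O(N) scan: every id >= 0 denotes a real particle.
    return sum(1 for i in ids if i >= 0)


dset = ExtendibleIdDataset()
dset.append_sample([0, 1, 2])
dset.append_sample([0, 2, 3, 4, 5])  # more particles: the particle dimension grows
dset.append_sample([4, 5])
print(dset.read_sample(0))  # [0, 1, 2, -1, -1]
print([count_particles(dset.read_sample(t)) for t in range(3)])  # [3, 5, 2]
```

Empty slots cost nothing to write; they only appear as fill values when read.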


I'm not sure whether I got the idea right:

1) The first two dimensions, for time and the particle index, are set to unlimited. (The above chapter of the HDF5 manual says that this is possible: "Each dimension can be extended up to its maximum or unlimited.", in the section "changing dataset dimensions".)

2) If a sample is written whose particle number exceeds the previously reached maximum, the HDF5 library takes note of that and increases the actual size of the second dimension. But it does not copy the data that have already been written (i.e., unlike a std::vector). If the missing values are read, the fill value is simply returned.

3) The reader has to cross-check with "id" whether the returned values refer to real particles or not. Of course, there is also a fill value for the position, but it's more difficult to define a default "invalid" position. BTW, the fill value of "id" should be customisable, and the reader should get it from the dataset properties/attributes.
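Step 3 can be illustrated with a hypothetical reader-side filter in plain Python. The metadata dict and its `"fillvalue"` key are assumptions standing in for wherever the fill value of "id" is actually stored; the point is that the reader does not hard-code -1. The sample also shows why a position fill value is less useful: an empty slot with the default fill value reads as (0, 0, 0), which is a perfectly valid position.

```python
def real_particles(ids, positions, id_meta):
    """Return (id, position) pairs for the occupied slots of one sample.

    `id_meta` is a hypothetical stand-in for the metadata of the "id"
    dataset; its "fillvalue" entry marks empty slots.
    """
    fill = id_meta["fillvalue"]  # taken from the data, not hard-coded
    if len(ids) != len(positions):
        raise ValueError("id and position samples must have the same length")
    return [(i, r) for i, r in zip(ids, positions) if i != fill]


ids = [7, 3, -1, -1]
positions = [(0.1, 0.2, 0.3), (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
# The last two position slots hold the default fill value (0, 0, 0), which
# cannot be told apart from a real particle at the origin; "id" can.
print(real_particles(ids, positions, {"fillvalue": -1}))
# [(7, (0.1, 0.2, 0.3)), (3, (1.0, 1.0, 1.0))]
```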

If the HDF5 library works in such a way, I think this would be the most natural solution to accommodate grand-canonical particle data (i.e. from a subsystem coupled to a particle reservoir).

Two more suggestions:

To make life easier for the reader, I suggest an attribute on the particles/subgroup indicating such behaviour. This would also improve performance in the case of a fixed particle number, since the reader could then skip the cross-check with "id".
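A sketch of how a reader might use such an attribute, again in plain Python. The attribute name `variable_particle_number` and the function `real_slots` are hypothetical; the point is that with a fixed particle number the reader never needs to read "id" at all.

```python
def real_slots(group_attrs, n_slots, read_ids, fill=-1):
    """Indices of occupied slots in one sample.

    `group_attrs` stands in for the attributes of a particles subgroup;
    the attribute name "variable_particle_number" is hypothetical.
    `read_ids` is only called when the particle number may vary.
    """
    if not group_attrs.get("variable_particle_number", False):
        # Fixed particle number: every slot is occupied, "id" is never read.
        return list(range(n_slots))
    ids = read_ids()
    return [k for k, i in enumerate(ids) if i != fill]


print(real_slots({}, 3, read_ids=lambda: [0, -1, 2]))  # [0, 1, 2]
print(real_slots({"variable_particle_number": True}, 3,
                 read_ids=lambda: [0, -1, 2]))         # [0, 2]
```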

Further, I dislike the idea of "counting the nonzero elements", which is an O(N) operation per time step. Instead, there could be a scalar data group holding the actual particle number at each time step (optional as always and, of course, consistent with the data found in "id").
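The per-step particle number could look like this, as a minimal in-memory sketch. The dict-based store, the `particle_number` series and the helper names are assumptions, not part of any specification: the writer records N at no extra cost because it knows its own particle list, the reader gets N in O(1), and the O(N) scan of "id" survives only as an optional consistency check.

```python
FILL_ID = -1  # assumed fill value of "id"


def write_sample(store, ids, width):
    """Append one sample: ids padded to `width` slots, plus its particle number."""
    store["id"].append(list(ids) + [FILL_ID] * (width - len(ids)))
    # The writer knows how many particles it has, so recording N is free.
    store["particle_number"].append(len(ids))


def particle_number(store, step):
    # O(1) look-up instead of an O(N) scan of "id".
    return store["particle_number"][step]


def is_consistent(store):
    # Optional O(N) cross-check of the stored counts against "id".
    return all(n == sum(1 for i in row if i != FILL_ID)
               for n, row in zip(store["particle_number"], store["id"]))


store = {"id": [], "particle_number": []}
write_sample(store, [0, 1, 2], width=4)
write_sample(store, [0, 1, 2, 3], width=4)
print(particle_number(store, 0))  # 3
print(is_consistent(store))       # True
```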

@Pierre: What about extending pyh5md with an example of grand-canonical MC simulations of an ideal gas (no interactions, but a fluctuating particle number)? This should be simple to implement and would give us a simple means of testing these ideas.
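Such a simulation is indeed short. The sketch below is plain Python and independent of pyh5md (it uses none of its API); the parametrisation by the activity z = exp(βμ)/Λ³ and the function name `gcmc_ideal_gas` are assumptions. Without interactions the acceptance probabilities reduce to min(1, zV/(N+1)) for insertion and min(1, N/(zV)) for deletion, so the particle number should follow a Poisson distribution with mean zV, which gives a simple check. Each returned sample is the sorted list of particle ids present at that time, i.e. exactly the variable-length data that would go into an extendible "id" dataset.

```python
import random


def gcmc_ideal_gas(z, volume, n_steps, seed=0, sample_every=10):
    """Grand-canonical Monte Carlo of an ideal gas in a cubic box.

    z is the activity exp(beta*mu)/Lambda**3 (assumed parametrisation).
    Returns one sorted list of particle ids every `sample_every` steps.
    """
    rng = random.Random(seed)
    side = volume ** (1.0 / 3.0)
    positions = {}  # particle id -> 3-D position; ids are never reused
    next_id = 0
    samples = []
    for step in range(1, n_steps + 1):
        n = len(positions)
        if rng.random() < 0.5:
            # Insertion at a uniform random position. No interactions, so the
            # Boltzmann factor is 1 and only the ideal-gas factor remains.
            if rng.random() < min(1.0, z * volume / (n + 1)):
                positions[next_id] = tuple(rng.uniform(0.0, side) for _ in range(3))
                next_id += 1
        elif n > 0:
            # Deletion of a uniformly chosen particle; on an empty box the
            # deletion attempt is simply rejected (this branch is skipped).
            if rng.random() < min(1.0, n / (z * volume)):
                del positions[rng.choice(list(positions))]
        if step % sample_every == 0:
            samples.append(sorted(positions))
    return samples


samples = gcmc_ideal_gas(z=0.02, volume=1000.0, n_steps=100000)
mean_n = sum(len(s) for s in samples) / len(samples)
print(mean_n)  # should be close to z * volume = 20
```

Because ids are never reused, the maximum id grows without bound during a long run, while the number of occupied slots per sample stays near zV.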


