Re: [h5md-user] Particle tracking

From: Peter Colberg
Subject: Re: [h5md-user] Particle tracking
Date: Mon, 2 Sep 2013 13:31:07 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

On Mon, Sep 02, 2013 at 10:10:26AM +0200, Felix Höfling wrote:
> 1) the first 2 dimensions for time and the particle index are set to
> unlimited. (The above chapter of the HDF5 manual says that this is
> possible: "Each dimension can be extended up to its maximum or
> unlimited." in the section "changing dataset dimensions".)

Thank you for explicitly mentioning this. I did not know that one
can use more than one unlimited dimension, but of course chunks
allow extending the data in any dimension. An HDF5 C example [1]
demonstrates a dataset with two unlimited dimensions.
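
For illustration, here is a minimal sketch of the same idea in Python,
assuming the h5py bindings rather than the C API. The dataset name, chunk
shape and NaN fill value are placeholders chosen for this example and are
not part of H5MD. Growing the particle dimension does not rewrite earlier
samples; slots that were never written read back as the fill value.

```python
import h5py
import numpy as np

# In-memory file, so the sketch leaves nothing on disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Shape is (time, particle, space); the first two dimensions are unlimited.
    pos = f.create_dataset(
        "position",
        shape=(0, 0, 3),
        maxshape=(None, None, 3),
        chunks=(1, 64, 3),   # a chunked layout is required for resizing
        dtype="f8",
        fillvalue=np.nan,    # placeholder fill value for this sketch
    )

    # First sample: 2 particles.
    pos.resize((1, 2, 3))
    pos[0] = np.ones((2, 3))

    # Second sample: 4 particles, so the particle dimension grows.
    pos.resize((2, 4, 3))
    pos[1] = np.full((4, 3), 2.0)

    shape = pos.shape
    first = pos[0]  # slots 2 and 3 were never written at step 0
```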


> 2) If a sample with particle data is written that exceeds the
> previously reached maximum particle number, the HDF5 library takes
> note of that and increases the actual size of the 2nd dimension. But
> it does not copy the data that have already been written (i.e., not
> like std::vector). If the missing values are read, the fill value is
> just returned.
> 3) The reader has to cross-check with "id" whether the returned
> values refer to real particles or not. Of course, there is also a
> fill value for the position, but it's more difficult to define a
> default "invalid" position. BTW, the fill value of "id" should be
> customisable and the reader should get it from the dataset
> properties/attributes.
> If the HDF5 library works in such a way, I think this would be the
> most natural solution to accommodate grand-canonical particle data
> (i.e. from a subsystem coupled to a particle reservoir).

This is how I read the proposal as well now.

I like the solution of leaving the fill value unspecified, and
instead retrieving the fill value from the dataset properties.
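
A reader could do this roughly as follows. This is a sketch assuming h5py;
the fill value of -1 and the particle ids are made up for the example. The
point is that the reader queries the dataset for its fill value instead of
hard-coding one.

```python
import h5py
import numpy as np

with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # The writer picks the fill value; -1 is only an illustration.
    ids = f.create_dataset(
        "id",
        shape=(1, 4),
        maxshape=(None, None),
        chunks=(1, 64),
        dtype="i8",
        fillvalue=-1,
    )
    ids[0, :2] = [7, 3]  # only two of the four slots hold particles

    # The reader asks the dataset for its fill value.
    fill = ids.fillvalue
    row = ids[0]
    valid = row != fill      # boolean mask of slots holding real particles
    real_ids = row[valid]
```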

> Two more suggestions:
> To make life easier for the reader, I suggest an attribute to the
> particles/subgroup indicating such a behaviour. This would also
> improve performance in case of a fixed particle number.

We could couple the fixed/variable nature of the particle number to
the limited/unlimited size of the respective dataspace dimension.
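
With such a coupling, no extra attribute is needed: a reader can inspect
the maximum dimensions of the dataspace. A sketch assuming h5py, where the
helper name and the dataset names are hypothetical:

```python
import h5py

def particle_number_is_variable(dset):
    """Return True if the particle dimension (axis 1) is unlimited."""
    # h5py reports an unlimited dimension as None in maxshape.
    return dset.maxshape[1] is None

with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Fixed particle number: only the time dimension is unlimited.
    fixed = f.create_dataset(
        "fixed", shape=(0, 100, 3), maxshape=(None, 100, 3),
        chunks=(1, 100, 3), dtype="f8",
    )
    # Variable particle number: time and particle dimensions are unlimited.
    variable = f.create_dataset(
        "variable", shape=(0, 0, 3), maxshape=(None, None, 3),
        chunks=(1, 64, 3), dtype="f8",
    )
    fixed_flag = particle_number_is_variable(fixed)
    variable_flag = particle_number_is_variable(variable)
```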

> Further, I dislike the idea of "counting the nonzero elements" which
> is an O(N) operation. Instead there could be a scalar data group for
> the actual particle number in each time step. (Which is optional as
> always and, of course, has to be consistent with the data found in
> "id".)

There seems to be no way to get around an O(N) operation, since the
elements of the id dataset have to be iterated due to the possible
presence of holes along the dimension of the particle number (see the
remarks on MPI-parallel simulations).
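
A small sketch in plain Python shows why there is no shortcut. The fill
value of -1 and the ids are invented for the example. With holes, the
index of the last occupied slot says nothing about the number of
particles, so every element has to be inspected.

```python
FILL = -1  # hypothetical fill value of the "id" dataset

def count_particles(id_row, fill=FILL):
    """Count real particles by scanning every slot: O(N)."""
    return sum(1 for i in id_row if i != fill)

# One time step of "id" with holes, e.g. left by unevenly filled MPI ranks.
row = [4, FILL, 9, 2, FILL, FILL, 5]

n_particles = count_particles(row)

# Number of slots up to and including the last occupied one; this
# overestimates the particle count whenever holes are present.
occupied_extent = 1 + max(k for k, i in enumerate(row) if i != FILL)
```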

A trajectory subgroup dataset containing the instantaneous particle
counts would help with memory allocation though, when buffers are to
be allocated according to the instantaneous particle number, rather
than the maximum particle number.

