Re: [h5md-user] Variable-size particle groups

From:	Felix Höfling
Subject:	Re: [h5md-user] Variable-size particle groups
Date:	Tue, 29 May 2012 10:15:08 +0200
User-agent:	Opera Mail/11.64 (Linux)

Hi Peter,

On 26.05.2012 at 15:55, Peter Colberg <address@hidden> wrote:

Dear H5MD community,

Let's break the silence with a new extension for H5MD :-).

While finishing the support of particle groups in HALMD, which allow
selection of a subset of particles of the system for observation,
I am pondering how to store variable-size trajectory data in H5MD.

This would become necessary once I track, e.g., particles in the
neighbourhood of a particle, without sampling an entire
system of millions of solvent particles (or, at least, sampling it
with a significantly lower frequency).

One idea I had in mind was to use the existing trajectory dataset
structure, and fill empty placeholders with some invalid value (NaN).
While the storage overhead should be negligible due to compression,
this has a serious disadvantage: The number of placeholders must be
chosen wisely, otherwise a lengthy simulation may have to abort due
to an overflow of particles.
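
The placeholder idea and its failure mode can be sketched as follows. This is only an illustration in plain Python, not HALMD or H5MD code; the names CAPACITY, pad_snapshot and valid_particles are hypothetical.

```python
import math

CAPACITY = 4  # hypothetical number of placeholder slots per snapshot


def pad_snapshot(positions, capacity=CAPACITY):
    """Pad a list of (x, y, z) positions to a fixed number of slots with NaN."""
    if len(positions) > capacity:
        # The overflow case: the slot count was chosen too small.
        raise OverflowError(
            f"{len(positions)} particles exceed {capacity} slots"
        )
    filler = (math.nan, math.nan, math.nan)
    return list(positions) + [filler] * (capacity - len(positions))


def valid_particles(snapshot):
    """Drop placeholder rows, identified by a NaN first coordinate."""
    return [p for p in snapshot if not math.isnan(p[0])]


# Two real particles stored in a snapshot of four slots.
snapshot = pad_snapshot([(0.0, 0.0, 0.0), (1.0, 0.5, 0.2)])
```

Every snapshot occupies CAPACITY rows regardless of how many particles it holds, and a snapshot with more particles than slots cannot be stored at all.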

Instead, I propose a better scheme:

H5MD implements an optional dataset “range” inside each trajectory
subgroup, next to the other datasets “step” and “time”.

The dataset “range” is two-dimensional, with the first dimension
as the [variable] dimension (in H5MD lingo “to accumulate time steps”),
and the second dimension equal to 2. The dataset stores an array of
ranges [first, last), which reference the variable dimension of the
datasets position/sample, velocity/sample, …

The datasets position/sample, velocity/sample, … are reduced by one
dimension, i.e. [variable][N][D] are reduced to [variable × N][D].

For readers, this will add an additional indirection when looking up
particle data, e.g. to look up the position sample at step s, the
reader first looks up the range [first, last) at step s, and then
selects this range from the position/sample dataset.

As an example, a lookup by range [first, last) could be implemented
with ease using NumPy's array indexing, array[first:last], e.g.

  first, last = range[step]
  sample = position[first:last]
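
A slightly fuller sketch of both sides of the scheme, writer and reader, is given here with plain Python lists standing in for the HDF5 datasets. The helper names append_sample and read_sample are hypothetical, and the range table is called "ranges" only to avoid shadowing Python's built-in range(); the same [first:last] slicing applies to NumPy arrays.

```python
def append_sample(position, ranges, sample):
    """Append one variable-size sample to the flattened dataset
    and record its half-open range [first, last)."""
    first = len(position)
    position.extend(sample)
    ranges.append((first, len(position)))


def read_sample(position, ranges, index):
    """Look up the range stored at the given index along the variable
    dimension, then select that range from the flattened dataset."""
    first, last = ranges[index]
    return position[first:last]


position = []  # flattened [variable x N][D] data, here with D = 2
ranges = []    # one (first, last) pair per stored sample

append_sample(position, ranges, [(0.0, 0.0), (1.0, 1.0)])
append_sample(position, ranges, [(2.0, 2.0)])
append_sample(position, ranges, [])  # an empty selection is representable
append_sample(position, ranges, [(3.0, 3.0), (4.0, 4.0), (5.0, 5.0)])
```

Because each range is half-open, an empty sample is stored as first == last, and consecutive samples share a boundary value.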


Of course, with a fluctuating number of particles, one would probably
also store a trajectory subgroup “tag” to identify particles, but this
is a separate issue from my proposal.


What do you think of this proposal?

Should such an extension be optional, or mandatory? Do you see even
more complex use cases which could not be handled by this scheme?

In the majority of MD simulations, the particle number is fixed; this includes even semi-grandcanonical simulations, where the particle type/species changes but the total number of particles is preserved. I'm not sure whether people actually store particle configurations in true grandcanonical Monte-Carlo simulations since mostly averages matter in the end.

The trajectory group is at the heart of the H5MD format and shall be as simple as possible. On the other hand, it shall be as flexible as possible, of course. I think the current scheme, which we all agreed on some time ago, fulfills this aim, and I would like to stick to this direct and straightforward scheme.

Your suggestion is tailored to your specific application, whereas a change of the H5MD structure would have an impact on _all_ users. The indirect lookup or several formats for the trajectory would make things more complicated and, I believe, would effectively discourage people from using H5MD.

A trajectory describes the time evolution of a given set of particles. Snapshots of a changing subset of particles are, strictly speaking, not a trajectory. You probably also do not want to resume a simulation from such a partial dataset. Hence, the right place for the data you need to store is a specific H5MD subgroup (e.g., in "structure/..."?), and there, the proposed format would be perfectly fine.

I strongly favour separate subgroups for application-specific data structures; this keeps general groups like "trajectory/" clean and simple. Nevertheless, the structure of commonly used subgroups may be defined by the H5MD format (as we have done for "observables/" already).


Finally, a technical point: the slicing of a large HDF5 dataset,
        position[first:last],

may be much less efficient than using a dataset with appropriately formed dimensions and accessing a full snapshot via a single index,

        position[step].

This has to be checked. I expect that the performance depends sensitively on the way slicing is implemented, i.e., on the backend used for HDF5 access.
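
Part of the difference can be estimated before benchmarking. The sketch below assumes that the dataset uses HDF5's chunked layout, that chunks are read from disk as whole units, and that the flattened dataset is chunked only along its first dimension; the function chunks_touched and the chunk size are hypothetical. It is a rough model that ignores the chunk cache and compression, and it does not replace the measurement.

```python
def chunks_touched(first, last, rows_per_chunk):
    """Number of chunks overlapping the half-open row range [first, last),
    for a dataset chunked only along its first dimension."""
    if last <= first:
        return 0
    first_chunk = first // rows_per_chunk
    last_chunk = (last - 1) // rows_per_chunk
    return last_chunk - first_chunk + 1


ROWS_PER_CHUNK = 1000  # hypothetical chunk size along the first dimension

# A range aligned with a chunk boundary reads exactly one chunk ...
aligned = chunks_touched(1000, 2000, ROWS_PER_CHUNK)
# ... while the same number of rows, shifted, straddles two chunks.
straddling = chunks_touched(1500, 2500, ROWS_PER_CHUNK)
```

With variable-size samples the ranges will rarely align with chunk boundaries, whereas a [variable][N][D] dataset can be chunked so that position[step] maps onto whole chunks.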


Best wishes,

Felix
