Re: [h5md-user] The Box Story

From: Felix Höfling
Subject: Re: [h5md-user] The Box Story
Date: Mon, 30 Sep 2013 11:21:48 +0200
User-agent: Opera Mail/12.15 (Linux)

On 27.09.2013 at 21:36, Pierre de Buyl <address@hidden> wrote:


Thanks Konrad for proposing this "poll" method. Sorry for not following Peter's
thread but the propositions were lost.

On Thu, Sep 26, 2013 at 09:19:36AM +0200, Konrad Hinsen wrote:
Proposition 1: Store a single time series with box information for the
whole trajectory. It must cover at least those steps for which any
position information is stored. The box information for a given step
must be retrieved by binary search for random-access step
retrieval. For sequential traversal of the trajectory, more efficient
methods are available.

 + Simplicity. Easy to understand, easy to check.

 + Efficient storage: no duplication of box data.

 - Box information retrieval is less efficient.

 - Parallel writing (in the sense of parallel I/O) of independent
   position time series requires coordination between processes.

 - Cannot accommodate parallel tempering simulations.
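
To make the lookup cost in Proposition 1 concrete, here is a minimal sketch of the binary search for random-access retrieval. It assumes the box steps are stored in increasing order and, as the proposition requires, cover every step at which positions are stored. The function name box_at_step and the plain Python lists standing in for the stored datasets are hypothetical and only serve the illustration.

```python
from bisect import bisect_left


def box_at_step(box_steps, box_values, step):
    """Return the box stored for `step`, found by binary search.

    box_steps  -- sorted sequence of step numbers of the box time series
    box_values -- box data, one entry per element of box_steps
    (Both are stand-ins for the stored datasets; the names are assumptions.)
    """
    i = bisect_left(box_steps, step)  # O(log n) search in the sorted steps
    if i == len(box_steps) or box_steps[i] != step:
        # Proposition 1 requires the box series to cover this step.
        raise KeyError(f"no box information stored for step {step}")
    return box_values[i]


# Example data: a cubic box sampled at four steps.
box_steps = [0, 10, 20, 50]
box_values = [[10.0] * 3, [10.5] * 3, [11.0] * 3, [12.0] * 3]
box_20 = box_at_step(box_steps, box_values, 20)
```

For sequential traversal one would instead keep a running index into box_steps and advance it together with the position index, which avoids the search altogether.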

Proposition 2: With every position time series, store a box time
series at exactly the same step numbers. If multiple such box time
series are identical, links can be used to avoid duplicating the data.

 + Efficient random read access to positions with matching box information.

 + Can accommodate parallel tempering simulations.

 + Makes it easy to separate out a sub-trajectory.

 - Efficient writing (without data duplication) requires some effort
   and careful thought.

Let me add to prop 2:

    + fosters modular design

Only with separate box groups are the subgroups truly independent. This would allow them to come from completely different sources, to use very different box geometries, to overlay different boxes in one simulation, etc.

As Peter noted, prop 2 makes sense only if the sampling intervals (step/time) of the positions and the box are hard-linked. Otherwise, validating the file would be expensive, as one would need to retrieve and compare the complete datasets. There was an alternative idea (a long time ago) to drop the step/time datasets from the box (and to include the box in the positions group), but it was quickly discarded in order to keep the triple "value/step/time" intact.
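
The difference in validation cost can be sketched as follows. This is only a toy model, not HDF5 code: each group is a plain dict with a "step" entry, and two entries that are the same Python object stand in for hard-linked datasets. The function name steps_match is hypothetical.

```python
def steps_match(position_group, box_group):
    """Check that positions and box are sampled at the same steps.

    Toy model: a group is a dict with a "step" entry; object identity
    plays the role of a hard link between the two step datasets.
    """
    pos_steps = position_group["step"]
    box_steps = box_group["step"]
    if pos_steps is box_steps:
        # "Hard-linked": same underlying object, nothing needs to be read.
        return True
    # Otherwise both datasets must be read completely and compared.
    return list(pos_steps) == list(box_steps)


shared_steps = [0, 10, 20]
linked_ok = steps_match({"step": shared_steps}, {"step": shared_steps})
copied_ok = steps_match({"step": shared_steps}, {"step": [0, 10, 20]})
mismatch = steps_match({"step": shared_steps}, {"step": [0, 10]})
```

In the linked case the check is constant-time; in the other two cases its cost grows with the length of the trajectory, which is the expense referred to above.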

