
## Re: [h5md-user] The Box Story

 From: Felix Höfling Subject: Re: [h5md-user] The Box Story Date: Mon, 30 Sep 2013 11:21:48 +0200 User-agent: Opera Mail/12.15 (Linux)

Hi,

Thanks, Konrad, for proposing this "poll" method. Sorry for not following Peter's thread, but the propositions were lost there.

On Thu, Sep 26, 2013 at 09:19:36AM +0200, Konrad Hinsen wrote:

Proposition 1: Store a single time series with box information for the
whole trajectory. It must cover at least those steps for which any
position information is stored. The box information for a given step
must be retrieved by binary search for random-access step
retrieval. For sequential traversal of the trajectory, more efficient
methods are available.

+ Simplicity. Easy to understand, easy to check.

+ Efficient storage: no duplication of box data.

- Box information retrieval is less efficient.

- Parallel writing (in the sense of parallel I/O) of independent
position time series requires coordination between processes.

- Cannot accommodate parallel tempering simulations.

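To make the random-access lookup in proposition 1 concrete, here is a minimal Python sketch. It assumes the box steps are available as a sorted sequence and that the box series covers every step of interest, as the proposition requires. The function name `box_at_step` and the sample data are made up for illustration and do not come from the H5MD draft.

```python
from bisect import bisect_left

def box_at_step(box_steps, box_values, step):
    """Return the box stored for `step`, or raise KeyError if it is absent.

    `box_steps` must be sorted in increasing order.
    """
    i = bisect_left(box_steps, step)  # O(log n) binary search
    if i == len(box_steps) or box_steps[i] != step:
        raise KeyError(f"no box stored for step {step}")
    return box_values[i]

# Hypothetical data: cubic box edge lengths sampled every 100 steps
box_steps = [0, 100, 200, 300]
box_values = [(10.0, 10.0, 10.0), (10.1, 10.1, 10.1),
              (10.2, 10.2, 10.2), (10.3, 10.3, 10.3)]

print(box_at_step(box_steps, box_values, 200))
```

For sequential traversal one would instead advance an index through both series in lockstep, which avoids the search altogether.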
Proposition 2: With every position time series, store a box time
series at exactly the same step numbers. If multiple such box time
series are identical, links can be used to avoid duplicating the data.

+ Can accommodate parallel tempering simulations.

+ Makes it easy to separate out a sub-trajectory.

- Efficient writing (without data duplication) requires some effort
and careful thought.

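As a rough illustration of the linking idea in proposition 2, the following sketch uses h5py to store one box time series and expose it under a second subgroup through an HDF5 hard link. The group names (`particles/solvent`, `particles/solute`) and the data are hypothetical and are not meant to reflect the final H5MD layout.

```python
import h5py
import numpy as np

# Hypothetical sample data: box edge lengths at five stored steps
steps = np.arange(0, 500, 100)
edges = np.full((len(steps), 3), 10.0)

# In-memory HDF5 file (core driver, nothing is written to disk)
f = h5py.File("prop2_sketch.h5", "w", driver="core", backing_store=False)

# Box time series stored with the first (hypothetical) subgroup
box = f.create_group("particles/solvent/box")
box["step"] = steps
box["value"] = edges

# The second subgroup reuses the identical box data through a hard link
f.create_group("particles/solute")
f["particles/solute/box"] = box

# Both names resolve to the same HDF5 object, so no data is duplicated
same = f["particles/solute/box"] == f["particles/solvent/box"]
print(same)
```

A reader can use the same identity comparison to recognise shared box data without loading and comparing the datasets themselves.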
Let me add to prop 2:

+ fosters modular design

Only with separate box groups are the subgroups truly independent. This would allow them to come from completely different sources, to use very different box geometries, to overlay different boxes in one simulation, etc.

As Peter noted, prop 2 makes sense only if the sampling intervals (step/time) of the positions and the box are hard-linked. Otherwise, validating the file would be expensive, as one would need to retrieve and compare the complete datasets. There was an alternative idea (a long time ago) to drop the step/time datasets from the box (and to include the box in the positions group), but it was quickly discarded in order to keep the "value/step/time" triple intact.

Felix