From: Felix Höfling
Subject: Re: [h5md-user] Another HDF5-based trajectory format
Date: Tue, 02 Sep 2014 23:46:34 +0200
User-agent: Opera Mail/12.16 (Linux)
On 02.09.2014, 10:59, <address@hidden> wrote:
Hi Konrad,

On Mon, Sep 01, 2014 at 11:58:27AM +0200, Konrad Hinsen wrote:
> I found this description somewhat accidentally:
> http://mdtraj.org/latest/hdf5_format.html
> It looks like a complete definition of an HDF5-based trajectory format
> for biomolecular systems, but it also looks tailor-made for the needs
> of a particular library.

This is very interesting indeed. From browsing the history (I just love that you can do that) [1], the HDF5 support is from a bit more than a year ago, that is, later than my extensive web searches. I had found H5Part [2], but it was not satisfactory. I'll have a closer look at mdtraj.

As you mention, it is program-specific. Also, the structure seems really rigid (no groups, for instance).

The topology [their own word; we use "connectivity" now, I think :-)] storage is a bit awkward, as it is a JSON text file embedded in an HDF5 dataset. But at least they have connectivity information.

With respect to MOSAIC, it also seems more rigid, for the same reasons as above: one single entity in the file.

> An interesting aspect is the use of compression. Did anyone try this in H5MD?

Only a few times, without much gain (from memory, less than 10% gain with gzip). I don't have a good automated strategy to test this.

Pierre

[1] https://github.com/rmcgibbo/mdtraj/commits/master/docs/hdf5_format.rst
[2] http://vis.lbl.gov/Research/H5Part/
The MDtraj project is indeed interesting. It lacks a conversion tool to and from H5MD :-)

Compression was one of the major criteria for choosing HDF5 when Peter and I started with MD simulations. In HALMD it is enabled by default (but the parameters are not optimised). The underlying HDF5 concept is filters, which require a chunked dataspace layout. The relevant filters are "shuffle" in combination with "deflate" (GZIP) or SZIP. You can nicely play around with

    h5repack -f SHUF -f GZIP=6 input.h5 output.h5

and check the result with "h5dump -Hp" or "h5ls -v".

For some arbitrary file with 205k particles (2 snapshots, float32), I get 13% compression with GZIP alone and 39% with shuffle plus GZIP. (I can't compare with SZIP, which is missing in my h5repack build.) Using GZIP=9 improves the ratio only marginally, by 0.2%.

The dataset layout and the chunk size are relatively important. Switching from chunks of 1xNxD to 1xNx1 by

    h5repack -l particles/A/position:CHUNK=1x204800x1 input.h5 output.h5

packs all x-coordinates separately, etc., and improves the compression ratio considerably, from 39% to 59% (for my specific example).

I find these numbers really encouraging for enabling the compression features of HDF5 (and thus of H5MD).

The HDF5 library also helps to get rid of the noisy low-order floating-point bits (which are almost incompressible white noise and thus irrelevant): if the memory and file datatypes are different, the conversion should be done by the library (but I haven't tried this myself).

Cheers,
Felix
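P.S. The same settings can of course be applied already when writing the data, e.g. with h5py. A minimal, untested sketch; the file name and the random data are made up, and the dataset path just follows the h5repack example above:

    import numpy as np
    import h5py

    N = 204800    # particles, matching the example above
    steps = 2     # snapshots

    # Made-up data: a random walk, held as float64 in memory.
    positions = np.cumsum(0.01 * np.random.randn(steps, N, 3), axis=0)

    with h5py.File("positions.h5", "w") as f:
        dset = f.create_dataset(
            "particles/A/position",
            shape=(steps, N, 3),
            dtype="float32",     # file datatype differs from memory datatype
            chunks=(1, N, 1),    # one chunk per coordinate, as with CHUNK=1x204800x1
            shuffle=True,        # byte-shuffle filter (h5repack -f SHUF)
            compression="gzip",  # deflate filter (h5repack -f GZIP=6)
            compression_opts=6,
        )
        dset[...] = positions    # the library converts float64 -> float32 here

Checking the output with "h5ls -v positions.h5" should then list the shuffle and deflate filters along with the achieved storage utilisation per dataset.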