
Re: [h5md-user] String encoding


From: Pierre de Buyl
Subject: Re: [h5md-user] String encoding
Date: Thu, 20 Feb 2014 11:06:02 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Feb 20, 2014 at 09:30:26AM +0100, Felix Höfling wrote:
> On 20.02.2014, 01:06, Peter Colberg
> <address@hidden> wrote:
> 
> >I wish we had a test suite for H5MD.

That would be nice to have. Although the concept of a test suite should not rely
on a specific implementation, pyh5md could be a good platform for one.

Konrad already requested this [1], but so far there has been no progress on it.

> >Recently we decided to use fixed-length string datatypes for the
> >attributes defined in H5MD. Unfortunately, with h5py, this forces a
> >string to be encoded in ASCII; reading a fixed-length string encoded
> >in UTF-8 will raise an error.
> >
> >http://docs.h5py.org/en/latest/strings.html#how-to-store-text-strings
> >
> >Since it is crucial for Fortran programs to *conveniently* write H5MD
> >metadata such as the author’s name, but less crucial to *conveniently*
> >read said metadata, I suggest we follow Felix’ earlier proposal, with
> >the following refinement:
> >
> >The attributes in h5md/ may be stored using either a fixed-length
> >ASCII string datatype, or a variable-length UTF-8 string datatype.
> >
> I consider such behaviour a bug in h5py. It should be able to read
> fixed-length strings in either encoding. No fix is needed for H5MD 1.0.

Or a missing feature :-). The problem is known to the author [2], but there is
no fix yet, although the issue I link to describes a workaround.
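To make the encoding issue concrete, here is a minimal Python sketch that uses
only the standard library, not h5py or the HDF5 C API. It assumes a
fixed-length string attribute behaves like a byte field of fixed size padded
with nulls; the helpers `pack_fixed` and `unpack_fixed` and the 16-byte field
size are hypothetical. It shows that such a field is sized in bytes rather than
characters, and that bytes written as UTF-8 cannot be decoded by a reader that
assumes ASCII.

```python
# Hypothetical helpers (not h5py API): model a fixed-length string
# as a byte field of exactly `size` bytes, padded with trailing nulls.

def pack_fixed(text: str, size: int, encoding: str) -> bytes:
    """Encode `text` and null-pad it to exactly `size` bytes."""
    raw = text.encode(encoding)  # raises UnicodeEncodeError if unrepresentable
    if len(raw) > size:
        raise ValueError(f"{len(raw)} bytes do not fit in a {size}-byte field")
    return raw.ljust(size, b"\x00")


def unpack_fixed(field: bytes, encoding: str) -> str:
    """Strip the trailing null padding and decode the remaining bytes."""
    return field.rstrip(b"\x00").decode(encoding)


name = "Felix Höfling"
print(len(name))                  # 13 characters
print(len(name.encode("utf-8")))  # 14 bytes: 'ö' takes two bytes in UTF-8

# A UTF-8 writer and a UTF-8 reader agree, so the name round-trips.
field = pack_fixed(name, 16, "utf-8")
print(unpack_fixed(field, "utf-8"))

# A reader that assumes ASCII cannot decode the two-byte 'ö'.
try:
    unpack_fixed(field, "ascii")
except UnicodeDecodeError as exc:
    print("ASCII reader fails:", exc.reason)

# An ASCII-only writer cannot store the name in the first place.
try:
    pack_fixed(name, 16, "ascii")
except UnicodeEncodeError as exc:
    print("ASCII writer fails:", exc.reason)
```

The last case is the trade-off discussed above: a fixed-length ASCII attribute
cannot hold a name such as "Höfling" at all, which is why the proposal pairs it
with a variable-length UTF-8 alternative.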

> Some more general thoughts: I would say that whatever kind of strings can
> possibly be found in an HDF5 file is legal. Our reference is the HDF5
> standard and not an arbitrary set of (low- or high-level) API libraries
> (e.g., we didn't check string handling in Mathematica or Matlab). If such
> libraries are not able to read or write the stuff (conveniently), they
> need to be improved.
>
> As an overall rule I would say that simple statements in the spec shall be
> preferred over listings of certain cases in detail.

In general, what you write is true, and I agree that the limitations of specific
implementations should not dictate what we do in H5MD. Still, without actual
implementations (and our experiments with what is or is not possible with them),
H5MD would be completely useless, so there is always a balance to keep in mind.

In this case, I am confident that h5py will be fixed to handle such situations 
and that the broader compatibility of fixed-length strings is worth the 
restriction.

P

[1] https://github.com/pdebuyl/pyh5md/issues/5
[2] https://github.com/h5py/h5py/issues/289
