h5md-user

Re: [h5md-user] String encoding


From: Felix Höfling
Subject: Re: [h5md-user] String encoding
Date: Thu, 20 Feb 2014 09:30:26 +0100
User-agent: Opera Mail/12.16 (Linux)

On 20.02.2014 at 01:06, Peter Colberg
<address@hidden> wrote:

Hi all,

I wish we had a test suite for H5MD.

Recently we decided to use fixed-length string datatypes for the
attributes defined in H5MD. Unfortunately, with h5py, this forces a
string to be encoded in ASCII; reading a fixed-length string encoded
in UTF-8 will raise an error.

http://docs.h5py.org/en/latest/strings.html#how-to-store-text-strings

Since it is crucial for Fortran programs to *conveniently* write H5MD
metadata such as the author’s name, but less crucial to *conveniently*
read said metadata, I suggest we follow Felix’ earlier proposal, with
the following refinement:

The attributes in h5md/ may be stored using either a fixed-length
ASCII string datatype, or a variable-length UTF-8 string datatype.

Regards,
Peter


I consider such behaviour a bug in h5py. It should be able to read
fixed-length strings in either encoding. No fix is needed for H5MD 1.0.
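
To illustrate why one reader can handle both encodings, here is a minimal
sketch in plain Python. It does not use h5py or reproduce its internals; the
helper names pack_fixed and unpack_fixed are hypothetical, and null padding
is assumed as the padding convention of the fixed-length field. Since ASCII
is a subset of UTF-8, decoding the buffer as UTF-8 accepts both cases,
whereas a strict ASCII decode rejects the UTF-8 one.

```python
# Sketch only: models a fixed-length string field as a null-padded byte buffer.
# pack_fixed / unpack_fixed are hypothetical helpers, not part of any library.

def pack_fixed(text: str, length: int, encoding: str = "utf-8") -> bytes:
    """Encode text and null-pad it to exactly `length` bytes."""
    raw = text.encode(encoding)
    if len(raw) > length:
        raise ValueError("encoded text does not fit in the fixed-length field")
    return raw.ljust(length, b"\x00")


def unpack_fixed(buf: bytes) -> str:
    """Strip the null padding and decode as UTF-8 (which also accepts ASCII)."""
    return buf.rstrip(b"\x00").decode("utf-8")


ascii_field = pack_fixed("Peter Colberg", 16, "ascii")
utf8_field = pack_fixed("Felix Höfling", 16)  # "ö" occupies two bytes in UTF-8

# A reader that insists on ASCII fails on the UTF-8 field.
try:
    utf8_field.rstrip(b"\x00").decode("ascii")
    strict_ascii_ok = True
except UnicodeDecodeError:
    strict_ascii_ok = False

# A reader that decodes as UTF-8 handles both fields.
names = [unpack_fixed(ascii_field), unpack_fixed(utf8_field)]
```

Note that the field length counts bytes, not characters, so a writer has to
size the field for the encoded form of the text.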

Some more general thoughts: I would say that any kind of string that can
be stored in an HDF5 file is legal. Our reference is the HDF5 standard,
not an arbitrary set of (low- or high-level) API libraries (e.g., we
didn't check string handling in Mathematica or MATLAB). If such libraries
are not able to read or write these strings (conveniently), they need to
be improved.

As an overall rule, I would say that simple statements in the spec should
be preferred over detailed listings of particular cases.

Regards,

Felix


