h5md-user

Re: [h5md-user] Specifying the data type


From: Olaf Lenz
Subject: Re: [h5md-user] Specifying the data type
Date: Thu, 29 Aug 2013 12:40:14 +0200

Hi!

On 08/29/2013 10:35 AM, Pierre de Buyl wrote:
>> Capitalised identifiers remind me of Fortran ... how ugly, but 
>> HDF5 uses them as well :-(

That is why I have used them.

>> More seriously, there is not just one integer or float type in 
>> HDF5. For this reason, the H5MD spec just states "of integer data
>>  type" or "real-valued".

That is why I haven't specified it more precisely. Yes, there are
several datatypes, but the HDF5 docs[1] state: "The source and
destination may have different (but compatible) layouts, in which case
the data elements are automatically transformed during the transfer."

To me, this means that we do not have to specify the exact datatype
layout, only the "datatype class", as the HDF5 docs call it. Of these,
we only need to specify the "Atomic" classes, in our case String,
Integer and Float. However, some properties of these datatypes might
have to be specified as well.
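
As a rough analogy only (this uses Python's struct module, not HDF5,
and the mapping from format codes to class names is my own assumption
for illustration): two different layouts of the same class give a
reader the same values, so the class is all a reader has to rely on.

```python
import struct

# Map struct format characters to the coarse datatype classes discussed
# above. This mapping is an illustrative assumption, not part of HDF5
# or H5MD.
_CLASS_OF_CODE = {}
for code in "bBhHiIlLqQ":
    _CLASS_OF_CODE[code] = "Integer"
for code in "efd":
    _CLASS_OF_CODE[code] = "Float"
_CLASS_OF_CODE["s"] = "String"


def datatype_class(fmt):
    """Return the class of a struct format such as '<3i' or '>q'."""
    return _CLASS_OF_CODE[fmt[-1]]


steps = [0, 100, 200]

# Two different layouts of the same class: 32-bit little-endian and
# 64-bit big-endian integers.
as_int32_le = struct.pack("<3i", *steps)
as_int64_be = struct.pack(">3q", *steps)

# A reader that only relies on the class recovers identical values.
decoded_a = list(struct.unpack("<3i", as_int32_le))
decoded_b = list(struct.unpack(">3q", as_int64_be))
```

The byte strings differ in length and byte order, but both decode to
the same list of steps and both formats fall into the Integer class.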

>> And for the most interesting datasets, the actual data in
>> "value", the data type is unspecified at all as it depends on the
>> specific data stored.

That's what I used the <type> for, to keep it open. However, in many
cases we have to specify a "datatype class", otherwise writing tools
that use H5MD as an interchange format is impossible.

>> I feel that expressing all that by the tree graphs would overload
>>  them, the focus should be on the tree structure. And for the 
>> details, the user is encouraged to read the text, not just to
>> look at the pictures ;-)

Still, it would be much easier to see what is expected. I do not think
the graphs would be bloated, and furthermore the notation clearly
exposes where we have forgotten to specify a datatype that is needed.

> Actually, the only datatype that is really important is the integer
>  datatype for "step" and for some data such as "id". HDF5 is 
> otherwise flexible and I would avoid (i) clobbering the
> specification and (ii) putting constraints where it is not needed.

I do not agree. If we do not specify datatypes for the datasets that
actually carry semantics, the specification is useless. How am I able to
interpret an H5MD file if I do not know whether the positions are stored
as floats, integers or strings? A specification defines how to interpret
data, and to do so it often also has to impose constraints.

Even in the specification it is obvious that most of the datasets have a
well-defined datatype class.

file root
 \-- h5md
     +-- version : String[variable]
     \-- author
     |   +-- name : String[variable]
     |   +-- (email : String[variable])
     \-- creator
         +-- name : String[variable]
         +-- version : String[variable]
 \-- (particles)
     \-- <group1>
         \-- box
         \-- (position)
             \-- value : <type>[variable][N][D]
             \-- step : Integer[variable]
             \-- time : Float[variable]
         \-- (species : Integer[N])
             \--
.
.
.

I think adding the type class information to the notation would make it
significantly more readable and easier to understand.
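
A side benefit is that the annotated lines could also be read by a
tool. Here is a minimal sketch of that, assuming the notation exactly
as I wrote it in the tree above; the function name parse_element and
the returned dictionary keys are hypothetical and not part of any H5MD
library.

```python
import re

# One line of the annotated tree, e.g.
#     \-- value : <type>[variable][N][D]
_LINE = re.compile(
    r"""^[\s|]*              # indentation and vertical connectors
        (?:\\--|\+--)\s*     # element marker
        (?P<opt>\()?         # optional elements start with a parenthesis
        (?P<name>[^\s:()]+)
        (?:\s*:\s*(?P<cls>[^\[\s)]+)(?P<dims>(?:\[[^\]]*\])*))?
        \)?\s*$""",
    re.VERBOSE,
)


def parse_element(line):
    """Parse one tree line; return None for lines that are not elements."""
    m = _LINE.match(line)
    if m is None:
        return None
    # Split "[variable][N][D]" into ["variable", "N", "D"].
    dims = re.findall(r"\[([^\]]*)\]", m.group("dims") or "")
    return {
        "name": m.group("name"),
        "optional": m.group("opt") is not None,
        "class": m.group("cls"),  # None if the line carries no type
        "dims": dims,
    }
```

For example, parse_element(r"  \-- (species : Integer[N])") gives the
name "species", marks it as optional, and reports class "Integer" with
dims ["N"]. Lines without a type annotation, such as "\-- author",
come back with class None, which is an easy way to list the places
where the type is still unspecified.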

Olaf

[1] http://www.hdfgroup.org/HDF5/doc1.6/UG/11_Datatypes.html

-- 
Dr. rer. nat. Olaf Lenz
Institut für Computerphysik, Allmandring 3, D-70569 Stuttgart
Phone: +49-711-685-63607