Re: [h5md-user] Specifying the data type

From:	Felix Höfling
Subject:	Re: [h5md-user] Specifying the data type
Date:	Thu, 29 Aug 2013 13:46:31 +0200
User-agent:	Opera Mail/12.15 (Linux)

On 29.08.2013 at 12:40, Olaf Lenz <address@hidden> wrote:

More seriously, there is not just one integer or float type in
HDF5. For this reason, the H5MD spec just states "of integer data
type" or "real-valued".


That is why I haven't specified it more precisely. Yes, there are
several datatypes, but the HDF5 docs[1] state: "The source and
destination may have different (but compatible) layouts, in which case
the data elements are automatically transformed during the transfer."

To me, this means that you do not have to specify the exact datatype
layout, but only the "Datatype class" as it is termed in the HDF5 docs.
Of these, only the "Atomic" datatypes need to be specified, i.e. String,
Integer and Float (in our case). However, some properties might have to
be specified.
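
To illustrate the distinction between a datatype class and its concrete layout, here is a minimal sketch that uses only Python's standard struct module, not HDF5 itself. The helper name convert_layout is hypothetical; it merely imitates, for a single value, the kind of transformation the HDF5 docs describe. A number stored as a big-endian 32-bit float is handed to the reader as a little-endian 64-bit float. Both belong to the Float class, so only the layout changes.

```python
import struct

def convert_layout(raw: bytes, src_fmt: str, dst_fmt: str) -> bytes:
    """Re-encode one number from layout src_fmt to layout dst_fmt.

    Hypothetical helper: it imitates, for a single value, the automatic
    transformation between compatible layouts of the same datatype class.
    """
    (value,) = struct.unpack(src_fmt, raw)
    return struct.pack(dst_fmt, value)

# A value stored as a big-endian 32-bit float ...
stored = struct.pack(">f", 1.5)
# ... is delivered to the reader as a little-endian 64-bit float.
in_memory = convert_layout(stored, ">f", "<d")

value_read = struct.unpack("<d", in_memory)[0]
```

The reader only needs to know that the data is of class Float; the byte order and precision of the stored layout do not have to appear in the specification.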

And for the most interesting datasets, the actual data in
"value", the data type is not specified at all as it depends on the
specific data stored.


That's what I used the <type> for, to keep it open. However, in many
cases, we have to specify a "datatype class", otherwise writing
tools that can use h5md as an interchange format is impossible.


Hi Olaf,

I overlooked these abstract datatype classes. Indeed, it might be useful
to specify them in the HDF5 terminology. The datatype would be any of
Integer, Float, String, Bitfield, Time, or Opaque. But in general, we
also want to allow (custom) data of Composite type, namely Array,
Enumeration, Variable Length, Compound. Quite a long list ...

I feel that expressing all that by the tree graphs would overload
them; the focus should be on the tree structure. And for the
details, the user is encouraged to read the text, not just to
look at the pictures ;-)


Still it would be way easier to see what is expected. I do not think it
is bloated, and furthermore it clearly points out when we have forgotten
to specify it where it is needed.

Actually, the only datatype that is really important is the integer
datatype for "step" and for some data such as "id". HDF5 is
otherwise flexible and I would avoid (i) clobbering the
specification and (ii) putting constraints where they are not needed.


I do not agree. If we do not specify datatypes for the datasets that
actually carry semantics, the specification is useless. How am I able to
interpret an h5md file if I do not know whether the positions are stored
as floats, integers or strings? A specification defines how to interpret
data, and to do so it often also has to put constraints.

Does it make a difference with respect to reading whether positions are
stored as Float[N][D] or as Array[N], where the Array type is a
D-dimensional vector?

The Array type, however, does not work for dimensions > 4 and not for
tensor-valued data. Hence we may exclude it from H5MD. Implicitly, the
current draft says that the Float version is to be used, see the
description of "value" in

http://nongnu.org/h5md/draft.html#time-dependent-data

Even in the specification it is obvious that most of the datasets have a
well-defined datatype class.

file root
 \-- h5md
     +-- version : String[variable]
     \-- author
     |   +-- name : String[variable]
     |   +-- (email : String[variable])
     \-- creator
         +-- name : String[variable]
         +-- version : String[variable]
 \-- (particles)
     \-- <group1>
         \-- box
         \-- (position)
             \-- value : <type>[variable][N][D]
             \-- step : Integer[variable]
             \-- time : Float[variable]
         \-- (species : Integer[N])
             \--


I think adding the type class information to the notation would make it
significantly more readable and easier to understand.

Olaf

Actually, your example revealed an ambiguity: in the current notation it
is not clear whether "box" is a scalar dataset or a group.


Some more remarks:

- I would not put a restriction on the type of String, whether fixed-size
or variable-size. The reader has to handle both cases (although this means
some extra effort on the reader's side).

- the type of the position is actually restricted to be Atomic (see above)
and to the domain of real numbers, i.e., Float or Integer.

- something similar happens with the particle species: it could be Integer
or Enumeration

- typographic things to improve readability: close the parentheses
directly after the identifier (before the colon), drop the space before
the colon and insert a space after the data type.
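
The first remark above, that a reader has to cope with both fixed-size and variable-size strings, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: read_string is a hypothetical helper, and it assumes that the HDF5 binding hands back bytes (possibly padded with NUL bytes) for a fixed-size string and either str or bytes for a variable-size one.

```python
def read_string(raw) -> str:
    """Return a Python str for either string flavour.

    Hypothetical helper: `raw` stands for whatever the HDF5 binding
    returns, assumed here to be `bytes` (possibly NUL-padded) for a
    fixed-size string and `str` or `bytes` for a variable-size one.
    """
    if isinstance(raw, bytes):
        # Fixed-size strings may be padded with NUL bytes; cut at the first one.
        raw = raw.split(b"\x00", 1)[0].decode("utf-8")
    return raw

fixed = read_string(b"Felix\x00\x00\x00")   # fixed-size, 8 bytes on disk
variable = read_string("Felix")             # variable-size
```

Both calls yield the same Python string, so the extra effort on the reader's side stays small.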

In conclusion, adding the HDF5 datatype class to the graphs may help to
detect possible issues. But it should happen typographically in a modest
form and allow for multiple datatype classes. I suggest introducing our
own abbreviations, which can easily be combined:


A=Atomic, C=Composite (the generic cases)
I=Integer, F=Float, S=String, B=Bitfield, T=Time, O=Opaque
A=Array, E=Enumeration, V=Variable Length, C=Compound.

Note the clash for Atomic/Array and Composite/Compound which needs to be
resolved.

A general dataset would then be of type "AC", and the particle group would
look like:


  <particle_group>
       \-- box
       \-- (position)
       |   \-- value: FI [variable][N][D]
       |   \-- step: I [variable]
       |   \-- time: F [variable]
       \-- (species): IE [N]


Felix
