h5md-user

Re: [h5md-user] Writing vs reading


From: Pierre de Buyl
Subject: Re: [h5md-user] Writing vs reading
Date: Tue, 14 Jan 2014 14:10:11 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hi all,

On Mon, Jan 13, 2014 at 03:48:53PM +0100, Konrad Hinsen wrote:
> Olaf Lenz writes:
> 
>  > Ultimately, the problem that seems to reoccur in many of the
>  > discussions is the question who will have to do the most effort:
>  > the writer of h5md, or the reader?
> 
> That's indeed an important aspect, together with the related one of
> who is responsible for verifying that the rules are respected: the
> writer, the reader, an intermediate instance (a validation program),
> or nobody (i.e. anarchy).
> 
> Many informally defined data formats (and that means practically
> everything used in science) are based on anarchy: the standard is a
> statement of intention, to which every program conforms as much as its
> authors consider useful, leading to many "dialects". While better than
> no standard, a standard with an anarchy attitude usually causes lots
> of frustrations among users.
> 
> A formally defined standard (i.e. XML formats with a DTD or schema)
> provides a way to validate the correctness of a data file, even though
> this validation rarely covers all possible non-conformities. The
> existence of a "neutral arbiter" (the validation tool) encourages
> writers to respect the standard and readers not to accept invalid
> files. In the long run this works a lot better.
> 
> There is a huge gray zone in between these two extremes, and that's
> where I think H5MD belongs. Certain "hard core" features should be
> respected strictly, whereas less central features should be soft
> constraints open to extension and reinterpretation.
> 
>  > 1. Reader-friendly
>  > In a "reader-friendly" approach, we would specify exactly how
>  > positions in periodic boundary conditions have to be stored in
>  > h5md, e.g. "image" has to always exist, and "position" always has
>  > to be within the primary box. This makes reading the positions
>  > from a h5md file simpler.
>  > However, it comes at the cost that the writer of the file will
>  > have to prepare the data exactly as h5md specifies it.
>  > 
>  > 2. Writer-friendly
>  > In a "writer-friendly" approach, we would allow any possible case
>  > how to store the positions as long as it is unique (with image,
>  > without image, inside the primary box, outside the primary box,
>  > whatever).
>  > This comes at the cost that reading the file is more complex.
> 
> The most important feature for me is that any given combination of
> data arrays has a clear and unambiguous meaning. That criterion still
> leaves a lot of freedom, where as you say the question is whose life
> we want to simplify most.
> 
> My personal preference would be for the simplest rules for data
> interpretation.  That's close to your "reader-friendly" but not
> exactly the same. It's neither simplicity of reader implementation nor
> minimization of operations in the reader that matters for me, but the
> simplicity of the rules that the reader has to apply. The goal is to
> keep readers and writers easy to understand for humans. I do realize
> that this criterion does not necessarily lead to a unique best
> solution of course.

In a way, this is "reader-program friendly" in the sense that the job of the
reading program should be as clear as possible.
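
As a rough illustration of how simple such a rule can be, here is a minimal
sketch of what the reader-friendly variant would ask of a reading program. It
assumes a cuboid box with edge lengths `box`, a `position` stored inside the
primary box, and an integer `image` counting boundary crossings. The function
name `unwrap` and the plain-list layout are only illustrative and are not part
of H5MD.

```python
def unwrap(position, image, box):
    """Return the unwrapped coordinates of one particle.

    Assumption: cuboid box, so each component is
    position + image * edge length.
    """
    if not (len(position) == len(image) == len(box)):
        raise ValueError("position, image and box must have the same dimension")
    return [r + n * L for r, n, L in zip(position, image, box)]


# One particle in a 10 x 10 x 10 box that crossed the x boundary twice in the
# positive direction and the z boundary once in the negative direction.
print(unwrap([1.5, 2.0, 9.0], [2, 0, -1], [10.0, 10.0, 10.0]))
# prints [21.5, 2.0, -1.0]
```

With a single rule like this, the reader has nothing to guess. Every other
storage convention would add a branch that each reading program has to get
right.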

>  >   * When we go the reader-friendly way, we will not be able to
>  >     prevent people to write files that do not conform to the specs
>  >     anyway, so a good reader will either have to throw an error in
>  >     that case, or he will have to handle it.
> 
> For the reasons stated above, readers should be encouraged to throw
> errors when presented with invalid files.

Invalid files should break a reader loudly. Otherwise, it becomes too easy to
keep working with invalid files.
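
To show what "loudly" could look like in practice, here is a minimal sketch of
a check a reader might run before using the data. It assumes the
reader-friendly rule from Olaf's example ("image" always exists, "position"
lies in the primary box) and further assumes that the primary box is [0, L)
along each axis of a cuboid box. `InvalidH5MDError` and `check_positions` are
hypothetical names, not part of H5MD or of any existing library.

```python
class InvalidH5MDError(ValueError):
    """Raised when the data breaks a rule the reader relies on."""


def check_positions(position, image, box):
    """Fail loudly unless every particle has an image and lies in the box.

    Assumption: the primary box is [0, L) along each axis of a cuboid box.
    """
    if image is None:
        raise InvalidH5MDError("'image' is missing")
    if len(position) != len(image):
        raise InvalidH5MDError("'position' and 'image' differ in length")
    for i, r in enumerate(position):
        for k, (x, L) in enumerate(zip(r, box)):
            if not 0.0 <= x < L:
                raise InvalidH5MDError(
                    f"particle {i}, axis {k}: {x} is outside [0, {L})"
                )


# A conforming data set passes silently.
check_positions([[1.0, 2.0]], [[0, 0]], [10.0, 10.0])
```

The reader stops at the first violation and says which particle and axis are
at fault, instead of silently producing wrong results.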

>  >   * When we choose the writer-friendly way, we can not guarantee
>  >     that all readers can actually handle all possible cases.
>  >     Library functions that support a reader may help, but it will
>  >     not be possible to cover all possible cases in such a
>  >     function.
> 
> That's where a validation tool comes in handy: a reader that fails to read
> a file that passes validation is considered buggy.
> 
>  >   * I would expect that more people will program tools that read
>  >     h5md files than people that program tools to write h5md
>  >     files. Furthermore, people that create such files are probably
>  >     more used to stick to specs than people that read them.
>  >     Insofar it might make sense to make h5md reader-friendly rather
>  >     than writer-friendly.
> 
> That's indeed a good pragmatic principle.

Yes, but writing should not be "too" hard. For now, the fact that we only have
to write what interests us is very good! You can use plain HDF5 to write only
what you need in a file.

All in all, I have the feeling (please correct me!) that we've done a good job
of defining H5MD while taking into account implementation (read/write) problems
and "ease of understanding". H5MD is still in a gray zone because it lacks a
file validator. Still, I prefer to keep H5MD as it is now (already designed with
many constraints) so that people can start working with it, and to have a
validator enter the game before H5MD becomes mainstream (i.e. in use by all
major simulation programs, which will happen [1]).

P

[1] Well, I can dream, can't I?




