Hi,
On Wed, May 15, 2013 at 12:10:13PM +0200, Felix Höfling wrote:
sometimes, one wants to store pre-averaged observables, i.e.
accumulated over a certain time span. For example, compute the
pressure every 1000 steps and compute the mean from 10 values, i.e.
writing the data only every 10000 steps. Such functionality is
provided by LAMMPS and recently also by HALMD.
http://lammps.sandia.gov/doc/fix_ave_time.html
http://halmd.org/modules/observables/utility/accumulator.html
The idea is interesting.
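To make the example concrete, here is a minimal sketch of such an accumulator in Python. It is not the LAMMPS or HALMD implementation; the names `Accumulator` and `pre_average` are hypothetical, and the running variance uses Welford's algorithm as one reasonable choice. It samples an observable every 1000 steps and emits one record (mean, standard error of the mean, number of samples) every 10 samples.

```python
import math


class Accumulator:
    """Running mean and variance (Welford's algorithm); hypothetical helper."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.count = 0      # number of accumulated samples
        self.mean = 0.0
        self._m2 = 0.0      # sum of squared deviations from the running mean

    def add(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def error_of_mean(self):
        # Standard error from the unbiased sample variance; undefined for < 2 samples.
        if self.count < 2:
            return float("nan")
        return math.sqrt(self._m2 / (self.count - 1) / self.count)


def pre_average(observable, n_steps, sample_every=1000, samples_per_record=10):
    """Sample observable(step) every `sample_every` steps; emit one record per block."""
    acc = Accumulator()
    records = []
    for step in range(sample_every, n_steps + 1, sample_every):
        acc.add(observable(step))
        if acc.count == samples_per_record:
            records.append({"step": step, "value": acc.mean,
                            "error": acc.error_of_mean(), "count": acc.count})
            acc.reset()
    return records


# Stand-in observable (an assumption for illustration): value grows linearly with step.
records = pre_average(lambda step: step / 1000.0, n_steps=20000)
```

Each record corresponds to one row that would be written to the file, here every 10000 steps.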
Now my question: how shall such data be stored in the H5MD
observables group? Along with the mean value, one would like to
store also the standard error (or the variance) and the number of
accumulated values. One scheme would be to distribute this
information over several groups under the roof of the observable's
name:
obs1
 \-- mean
 |    +-- count
 |    \-- value
 |    \-- step
 |    \-- time
 |
 \-- error_of_mean
 |    +-- count
 |    \-- value
 |    \-- step
 |    \-- time
 |
 \-- count
      +-- count
      \-- value
      \-- step
      \-- time
The obvious drawback is that the structure is pretty nested and that
pre-averaged observables have a disjoint structure from plain
observables, e.g., the mean value is obs1/mean/value in one case and
obs1/value in the other. Further, the step/time fields show up
repeatedly (although they may be linked to each other).
A second scheme would extend the existing value/step/time triple to
include the error and the number:
obs1
 +-- count
 \-- value
 \-- error
 \-- count/number/samples ???
 \-- step
 \-- time
This scheme appears more natural to me and I would prefer it. In
addition, one may add "variance" and "standard_deviation". There is,
however, a naming clash between the attribute or dataset "count" for
the number of particles and the number of accumulated
values/samples.
Nicolas Höft noted on the halmd-devel mailing list that "count" for
the number of particles is not very descriptive. May we change it to
"size" or "number"?
http://article.gmane.org/gmane.science.simulation.halmd.devel/292
The whole issue may be beyond the current release candidate. I
mainly would like to hear your opinion at an early stage.
It seems premature to me as well. Anyway, as far as early opinions are
concerned, I prefer the second scheme, in which all of that can be
optional and one can read the step/time/value as usual and query the
other datasets if appropriate. I think that all "extra" features should
leave the basic organization untouched.
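
A minimal sketch of that reading pattern, under stated assumptions: plain Python dicts stand in for HDF5 groups so that the example runs without any library, the dataset names "error" and "samples" are placeholders (the name for the number of accumulated values is still open above), and the numbers are made up for illustration. With h5py, the same `name in group` membership test applies to a group object.

```python
def read_observable(group):
    """Read the mandatory step/time/value triple, then any optional statistics."""
    data = {name: group[name] for name in ("step", "time", "value")}
    # Optional datasets in this sketch; the names are placeholders, not part of H5MD.
    for name in ("error", "samples"):
        if name in group:
            data[name] = group[name]
    return data


# A plain observable and a pre-averaged one share the same basic layout.
plain = {"step": [0, 10000], "time": [0.0, 10.0], "value": [1.02, 0.98]}
averaged = dict(plain, error=[0.01, 0.02], samples=[10, 10])

plain_data = read_observable(plain)
averaged_data = read_observable(averaged)
```

A reader that only knows about step/time/value handles both groups unchanged; only a reader interested in the statistics needs to look for the extra datasets.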
Regards,
Pierre