Re: [h5md-user] [EXTERNAL] Re: Units Module Question

From: Hart, David Blaine
Subject: Re: [h5md-user] [EXTERNAL] Re: Units Module Question
Date: Mon, 12 May 2014 17:06:02 +0000

> -----Original Message-----
> From: address@hidden [mailto:h5md-
> address@hidden On Behalf Of Felix Höfling
> Sent: Monday, May 12, 2014 2:33 AM
> To: address@hidden
> Subject: [EXTERNAL] Re: [h5md-user] Units Module Question
> On 10.05.2014 at 19:10, Pierre de Buyl
> <address@hidden> wrote:
> > On Sat, May 10, 2014 at 02:46:04PM +0200, Konrad Hinsen wrote:
> >> Hart, David Blaine writes:
> >>
> >>  > I want to convert “Kcal mol-1 Å-1” into “J mol-1 m-1”, which leads
> >>  > me to want to use the unit string “4.184 10+10 J mol-1 m-1”. But
> >>  > I’m not sure that’s valid. But it seems weird that it would be
> >>  > invalid since there are so many numeric conversion factors that
> >>  > are multiplied by a factor of ten, like “1.602e-19 C” written as
> >>  > “1.602 10-19 C”.
> >>  >
> >>  > This isn’t a big deal, since the SI prefixes can usually make it
> >>  > so that the decimal only has to be shifted one or two places in
> >>  > the numeric factor, but I thought I’d mention it.
> >>
> >> How about “4.184e10 J mol-1 m-1” ?
> >
> > I did think of that but this possibility is only implicit in the
> > specification (both Mosaic and H5MD). It seems however logical to
> > accept the scientific notation for the numeric factor.
> >
> >
> An explicit syntax grammar would be very helpful here. I had one
> interpretation of what a unit string should look like, and I thought
> there was no doubt about it. But now I realise that many other
> interpretations are possible:
> The first (optional) part is a number in non-scientific notation (integer
> or decimal fraction—what is the decimal sign? "." in English, "," in
> German). Thus "4.184e10" is excluded although it seems very natural.
> It is unclear from the wording of the spec whether the number can be
> followed by an exponent, e.g., whether 4.184-10 would be allowed
> (evaluating to pow(4.184, -10)).
> It is probably intended that the string "1.602 10-19 C" is valid. On first
> reading, however, I thought it has 2 numeric factors and is not covered.
> It seems that the second part is a "numeric (unit) factor", to be
> distinguished from the leading "number".

I actually think you are right, and that the string "1.602 10-19 C" is not 
valid. I take this reading because the string is a "unit" and not a conversion 
factor, which means I was misusing the "unit" metadata when I tried to put a 
conversion factor into my units. It would still make sense to allow a single 
numeric unit factor, especially a power of ten, since that is descriptive of 
the units. It would then also make sense not to allow scientific notation, 
since that belongs with the data itself: the units module only defines the 
unit, not a conversion factor, which should be applied to the data first if so 
desired. I'm not sure whether this was the intent with Mosaic and the units 
module, but it makes sense to me now.

I spent a lot of time reading the English version of the SI brochure last 
Friday, and I finally understood what is meant by "coherent units". For 
example, the coherent unit for dynamic viscosity is "Pa s". The brochure says 
that it is acceptable to use the SI prefixes, but the resulting units are no 
longer coherent. As I read it, "kPa s" no longer makes it clear that you are 
talking about viscosity rather than something completely different, whereas 
"10+3 Pa s" specifies the magnitude of the measurements while keeping the 
coherent SI unit.

I think that @Pierre confirms the reading of only a single number/numeric unit 
factor, which would mean that @Peter was right and the following would be 
clearer:

>>> Would the sentence be unambiguous if it were reworded as follows?
>>> “There may be at most one numeric unit factor, which must be the first one.”
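
As a rough illustration of that reworded rule, here is a minimal Python 
sketch. The token patterns and the name check_unit_string are my own 
assumptions about one possible reading (space-separated factors, a numeric 
factor being a plain decimal number with an optional signed integer exponent, 
no scientific notation). They are not the grammar of the H5MD or Mosaic 
specification.

```python
import re

# Assumed token shapes (one possible reading, not an official grammar):
# a numeric factor is a decimal number, optionally followed by a signed
# integer exponent, e.g. "60", "4.184", "10+3", "10-19".
NUMBER = re.compile(r"\d+(?:\.\d+)?(?:[+-]\d+)?")
# A unit symbol is a run of letters, optionally followed by an integer
# exponent, e.g. "J", "mol-1", "m-1", "Pa".
SYMBOL = re.compile(r"[^\W\d_]+(?:[+-]?\d+)?")


def check_unit_string(unit):
    """Return True if at most one numeric factor occurs and it comes first."""
    factors = unit.split(" ")
    for position, factor in enumerate(factors):
        if NUMBER.fullmatch(factor):
            # A numeric factor anywhere but the first position breaks the rule,
            # which also rules out a second numeric factor.
            if position != 0:
                return False
        elif not SYMBOL.fullmatch(factor):
            # Neither a numeric factor nor a unit symbol under this reading.
            return False
    return True


for candidate in ("J mol-1 m-1", "10+3 Pa s", "1.602 10-19 C",
                  "10-19 C 1.602", "4.184e10 J mol-1 m-1"):
    print(candidate, "->", check_unit_string(candidate))
```

Under these assumptions "J mol-1 m-1" and "10+3 Pa s" pass, while 
"1.602 10-19 C", "10-19 C 1.602" and "4.184e10 J mol-1 m-1" are rejected.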

> If "1.602 10-19 C" were valid, then the reading of the spec would
> also allow for "10-19 C 1.602" (because nothing is said about the
> position of the "number").
> To avoid this kind of confusion, I suggest that we either include "10" as
> the only possible numeric unit symbol (what about 2, positive integers?)
> or call "1.602 10-19" the (leading) number and make its format explicit.
> @David: in the course of the development of the units module, there was
> also support for non-SI units, which was dropped later on, I believe
> because there was no consensus on what should be included. Your use case
> involving "kcal" would probably be covered more naturally by such an
> extension of the units module, see the udunits2 library:
> http://nongnu.org/h5md/implementation.html#using-udunits2-for-units-interpretation

I realized this as I was reading the SI brochure, and it seems the units module 
has already accounted for it by defining the "system" metadata within the 
/h5md/modules/units/ group. If I wanted to propose, for example, a "CGS-ESU" 
system, it could easily be defined as a different "system". And if I want to 
convert my energy from kcal to J, I should first multiply my data by the 
conversion factor of 4184 J / kcal and then set the unit to "J", rather than 
set the unit to "4184 J", since that isn't a unit, it's an equation. :-)
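
To make the "convert the data first, then label it" idea concrete, here is a 
small Python sketch. It assumes the thermochemical calorie (1 kcal = 4184 J, 
that is, 4.184 kJ). The function name and the dictionary are hypothetical 
stand-ins for a real dataset and its "unit" metadata, not part of any H5MD 
library.

```python
# Assumption: thermochemical calorie, 1 kcal = 4184 J exactly.
KCAL_TO_J = 4184.0


def energies_in_joules(energies_kcal):
    """Apply the conversion factor to the data itself, not to the unit string."""
    return [energy * KCAL_TO_J for energy in energies_kcal]


# Hypothetical stand-in for a dataset plus its "unit" metadata: the values are
# converted, and the unit string stays a plain unit with no numeric factor.
dataset = {"value": energies_in_joules([1.0, 0.5]), "unit": "J"}
print(dataset)
```

The conversion factor lives in the code that prepares the data, and the stored 
unit is simply "J".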

Thanks, and sorry for muddying the waters on what the "unit" was.


