Re: [lmi] change file formats to XML


From: Greg Chicares
Subject: Re: [lmi] change file formats to XML
Date: Sat, 06 Mar 2010 00:16:06 +0000
User-agent: Thunderbird 2.0.0.23 (Windows/20090812)

On 2010-02-26 14:56Z, Vaclav Slavik wrote:
> 
> I uploaded the XML file formats patch to Savannah:
> 
>    https://savannah.nongnu.org/patch/index.php?7101

At first, I thought there was an anomaly in the backward-compatibility
code. When I tried to load a previously-saved '.ill' or '.cns' file
that uses a "Policy" other than 'sample', I got:

  File 'C:/opt/lmi/data/xyz.xpol' is required but could not be found.
  Try reinstalling.
  [file /lmi/src/lmi/ihs_proddata.cpp, line 219]

But it all seems to work just fine if, after 'make install', I do:

cd /opt/lmi/data
for z in *.db4 *.fnd *.pol *.rnd *.tir; do mv $z ${z/./.x}; done
for z in *.xpol; do sed -i $z \
  -e'/DatabaseFilename/s/\.db4/.xdb4/' \
  -e'/FundFilename/s/\.fnd/.xfnd/' \
  -e'/RoundingFilename/s/\.rnd/.xrnd/' \
  -e'/TierFilename/s/\.tir/.xtir/'; done

Explanation: for now, we still have some non-published code that uses
the published product-file code; it writes files in the new format,
but with the old extensions. [Some of those files contain literal
ampersands, which must be replaced with xml '&amp;' entities; after
replacement in the non-published source, we see:
  Income &amp; Growth
  M&amp;E
rather than:
  Income & Growth
  M&E
.]

Here's a (non-GUI) timing benchmark:

/lmi/src/lmi[0]$time make system_test >../log 2>&1
make system_test > ../log 2>&1  84.73s user 67.96s system 45% cpu 5:21.33 total

I'm using msw, so 'total' time is the most relevant measure, and it's
not materially different from HEAD:
  5:19.62 http://lists.nongnu.org/archive/html/lmi/2010-02/msg00001.html
  5:21.33 today with patch above
That's misleading, though, because the 1400 or so cells in the system
test are largely sorted by "Product", and IIRC the most-recently-read
'.*db4' file is cached. Furthermore, in an automated test, the time
it takes to load product files is only a small proportion of the total;
but in the GUI, the delay in switching products is perceptible. To
scroll through seventy-six products (start at the top, and hold down the
down-arrow key) takes four seconds in HEAD, but thirty-five seconds
with the above patch. However, it takes only six seconds with my local
tree, in which I'm using the patch with this modification:

         static void to_xml(xml::node& out, T const& in)
         {
+            out.set_content(value_cast<std::string>(in).c_str());
- [everything else]
         }

         static void from_xml(T& out, xml::node const& in)
         {
+            out = value_cast<T>(xml_lmi::get_content(in));
- [everything else]
         }

As 'numeric_io_cast.hpp' says of stream I/O (versus standard C):

/// [...] as this discussion:
///   http://www.gotw.ca/publications/mill19.htm
/// observes, it is generally much slower, probably because of memory-
/// allocation overhead and inefficient implementations of arithmetic
/// inserters and extractors. See the accompanying unit test for a way
/// to measure the speed difference.

The arithmetic inserters and extractors really are glacially slow, at
least in MinGW gcc-3.4.5, and avoiding them makes the GUI adequately
responsive again. If we want to make it even faster, we might cache
all product databases (and perhaps even preload all of them).
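
To make the two conversion routes concrete, here is a minimal sketch
that sets the stream route (arithmetic inserter and extractor) beside
the standard C route (snprintf and strtod). It is not lmi's value_cast
or numeric_io_cast; the four function names are hypothetical, and the
use of seventeen significant digits (max_digits10 for double) is an
assumption chosen so that both routes round-trip exactly.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <sstream>
#include <string>

// Stream route: arithmetic inserter with enough digits to round-trip.
std::string to_string_stream(double value)
{
    std::ostringstream oss;
    oss.precision(std::numeric_limits<double>::max_digits10);
    oss << value;
    return oss.str();
}

// Stream route: arithmetic extractor.
double from_string_stream(std::string const& text)
{
    std::istringstream iss(text);
    double value = 0.0;
    iss >> value;
    assert(!iss.fail());
    return value;
}

// C route: snprintf with the same number of significant digits.
std::string to_string_c(double value)
{
    char buffer[64];
    int const digits = std::numeric_limits<double>::max_digits10;
    int const n = std::snprintf(buffer, sizeof buffer, "%.*g", digits, value);
    assert(0 < n && n < static_cast<int>(sizeof buffer));
    return std::string(buffer, n);
}

// C route: strtod, checking that at least one character was consumed.
double from_string_c(std::string const& text)
{
    char* end = nullptr;
    double const value = std::strtod(text.c_str(), &end);
    assert(end != text.c_str());
    return value;
}
```

Both routes should yield the same value for the same text, so a speed
comparison only needs to call each pair in a loop over the same data
and time the loops; the sketch itself makes no claim about how large
the difference is on any given compiler.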

However, function template numeric_io_cast() behaves differently than
an inserter with maximal precision: it's more like {FLT|DBL|LDBL}_DIG
(but adapted to each particular value). Thus, 1.0/1.00246627 is
represented with only fifteen decimal digits, which is less close to
the real value than binary storage permits--the representation error
  (.997539797523562-.9975397975235616)/.9975397975235616
in this case being slightly more than two times DBL_EPSILON. That's
enough to perturb regression tests. We can avoid such perturbations by
hand-editing the xml, e.g. with 'sed':
  /997539797523562/s/997539797523562</9975397975235616</
Saving a '.*db4' file in the product editor passes all numbers through
numeric_io_cast<>() anyway, so that workaround isn't totally tasteless;
but it would really be much better to store 'MaxMonthlyCoiRate' and
'NAARDiscount' as reciprocals, because that's what they normally
are (e.g., store 1.00246627 or even .00246627 in the example above).
Probably every field except those two notionally contains only decimal
numbers of modest precision, and numeric_io_cast<>() suppresses
noise as in this representation:
        
<item>0.01000000000000000020816681711721685132943093776702880859375</item>
of one-hundredth (which isn't ideal in the xml).
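
The precision tradeoff described above can be checked with a short
sketch. It assumes IEEE 754 doubles and a correctly rounding C library,
and it uses plain "%.*g" formatting as a stand-in for a DBL_DIG-style
conversion; it is not numeric_io_cast itself, and the helper names are
hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Format with the given number of significant decimal digits; "%g"
// drops trailing zeros, so 0.01 at fifteen digits is written "0.01".
std::string format_digits(double value, int digits)
{
    assert(0 < digits && digits <= 17);
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.*g", digits, value);
    return buffer;
}

// Write the value as text, then read the text back as a double.
double round_trip(double value, int digits)
{
    return std::strtod(format_digits(value, digits).c_str(), nullptr);
}

// Relative error of the round trip, expressed in units of DBL_EPSILON.
double relative_error_in_epsilons(double value, int digits)
{
    double const reread = round_trip(value, digits);
    return std::fabs(reread - value) / std::fabs(value) / DBL_EPSILON;
}
```

With x = 1.0/1.00246627, round_trip(x, 17) recovers x exactly, while
round_trip(x, 15) returns a neighboring value whose error can be read
from relative_error_in_epsilons(x, 15). By contrast,
format_digits(0.01, 15) yields plain "0.01", which is the noise
suppression that makes the shorter form attractive for fields that
hold decimal numbers of modest precision.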



