lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] MinGW-w64 anomaly?


From: Greg Chicares
Subject: Re: [lmi] MinGW-w64 anomaly?
Date: Sun, 18 Dec 2016 22:18:07 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Icedove/45.4.0

On 2016-12-18 20:54, Vadim Zeitlin wrote:
> On Sun, 18 Dec 2016 14:28:46 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> Vadim--Could you please take a look at this? It's an extremely simplified
> GC> test case, but it's important for the way lmi performs rounding.
> GC> 
> GC> Apply the patch below [0] and run:
> GC>   make unit_tests unit_test_targets=sandbox_test.exe
> GC> 
> GC> Cross-compiling for msw on debian with MinGW-w64 gcc-4.9.1, I see:
> 
> [Let me start with a digression not affecting lmi directly, so please feel
>  free to skip it completely]
> 
>  Interestingly enough, this discrepancy doesn't appear if -std=c++11 option
> is not used: I started testing this outside of lmi initially and saw the
> same results in all 4 cases. I had to add -std=c++11 to see the same thing
> as in lmi unit test. This was surprising, so I decided to check why was it
> the case and it turned out that <cmath> has a std::pow() overload for
> "(long double, int)" using __builtin_powil() in pre-C++11 mode, but uses
> __builtin_powl() for all types of arguments for long double in C++11. And
> according to http://en.cppreference.com/w/cpp/numeric/math/pow this is
> correct, however I think it might be one of a few cases in which this
> reference is wrong. It's quite understandable in this particular case
> because it seems that the standard went through several iterations and, at
> some point, these functions were indeed removed (see e.g. the proposal at
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3286.html#550).
> But the answer at http://stackoverflow.com/a/5627278/15275, written by
> Howard Hinnant himself, says that this overload finally hasn't been
> removed, so MinGW-w64 shouldn't have this "#if __cplusplus < 201103L" check
> in its <cmath> -- even though it does in the version we're using and still
> seems to be the case in the very latest version of libstdc++, see
> https://github.com/gcc-mirror/gcc/blob/6514c52372bb36e555843b0419d71cf55eda0409/libstdc++-v3/include/c_global/cmath#L395
> So I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78851 to see if it
> could/should be fixed.
> 
> [End of the digression]

Thank you very much. Then the anomaly really is upstream, in libstdc++.

At first reading, I was saddened by this:

  http://stackoverflow.com/a/5627278/15275
| So while the freedom is there to shuffle pow(double, int) off to a
| separate algorithm, most implementors I'm aware of have given up on
| that strategy

until I realized that his example really showed this expression
  pow(.1, 20)
being "optimized" as
  0.1 * 0.1 * 0.1 ...
which does harm accuracy. However, if we rewrite the expression as
the presumably intended
  pow(10.0L, -20)
and optimize it as
  1.0L / (10.0L * 10.0L * 10.0L ...)
then accuracy improves.

IOW, instead of an optimized overload for
  long double std::pow(long double, int)
I'm in effect looking for one like this:
  long double std::pow(int, int)
although of course C++ doesn't overload on return type.

> GC> Running sandbox_test:
> GC> f2():
> GC>    0.0000000001000000000000000000018377930496 1/10^10
> GC>    0.0000000001000000000000000000397031165002  10^-10
> 
>  Just for the reference, here are the binary (i.e. exact) representations
> of these numbers in 128 bit IEEE format:
> 
>  1.b7cdfd9d7bdba b7e0000000005b1p-34
>  1.b7cdfd9d7bdba b8a0000000002b5p-34
> 
> where I added a space after the first 52 bits which is all we have in the
> in-memory double representation. This at least confirms that both values
> are rounded to the same 1.b7cdfd9d7bdbbp-34 when using 64 bit doubles.

Yet I really do want long doubles.

> GC> f3():
> GC>    0.0000000001000000000000000000018377930496 1/10^10
> GC>    0.0000000001000000000000000000018377930496  10^-10
> GC> 
> GC> What concerns me is the second of those four numbers: it's less accurate
> GC> than the other three. If an 80-byte register is being spilled to 64-byte
> GC> storage, I'd expect the result to be less accurate than this. I just
> GC> can't see any way to explain it.
> 
>  FWIW I could confirm, using my recently found knowledge of
> -fexcess-precision option, that this is not related to spilling x87
> registers to memory. If I convert your example to the sensibly equivalent C
> version:

[...]

> and compile it with -fexcess-precision=standard (and still with -O2, of
> course), it doesn't change anything and I still get the same results.
> 
>  But in the C version it's very simple to see what goes on however: f3()
> basically just prints the constant value, while f2() calls _powl()
> function. So what you see is basically that the results of compile-time
> computation are not exactly the same as the results of run-time
> computation. And I can even explain this further: I'm pretty sure that
> compile-time results are computed using the same __builtin_powil() that is
> used in C++98 code and it's not really surprising that its much simpler
> algorithm using just multiplications and inversion gives different results
> from the general-purpose powl(). Notice that __builtin_powil() is, of
> course, fully inlined and you get efficiently computed x^-10 by
> successively computing, using the single multiplication instruction, x,
> x^2, x^3, x^5, x^10 and then inverting it using just one other instruction,
> so it's clearly not something powl() can do -- it's already nice that it
> gives the same result as compile-time calculation for x^10 itself.
> 
>  BTW, if you add -fwhole-program (or -flto, but I can't examine its results
> as easily as the assembler code) option to gcc command line, then f2()
> would be reduced to the same trivial code (and both functions will be
> inlined directly into main() too).

This discussion leads me to think that instead of hoping for std::pow()
to do exactly what I want, I might try [untested sketch]:

long double lmi_powli(long double base, int iexp)
{
    if(i < 0)
      return 1.0L / lmi_powli(base, -iexp);
    else
      return std::pow(base, iexp);
}

Then, when 'base' has an exact integer value, a standard-conforming
implementation is at least highly likely to do what I want.

> GC> I can work around this by taking the reciprocal of the product of N tens,
> GC> which seems always to be as accurate as the best case with std::pow(),
> GC> but I'd rather avoid that.

It's probably time to reveal my motivation, which is to modernize and
simplify 'round_to.hpp'. I didn't want to state that earlier because
that file will soon become smaller and much clearer.

>  I'm not sure what exactly is worth working around here. Both results are
> correct for the precision of standard double type and you would get the
> same output if you stored them in a "volatile double". But if you really
> want to get best performance and identical output in both cases, you could
> use __builtin_powil() manually in f2().

I think the lmi_powli() idea above accomplishes *almost* the same thing
without relying on gcc internals (so it's more portable across compilers
and across versions of gcc). I say *almost* because I think lmi_powli()
as sketched above depends on the FP rounding direction. I could of course
set the rounding direction explicitly, though that may be costly. But I
could build a cache at compile time, which might implicitly call the
builtin.

The motivation is to round a double to 'iexp' decimals, with various
rounding directions, e.g.:
  floor(r * scale_fwd_) * scale_back_
  trunc(r * scale_fwd_) * scale_back_
  ceil (r * scale_fwd_) * scale_back_
where 'scale_fwd_' and 'scale_back_' are reciprocals that are both
long double powers of ten. It's similar to this idea:

http://florian.loitsch.com/publications/dtoa-pldi2010.pdf?attredirects=0
| Grisu needs a cache of precomputed powers-of-ten. The cache must be
| precomputed using high-precision integer arithmetic.

except that I'm trying to use 'long double' as my higher-precision
arithmetic.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]