lmi

Re: [lmi] numeric_io_traits problem under Valgrind


From: Vadim Zeitlin
Subject: Re: [lmi] numeric_io_traits problem under Valgrind
Date: Tue, 27 Feb 2018 03:07:03 +0100

On Mon, 26 Feb 2018 19:14:09 +0000 Greg Chicares <address@hidden> wrote:

GC> >  Finally, to explain the use of the plural in the title, there is another weird
GC> > problem in this code: I've also tried using valgrind while testing the bug
GC> > above and my fix for it and, somehow, using valgrind results in
GC> > 
GC> >   ???? test failed:   '15' == '16'
GC> >   [file numeric_io_test.cpp, line 148]
GC> > 
GC> > when running this test, while it passes without it. I guess this must be
GC> > due to a bug in valgrind itself, because I don't see why this function
GC> > call would otherwise give a different result when running under it, but it's
GC> > going to be difficult to fix it there, so I'd like to find some way of
GC> > skipping this test, either only when using valgrind or completely under
GC> > Linux. What do you think?
GC> 
GC> As soon as I saw that diagnostic, I remembered:
GC> 
GC>     // The following test failed for como with mingw (although with a
GC>     // value of 0.45036 it unsurprisingly succeeded). It was observed
GC>     // to fail also with x86_64-linux-gnu, but only because of a
GC>     // mistake that was found before committing, i.e., using log10()
GC>     // instead of std::log10() in the implementation caused C function
GC>     // log10(double) to be called instead of log10l().
GC>     BOOST_TEST_EQUAL(15, floating_point_decimals(0.4503599627370497));
GC> 
GC> It failed for como; now we turn a blind eye to that because como's
GC> compiler is no more.
GC> 
GC> Then it failed for x86_64-linux-gnu, and we tracked down and fixed
GC> the actual problem, as documented above.
GC> 
GC> Now it's failing again. I wouldn't want to
GC> 
GC>   if(some_compiler) goto turn_a_blind_eye_yet_again;
GC>     BOOST_TEST_EQUAL(15, floating_point_decimals(0.4503599627370497));
GC>   turn_a_blind_eye_yet_again:
GC> 
GC> without trying really hard to get to the root of the problem.

 Unfortunately, I think I did get to the root of the problem, and it
wasn't very hard, though it isn't very useful either. To quote the Valgrind
book at http://www.network-theory.co.uk/docs/valgrind/valgrind_27.html

        > As of version 3.0.0, Valgrind has the following limitations in
        > its implementation of x86/AMD64 floating point relative to
        > IEEE754. Precision: There is no support for 80 bit arithmetic.
        > Internally, Valgrind represents all such “long double” numbers in
        > 64 bits, and so there may be some differences in results. Whether
        > or not this is critical remains to be seen. Note, the x86/amd64
        > fldt/fstpt instructions (read/write 80-bit numbers) are correctly
        > simulated, using conversions to/from 64 bits, so that in-memory
        > images of 80-bit numbers look correct if anyone wants to see. The
        > impression observed from many FP regression tests is that the
        > accuracy differences aren't significant. Generally speaking, if a
        > program relies on 80-bit precision, there may be difficulties
        > porting it to non x86/amd64 platforms which only support 64-bit
        > FP precision. Even on x86/amd64, the program may get different
        > results depending on whether it is compiled to use SSE2
        > instructions (64-bits only), or x87 instructions (80-bit). The
        > net effect is to make FP programs behave as if they had been run
        > on a machine with 64-bit IEEE floats, for example PowerPC.

 So this looks like a direct consequence of the limitations of Valgrind's
current FP support: the argument passed to log10l() for the number above is
very close to 1e-16, and whether the extra 80-bit precision is available
decides on which side of that boundary the computed result falls.

 Of course, I personally think that lmi should have switched to SSE (and
hence to 64-bit precision) ages ago anyway, so for me the logical conclusion
would be to remove this test completely, rather than skipping it only under
Valgrind. But as long as we cling to 80-bit precision, I'll just have to
live with this unit test failing when run under Valgrind. This doesn't
matter much now, but I'd still like to set up continuous integration for
lmi on GitHub one of these days, run all the unit tests on each commit and,
ideally, run them under Valgrind too, and then it would start to matter.
But considering that I have been planning to do this for many years
already, fixing this is probably not extremely urgent.

 Regards,
VZ

