lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] feholdexcept()


From: Vadim Zeitlin
Subject: Re: [lmi] feholdexcept()
Date: Mon, 9 Jan 2017 15:27:35 +0100

On Mon, 9 Jan 2017 04:29:31 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2017-01-08 23:53, Vadim Zeitlin wrote:
...
GC> >  However -- and, again, I'm very sorry for not understanding nor testing 
it
GC> > before -- feholdexcept() doesn't actually affect setting the bits of the
GC> > control word at all. So, as can be seen by running this simple program:
GC> 
GC> [...extensively snipped...]
GC> 
GC> > int main(int argc, char const* argv[])
GC> > {
GC> [...]
GC> >     float x = argc / 193.0f;
GC> > 
GC> > which outputs
GC> > 
GC> > ---------------------------------- >8 
--------------------------------------
GC> > Initially:      00000000
GC> > After div:      00000020
GC> > ---------------------------------- >8 
--------------------------------------
GC> 
GC> What 'argv' did you use? (I guess anything other than an exact multiple
GC> of 193.)

 Yes. Later I also tested with

        float x = argc*0.85f;

and then the FE_INEXACT bit is set only if more than 7 arguments are
specified:

---------------------------------- >8 --------------------------------------
% ./a.out 1 2 3 4 5 6 7
Initially:      00000000
After mul:      00000000
% ./a.out 1 2 3 4 5 6 7 8
Initially:      00000000
After mul:      00000020
---------------------------------- >8 --------------------------------------

GC> Just to rule out any gcc defect...does that program behave the same way
GC> with msvc?

 Yes, except that the flag values are different and FE_INEXACT is 1. But
the behaviour is exactly the same and, in fact, I had originally debugged
this with MSVC before reproducing it with gcc.

GC> > They will still be set irrespectively of whether feholdexcept() was called
GC> > or not.
GC> 
GC> But that's okay. The flags in the status word are often raised,
GC> especially FE_INEXACT. Masking them in the control word doesn't keep
GC> them from being raised; it just prevents them from causing interrupts.

 Yes, this is what I failed to understand.

GC> >  And this means that my changes were completely wrong because we can't
GC> > check that 0 == std::fetestexcept(FE_ALL_EXCEPT) in fenv_is_valid() as 
this
GC> > check fails all the time and, in fact, I've implemented a nice DoS attack
GC> > against myself in this way, because I can't even close lmi currently as it
GC> > just keeps spawning message box after message box... This is a problem in
GC> > its own right, I think we should have some limit on the number of message
GC> > boxes shown (per minute?)
GC> 
GC> I'm not sure that's a good idea. Suppose we've used up the present minute's
GC> quota of messageboxes with non-urgent diagnostics, and then a really urgent
GC> one (like "out of memory") comes along...and is deferred, potentially for
GC> fifty-nine seconds.

 I was speaking specifically of these message boxes complaining about the
floating point control word changes and I still think we could want to rate-
limit those.

GC> > but in the meanwhile we need to just remove this
GC> > check for fetestexcept() and, probably, feholdexcept() call as well as 
it's
GC> > useless. Here is the trivial patch doing just this:
GC> 
GC> I'm not yet convinced about all of this:
GC> 
GC> > ---------------------------------- >8 
--------------------------------------
GC> > diff --git a/fenv_lmi.cpp b/fenv_lmi.cpp
GC> > index 351400d..db5e9af 100644
GC> > --- a/fenv_lmi.cpp
GC> > +++ b/fenv_lmi.cpp
GC> > @@ -65,8 +65,6 @@ void fenv_initialize()
GC> >  #if defined LMI_X87
GC> >      x87_control_word(default_x87_control_word());
GC> >  #else  // !defined LMI_X87
GC> > -    std::fenv_t saved_env;
GC> > -    LMI_ASSERT(0 == std::feholdexcept(&saved_env));
GC> >      LMI_ASSERT(0 == std::fesetround(FE_TONEAREST));
GC> 
GC> Here, the intention is to ensure that all exceptions are masked.

 Yes, but AFAICS this is always the case by default anyhow, so while this
call is harmless, it also seems useless.

GC> Maybe we could test that by calling fegetenv(), obtaining an fenv_t
GC> that "represents the entire floating-point environment" [7.6/3]. However
GC> (as I think we recently discussed), we don't necessarily know whether
GC> "entire" means the limited entirety contemplated by the standard, or
GC> the plenary entirety of the actual hardware;

 And this is not obvious to discover even from reading the sources, e.g.
looking at glibc /usr/include/x86_64-linux-gnu/bits/fenv.h it would seem
that fenv_t doesn't store SSE control word in 32 bit builds:
---------------------------------- >8 --------------------------------------
/* Type representing floating-point environment.  This structure
   corresponds to the layout of the block written by the `fstenv'
   instruction and has additional fields for the contents of the MXCSR
   register as written by the `stmxcsr' instruction.  */
typedef struct
  {
    unsigned short int __control_word;
    unsigned short int __glibc_reserved1;
    unsigned short int __status_word;
    unsigned short int __glibc_reserved2;
    unsigned short int __tags;
    unsigned short int __glibc_reserved3;
    unsigned int __eip;
    unsigned short int __cs_selector;
    unsigned int __opcode:11;
    unsigned int __glibc_reserved4:5;
    unsigned int __data_offset;
    unsigned short int __data_selector;
    unsigned short int __glibc_reserved5;
#ifdef __x86_64__
    unsigned int __mxcsr;
#endif
  }
fenv_t;
---------------------------------- >8 --------------------------------------
but according to https://sourceware.org/bugzilla/show_bug.cgi?id=16064 it
actually does store MXCSR in the (otherwise unused) __eip field of this
struct since version 2.20 of glibc! And, indeed, testing shows that 32 bit
applications compiled using "-msse -mfpmath=sse" preserve MXCSR correctly
when using glibc 2.24 from Debian Sid, but not with 2.19 from Jessie.

GC> and, because fenv_t is utterly opaque, we don't know whether it has the
GC> Comparable nature.

 Knowing that this is a POD C struct which can't require any dynamic
allocation, it seems impossible for it to not be memcmp()-able.
 
GC> (If it does, then it's only in a limited sense: cygwin's FE_DFL_ENV
GC> is zero, so we can't compare to *FE_DFL_ENV.)

 But this is because it's supposed to be different from any valid pointer,
I think (FWIW, under Linux FE_DFL_ENV is -1 (cast to the appropriate
pointer type) while FE_NOMASK_ENV is -2). As long as we only do per-byte
comparison of our own fenv_t structs filled by fegetenv(), we should be
fine -- but it doesn't solve our problem.

GC> Perhaps we should unit-test this. E.g., using asm to talk directly
GC> to the hardware where required:
GC>  - mask all exceptions
GC>  - call fegetenv() and store what it gives us
GC>  - use asm to un-mask exceptions
GC>  - call fegetenv() again
GC>  - attempt to compare the "before" and "after" fenv_t objects.
GC> If we can prove that this works for a particular compiler on a particular
GC> platform, then we can write production code with standard functions only,
GC> confining asm to a unit test that proves the production code is valid.

 It does "work" in the strict sense that the two fenv_t objects are indeed
different (except with glibc < 2.20...). But it doesn't help because they
are also different if FE_INEXACT bit is set. We can't distinguish between
the change to the "PE" bit of MXCSR and "PM" bit by just comparing fenv_t
objects, unfortunately (see figure 10-3 of Intel documentation at
http://www.intel.com/Assets/en_US/PDF/manual/253665.pdf for the notations).

 So I'm afraid you were right and we do need to resort to inline assembly
and compare just the 10 higher bits of MXCSR for change (which would also
detect FZ and DAZ changes, that we didn't mention at all so far, but which
are, as Intel documentation states, "incompatible with IEEE Standard 754"
and so which we probably do want to check). C99/C++11 is of no help :-(

 Of course, the alternative solution of not doing anything (at least for
now, ie until we have any reason to believe this problem exists in 64 bit
builds in the first place in practice) is also still possible and even
seems to be more and more attractive...

 Please let me know if you'd like me to make a PR with inline assembly for
SSE-based environments or if we should just leave the code in its current
state which is, IMO, acceptable (after applying my patch from the previous
message to make 64 bit builds usable).

 Thanks,
VZ


reply via email to

[Prev in Thread] Current Thread [Next in Thread]