lmi

Re: [lmi] rint() vs. nearbyint()


From: Vadim Zeitlin
Subject: Re: [lmi] rint() vs. nearbyint()
Date: Wed, 3 Feb 2021 02:04:25 +0100

On Wed, 3 Feb 2021 00:02:28 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:

GC> Vadim--C99 (N1256, which is probably good enough for this purpose) says:
GC> 
GC> | F.9.6.4 The rint functions
GC> | The rint functions differ from the nearbyint functions only in that they
GC> | do raise the "inexact" floating-point exception if the result differs in
GC> | value from the argument.
GC> 
GC> Does this mean anything in practice--i.e., do library implementations
GC> really differ? If so, is rint() implemented in terms of nearbyint(),
GC> or vice versa?

 In theory, the answer depends on the implementor's choice. And in a rare
show of unity between practice and theory, in practice it does actually
differ between implementations.

GC> Here's why I ask. I really do want to call one of them with an infinite
GC> argument. If either is faster, faster is preferable; and here I'm
GC> interested in establishing a general principle, so I don't care if I'm
GC> only saving a single machine cycle, as long as the savings is positive
GC> or at least nonnegative.
GC> 
GC> My guess is that rint() is implemented as nearbyint() with added
GC> guards that conditionally raise FE_INEXACT.

 This guess seems to be correct for the MSVS standard library. Of course, I
don't have the sources for it, but looking at the disassembly of both
functions, both are implemented in terms of an internal helper:
nearbyint() basically just forwards to it after checking that it's dealing
with a valid number and not one of the floating point oddities, while
rint() does the same, but with extra code before and after the call to the
helper to raise the exception if necessary.

 Now I have no idea _why_ it does all this, but looking at this code I
can't help wondering whether the MinGW definition is suspiciously simple...

GC> But wait--I can just check the glibc sources myself:
GC> 
GC> https://github.com/bminor/glibc/blob/be9b0b9a012780a403a266c90878efffb9a5f3ca/sysdeps/ieee754/dbl-64/s_rint.c
GC> https://github.com/bminor/glibc/blob/be9b0b9a012780a403a266c90878efffb9a5f3ca/sysdeps/ieee754/dbl-64/s_nearbyint.c
GC> 
GC> and it looks like my guess was wrong: nearbyint() is implemented as
GC> a copy of the rint() code, surrounded by additional code to save and
GC> restore fenv_t.

 Sorry, but I don't think this shows anything. You're probably looking at
the generic code, but it's (almost?) never used: AFAIK all normal platforms
have built-in versions of these functions. Unfortunately I don't know
where those are defined (I couldn't find them quickly in the gcc sources,
although they definitely must be there somewhere), but it doesn't really
matter because MinGW-w64 doesn't use GNU libc anyhow, but its own version.

 And its sources are more straightforward to read:

https://github.com/mirror/mingw-w64/blob/fc2b4752ac61670d6d4940959a78da5ad8a9ebc4/mingw-w64-crt/math/x86/rint.c
https://github.com/mirror/mingw-w64/blob/fc2b4752ac61670d6d4940959a78da5ad8a9ebc4/mingw-w64-crt/math/x86/nearbyint.S

and I've confirmed that this is the code which actually gets used when you
call these functions, i.e. rint() is just a frndint in disguise, while
nearbyint() contains a few more instructions.

GC> Of course, some other standard library may do differently, but this
GC> is good enough for me: rint() is terser, and is likely to be faster.

 So this conclusion is still correct for the standard library we use, but
not necessarily for the other ones.

 I'd also like to note that if you really want the absolute best
performance, you might want to inline the definition of rint() yourself, as
the extra function call probably accounts for a non-negligible part of this
function's execution time.
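
 As a rough sketch of what such inlining could look like with gcc-style
inline assembly on x86: the function name rint_inline() is invented for this
example, the code is not copied from the MinGW-w64 sources, and it falls
back to the library rint() on other compilers and architectures.

```c
#include <math.h>

/* Hypothetical inlined replacement for rint(): a single x87 frndint,
 * which rounds according to the current x87 rounding mode. */
static inline double rint_inline(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    double r;
    /* "=t" is the top of the x87 register stack; "0" reuses it as input. */
    __asm__ __volatile__ ("frndint" : "=t" (r) : "0" (x));
    return r;
#else
    return rint(x);  /* portable fallback */
#endif
}
```

 With the default round-to-nearest mode, rint_inline(2.5) yields 2.0 and an
infinite argument is returned unchanged. Whether this actually beats the
library call would need to be measured.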

 And, finally, as always, I suspect that using the SSE4.1 ROUNDSS instruction
(https://www.felixcloutier.com/x86/roundss), a.k.a. the _mm_round_ss()
intrinsic, which should be usable with gcc AFAIK, could result in even
better performance, but I didn't run any benchmarks.

 Regards,
VZ
