bug-gnu-utils

Re: gawk 3.1.2g problems with NaNs and substr etc.


From: Paul Eggert
Subject: Re: gawk 3.1.2g problems with NaNs and substr etc.
Date: 07 Jul 2003 11:08:06 -0700
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Stepan Kasal <address@hidden> writes:

> One more comment:
> >     Be careful when comparing double to SIZE_MAX, as
> >     the comparison might return the "wrong" answer when
> >     (double) SIZE_MAX is a number that is not equal to
> >     SIZE_MAX.
> 
> > @@ -1275,7 +1275,7 @@ do_substr(NODE *tree)
> > -           if (d_length <= SIZE_MAX)
> > +           if (d_length < SIZE_MAX)
> >                     length = d_length;
> >             else
> >                     length = SIZE_MAX;
> 
> It took me some time to understand this.

Yes, floating-point arithmetic is tricky that way.

Just for the record, there is a similar problem in
gawk-3.1.2g/builtin.c:1246 that I couldn't easily fix, so I gave up,
on the principle that it will probably never occur in practice and
that my time is limited (sigh).  The problem is here:

        if (d_index <= SIZE_MAX)
                indx = d_index - 1;
        else
                indx = SIZE_MAX;

Suppose d_index is 2**64 and we are on a 64-bit host where SIZE_MAX is
2**64 - 1.  Then "d_index <= SIZE_MAX" will succeed, because ((double)
SIZE_MAX) equals 2**64.  But "indx = d_index - 1" will have undefined
behavior, because d_index - 1 also equals 2**64 due to rounding error.
On 64-bit SPARC the result happens to be correct, but on other hosts
indx will be set to zero in this case, leading to bizarre behavior.
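
The two rounding effects described above can be checked with a small
C sketch.  The function names are made up for illustration and are not
part of gawk; "volatile" is used only to force results to be stored as
real doubles on hosts that compute in wider registers.  It assumes a
binary floating-point double.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if (double) SIZE_MAX is not SIZE_MAX
   but the next power of two, as happens when size_t has more value
   bits than double has mantissa bits (e.g. 64 versus 53).  */
int size_max_rounds_up(void)
{
    /* SIZE_MAX / 2 + 1 is a power of two, so it converts exactly;
       doubling it yields SIZE_MAX + 1 as a double with no rounding.  */
    double two_to_n = (double) (SIZE_MAX / 2 + 1) * 2.0;
    volatile double d = (double) SIZE_MAX;
    return d == two_to_n;
}

/* Hypothetical helper: returns 1 if "d - 1" rounds back to d, which is
   what makes "indx = d_index - 1" overflow for d_index == 2**64.  */
int subtracting_one_is_lost(double d)
{
    volatile double r = d - 1.0;
    return r == d;
}
```

On a host with 64-bit size_t and IEEE double, size_max_rounds_up()
should return 1, and subtracting_one_is_lost(18446744073709551616.0)
should return 1 as well; with 32-bit size_t the first returns 0.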

Here, we can't simply replace "<=" with "<" because that will
mishandle the case where d_index is 2**32 - 1 on a 32-bit host: there
d_index equals SIZE_MAX exactly, and indx should be SIZE_MAX - 1.

Fixing this is low priority, I suppose, since gawk already has
zillions of bugs dealing with strings containing more than 2**53
characters or so, due to rounding errors.  These bugs aren't likely to
occur in practical applications for quite some time.  The simplest
fix, perhaps, is for gawk to switch to "long double" if available and
if sizeof (size_t) == sizeof (double); that isn't guaranteed to work
in general but it works on all platforms that I know of.
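
For comparison, here is a sketch of a different approach from the
"long double" switch suggested above: compare against 2**N (N being
the width of size_t), which is always exactly representable as a
double, and do the "- 1" in integer arithmetic after the conversion.
The name index_from_double is hypothetical, and mapping values below
1 (and NaN) to index 0 is an assumption made for the sketch, not
necessarily what gawk's do_substr does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not gawk's code: convert a 1-based index held
   in a double to a 0-based size_t, clamping instead of overflowing.  */
size_t index_from_double(double d_index)
{
    /* 2**N, computed exactly: SIZE_MAX / 2 + 1 is a power of two.
       Unlike (double) SIZE_MAX, this never rounds.  */
    const double limit = (double) (SIZE_MAX / 2 + 1) * 2.0;

    if (!(d_index >= 1.0))      /* too small, or NaN (assumed -> 0) */
        return 0;
    if (d_index < limit)        /* value fits, so conversion is defined */
        return (size_t) d_index - 1;    /* subtract as an integer */
    return SIZE_MAX;            /* 2**N or larger, including infinity */
}
```

With this shape, d_index == 2**64 on a 64-bit host is clamped to
SIZE_MAX, and d_index == SIZE_MAX on a 32-bit host still yields
SIZE_MAX - 1, so neither "<=" nor "<" against SIZE_MAX is needed.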

> > gawk-3.1.2g: cmd. line:1: warning: substr: non-integer length 0.1 will be truncated
> 
> this is OK.  Have I missed something?

It's OK, but I think it's simpler and more consistent to say that all
values less than 1 are treated as 1, rather than to have a special
case for non-integer values.  It's not a huge deal either way.

> substr() is one situation, but we should in fact audit the whole
> source.

Absolutely; my patch was just a first step.  I audited only the builtins
file.

> In cases like string indexing, I'd like to see a warning whenever NaN is
> encountered, even though the --lint option has not been switched on.

Unix nawk doesn't warn, which is perhaps why gawk doesn't warn unless
you ask.

> As GNU tools advertise "no-limits,"

Well, it's really "no arbitrary limits".  gawk shouldn't have to use
an infinite-precision arithmetic package.  Here the limits aren't that
arbitrary.

A more general question is whether gawk should faithfully reflect the
quirks of the underlying floating point, or whether it should impose
its own opinion.  Personally I favor the former, as IEEE floating
point is nearly universal and is well understood.  But gawk does the
latter, most likely for historical reasons.

For example, gawk refuses to let me compute 1.0 / 0.0, even though the
computation is perfectly well-defined with IEEE floating point.  In
contrast, mawk does the right thing (in my opinion): it quietly
returns infinity.
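
As a small illustration of the IEEE behavior being referred to, the
following C sketch divides without special-casing a zero divisor.
The function name is made up, and the results described hold only on
hosts whose double follows IEEE 754 (C leaves floating-point division
by zero undefined otherwise).

```c
#include <math.h>

/* Hypothetical helper: plain division with no check for y == 0.
   The arguments are passed at run time so that the compiler does not
   fold (or warn about) a constant 1.0 / 0.0.  */
double ieee_divide(double x, double y)
{
    return x / y;
}
```

On an IEEE host, ieee_divide(1.0, 0.0) yields positive infinity and
ieee_divide(0.0, 0.0) yields a NaN, which can be tested with the
isinf and isnan macros from <math.h>.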

I could submit patches along these lines if the Gawk maintainer is
interested.  (Obviously gawk should still be portable to non-IEEE
hosts; I wouldn't change that.)



