Re: [bug-gawk] Overflow to Infinity


From: arnold
Subject: Re: [bug-gawk] Overflow to Infinity
Date: Sun, 15 Jul 2018 00:32:03 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi Eli.

Eli Zaretskii <address@hidden> wrote:

> > From: address@hidden
> > Date: Fri, 13 Jul 2018 06:28:17 -0600
> > Cc: address@hidden, address@hidden,
> >         address@hidden
> > 
> > > So the behavior will not really be like C, where NaN == NaN yields
> > > zero, right?  I think this part of the C behavior makes no sense in
> > > Awk.
> > 
> > Exactly the opposite. Right now, NaN values compare equal to each
> > other, both for array sort ordering, and for ==.  That NaN and Infinity
> > values in awk do not act like C doubles for the regular comparison
> > operators (==, !=, <, etc.) is a source of confusion to users, and the
> > questions about it have started becoming more frequent.
>
> I don't understand what do you mean by "like C doubles for the regular
> comparison operators".  I can tell what I was talking about: the fact
> that _any_ comparison with a NaN always yields zero (a.k.a. "false").
> So a < NaN => false, but also a >= NaN => false, and NaN == NaN => false. 

Except for !=, where the result is true when one side is a NaN.

> How can this make sense?

It clearly made sense to the IEEE guys; something that isn't a number
can't be meaningfully compared to one, which is why all the other
operators return false.  (I'm not qualified to judge if having NaN
in the hardware makes intrinsic sense. But I am stuck with users who
expect numeric comparisons to behave in a specific fashion.)
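
For concreteness, here is a small C sketch of what the hardware does with the six operators. It is not gawk code; compare_mask() and check_nan_comparisons() are names made up for this illustration, and it assumes an IEEE 754 platform compiled without options such as -ffast-math.

```c
#include <assert.h>
#include <math.h>

/*
 * Pack the results of the six C comparison operators into a bit mask.
 * Bit order, low to high: ==, !=, <, <=, >, >=.
 */
unsigned compare_mask(double a, double b)
{
	unsigned m = 0;

	if (a == b) m |= 1u << 0;
	if (a != b) m |= 1u << 1;
	if (a <  b) m |= 1u << 2;
	if (a <= b) m |= 1u << 3;
	if (a >  b) m |= 1u << 4;
	if (a >= b) m |= 1u << 5;
	return m;
}

/* With a NaN on either side, only the != bit is set. */
void check_nan_comparisons(void)
{
	double nan_val = NAN;

	assert(compare_mask(nan_val, nan_val) == (1u << 1));
	assert(compare_mask(1.0, nan_val) == (1u << 1));
	assert(compare_mask(nan_val, 1.0) == (1u << 1));
}
```

Calling check_nan_comparisons() from any main() should pass silently on such a platform.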

Right now, gawk's comparison of numeric values involving NaN and
Infinity is intended to make array sorting work correctly. In IEEE
math, NaN != NaN, but if we allowed that when trying to sort by numeric
indices, the sort would never work.  But when that same routine (as
cited by Andy, below) is used for the comparison operators, it yields
results that are very different from what the hardware yields.
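
The sorting problem can be sketched in plain C with qsort(). This is an illustration only, not gawk's sorting code: cmp_total() copies the ordering of the routine quoted below but operates on bare doubles, cmp_naive() stands in for a comparator built directly on == and <, and sort_doubles() is a hypothetical wrapper.

```c
#include <math.h>
#include <stdlib.h>

/* Total order: NaN sorts after every other value, and all NaNs tie. */
int cmp_total(const void *p1, const void *p2)
{
	double a = *(const double *)p1;
	double b = *(const double *)p2;

	if (isnan(a))
		return !isnan(b);
	if (isnan(b))
		return -1;
	if (a == b)
		return 0;
	return (a < b) ? -1 : 1;
}

/*
 * Comparator built directly on == and <.  If either argument is NaN,
 * both tests are false, so it returns 1 in both argument orders.
 * That is not a consistent ordering, so it is unsafe to pass to qsort().
 */
int cmp_naive(const void *p1, const void *p2)
{
	double a = *(const double *)p1;
	double b = *(const double *)p2;

	return (a == b) ? 0 : ((a < b) ? -1 : 1);
}

/* Sort in place using the total order. */
void sort_doubles(double *v, size_t n)
{
	qsort(v, n, sizeof v[0], cmp_total);
}
```

With cmp_naive(), "x is greater than NaN" and "NaN is greater than x" are both reported, which is the inconsistency the sort cannot tolerate.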

> What kind of users will be confused by deviations from this craziness?

Those who expect comparisons on floating point numbers to work as
specified by IEEE 754.  And given that we document that numbers in
awk are the C type double, this isn't (IMHO) such an unreasonable
expectation. (I acknowledge that in your terms, they expect the craziness.
And personally, I wish this would all just go away.  But it's gotten
to the point that I can no longer ignore it.)

> In C, we just get what the underlying machine instructions (codified
> by IEEE) give us, but Awk is a language that isn't required to follow
> that, AFAIU.  E.g., we don't have QNaN and SNaN, and I hope never
> will.  On top of that, Awk sometimes doesn't interpret strings like
> numbers, so we have even more reasons to deviate from IEEE here.

So, this is a delicate balance, and it's why I require 'nan' or 'inf' to have
a leading sign to be interpreted as a number.  It avoids surprises like this
one, which is what you get in --posix mode:

        $ echo Nancy Reagan | gawk --posix '{ print $1 + 0 }'
        nan
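
The leading-sign rule can be sketched as a small C predicate. This is not gawk's implementation; is_signed_magic() is a hypothetical name, and the exact-length, case-insensitive match is an assumption made for the sketch.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true only for a sign followed by exactly "nan" or "inf"
 * (case-insensitive), e.g. "+nan" or "-INF".  Anything without a
 * leading sign, such as "Nancy" or "nan", is rejected.
 */
bool is_signed_magic(const char *s)
{
	char buf[4];
	int i;

	if (s[0] != '+' && s[0] != '-')
		return false;
	for (i = 0; i < 3; i++) {
		if (s[1 + i] == '\0')
			return false;	/* too short */
		buf[i] = (char)tolower((unsigned char)s[1 + i]);
	}
	buf[3] = '\0';
	if (s[4] != '\0')
		return false;		/* trailing text, e.g. "+nancy" */
	return strcmp(buf, "nan") == 0 || strcmp(buf, "inf") == 0;
}
```

Under these assumptions, "Nancy" never reaches the numeric path, while "+nan" and "-inf" do.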

> Date: Fri, 13 Jul 2018 11:25:01 -0400
> From: "Andrew J. Schorr" <address@hidden>
> To: Eli Zaretskii <address@hidden>
> Cc: Arnold Robbins <address@hidden>, address@hidden,
>         address@hidden
> Subject: Re: [bug-gawk] Overflow to Infinity
>
> I must confess that I have never grappled with this issue before. I generally
> try to avoid calculations that generate NaN or Inf. But I really do not 
> see why gawk should treat NaN differently than C does. Why would we want
> to deviate from the standard NaN and Inf handling? I guess part of the problem
> is in this logic in node.c:
>
> int
> cmp_awknums(const NODE *t1, const NODE *t2)
> {
>       /*
>        * This routine is also used to sort numeric array indices or values.
>        * For the purposes of sorting, NaN is considered greater than
>        * any other value, and all NaN values are considered equivalent and
>        * equal.
>        * This isn't in compliance with IEEE standard, but compliance w.r.t.
>        * NaN comparison at the awk level is a different issue, and needs to
>        * be dealt with in the interpreter for each opcode separately.
>        */
>
>       if (isnan(t1->numbr))
>               return ! isnan(t2->numbr);
>       if (isnan(t2->numbr))
>               return -1;
>       /* don't subtract, in case one or both are infinite */
>       if (t1->numbr == t2->numbr)
>               return 0;
>       if (t1->numbr < t2->numbr)
>               return -1;
>       return 1;
> }
>
> The obvious fix seems to be:
>
> int
> cmp_awknums(const NODE *t1, const NODE *t2)
> {
>       return (t1->numbr == t2->numbr) ? 0 :
>              ((t1->numbr < t2->numbr) ? -1 : 1);
> }

As I explained above, we can't do this because of array index sorting.

The 'each opcode' bit refers to interpret.h and eval.c, which handle
the 6 comparison operators.  It is there that I am working on a change.

Believe me, I know how messy this all is. There's no perfect answer
here. I'm trying to strike the right balance.

Thanks,

Arnold


