help-gawk

Re: ensure numeric comparison


From: Neil R. Ormos
Subject: Re: ensure numeric comparison
Date: Sun, 8 May 2022 20:56:51 -0500 (CDT)

Peng Yu wrote:
> Neil R. Ormos wrote:
>> Peng Yu wrote:
>>> david kerns wrote:

>>>> [...] see 6.3.2.1 String Type versus Numeric Type in
>>>> https://www.gnu.org/software/gawk/manual/gawk.html

>>> But my question involves empty string which is
>>> "unassigned". It is not explained in the 3x3
>>> table in that section.

>> The table in the manual does appear to answer
>> your question because the cited section of the
>> manual also states, "Uninitialized variables
>> also have the strnum attribute."

>> See also Sec. 4.2 Examining Fields:

>> | If you try to reference a field beyond the last
>> | one (such as $8 when the record has only seven
>> | fields), you get the empty string. (If used in a
>> | numeric operation, you get zero.)

>> I don't know if there are any test cases that
>> satisfy the "[...] can not compare $1 and $2
>> numerically [...]" constraint.

>> However, if you do not trust that missing
>> elements of $N are "uninitialized variables"
>> that have the strnum attribute, you can resolve
>> the question at a practical level by following
>> David Kerns' suggestion of explicitly coercing
>> the comparands to numeric type.

>> As a simplification to David Kerns' suggestion,
>> I would use the unary "+" operator:

>>   print ( +$1 < +$2 )

> But this does not answer my question. If I have
> to use "+" or "+0" to ensure numeric comparison,
> what would cause the vanilla comparison "<" to
> fail under the condition "the input of the 1st
> and 2nd fields are legitimate numbers or empty
> strings (which are considered as 0)."? I can not
> see a test case for this to fail, so I don't see
> "+" or "+0" as necessary under this condition.

You have, in effect, asked two questions, the second being a corollary of the first.

The first question was (slightly paraphrased),

> | [How can] I [...] [be] sure that the following
> | code
> | 
> | awk -e '{ print ($1 < $2) }' < input
> | 
> | is guaranteed to compare $1 and $2 numerically
> | as long as the input of the 1st and 2nd fields
> | are legitimate numbers or empty strings (which
> | are considered as 0)?

The second question was (again paraphrased),

> | [Are there] corner cases [or] test cases [that
> | would demonstrate the failure of] the vanilla
> | comparison "<" [...]  under the condition 'the
> | input of the 1st and 2nd fields are legitimate
> | numbers or empty strings (which are considered
> | as 0).'

The cited sections of the manual, including (a) the manual's description of how 
Gawk treats comparisons of uninitialized variables, and (b) the manual's 
explanation of how Gawk treats fields beyond $NF, which is consistent with its 
treatment of uninitialized variables, appear to answer both questions.
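
As a rough illustration of the behavior those sections describe (a sketch only; the sample inputs "10 9" and "a b" are made up for this message, and the commands assume a POSIX-style awk is on the PATH):

```shell
# Numeric-looking fields carry the strnum attribute, so "<" compares them
# as numbers: 10 < 9 is false and awk prints 0.
fields=$(printf '10 9\n' | awk '{ print ($1 < $2) }')

# String constants are always strings, so the same digits compare as text:
# "10" sorts before "9" and awk prints 1.
constants=$(awk 'BEGIN { print ("10" < "9") }')

# A field beyond $NF is the empty string, and zero in a numeric operation.
beyond=$(printf 'a b\n' | awk '{ print "[" $8 "]", $8 + 0 }')

echo "fields=$fields constants=$constants beyond=$beyond"
```

This only shows the documented cases; it is not a proof that every input behaves the same way.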

(I won't assert the nonexistence of special cases that might cause the 
comparison to produce a surprise result because I can't prove it.)

Further, it was suggested, as a practical matter, that (c) if you want to be 
absolutely sure that a comparison treats its comparands as numeric, you may 
coerce the comparands to a numeric type.  Armoring the user code in this way 
moots both questions, at least at a practical level.
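
One way to look for a test case yourself is to print the vanilla and the coerced comparison side by side and see whether they ever disagree. The following is a minimal sketch, not a definitive test suite: the comma separator and the four sample records are assumptions chosen so that an empty field can appear in either position, and "plain" and "coerced" are just illustrative variable names.

```shell
# Feed a few hypothetical records to awk and report both comparisons.
report=$(printf '%s\n' '10,9' '-1,' ',-1' '2,10' |
  awk -F, '{
    plain   = ($1 < $2)      # awk decides whether this is numeric or string
    coerced = (+$1 < +$2)    # unary plus forces both sides to numbers
    print $0 ": plain=" plain " coerced=" coerced
  }')

echo "$report"
```

The "coerced" column is numeric by construction (an empty field counts as 0). Any record where "plain" differs from "coerced" in your awk would be a case in which the coercion is doing real work.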

A user who finds the manual's explanation to be insufficiently authoritative, 
and who objects to the practical solution of armoring the user code by coercing 
the comparands to numeric type, might consult as the final authority the Gawk 
interpreter[*] source code, which is available for anyone to inspect.


[*] Or to the extent it's more than an interpreter, whatever you want to call 
the whole shebang.


