Re: [bug-gawk] can we get a warning for undefined behavior?


From: Ed Morton
Subject: Re: [bug-gawk] can we get a warning for undefined behavior?
Date: Tue, 24 Jul 2018 08:12:20 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

wrt this POSIX spec - http://pubs.opengroup.org/onlinepubs/9699919799/

On 7/24/2018 7:36 AM, address@hidden wrote:
Ed Morton <address@hidden> wrote:

I notice that these:

$ echo 'foo bar' | awk --lint '{NF--}1'        # UB: decrementing NF
foo
Not undefined. Cite chapter and verse.

It's not defined so there's nothing to cite. What happens when you increment NF is defined:
References to nonexistent fields (that is, fields after $NF), shall evaluate to the uninitialized value. Such references shall not create new fields. However, assigning to a nonexistent field (for example, $(NF+2)=5) shall increase the value of NF; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, with the fields being separated by the value of OFS.
and I'd expect that, if the effect of decrementing NF were defined, it would be in that section, but it's not there, nor is it anywhere else in the spec. Further evidence that it's undefined, and that implementations therefore behave differently with that code: if you run the above script on BSD/OSX awk (which I'm using only because it's the only non-gawk awk I have handy), the output is "foo bar", i.e. decrementing NF does nothing.
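
As a minimal sketch of how a script could avoid relying on NF-- at all, the last field can be dropped by rebuilding the record from the fields that are kept. The function name drop_last_field is hypothetical, and the sketch assumes it is acceptable for the output fields to be rejoined with OFS (so runs of whitespace in the input are not preserved):

```shell
# Hypothetical helper: print each input line without its last field,
# without ever assigning to NF.
drop_last_field() {
    awk '{
        out = ""
        for (i = 1; i < NF; i++) {
            sep = (i > 1 ? OFS : "")   # no separator before the first field
            out = out sep $i
        }
        print out
    }'
}

echo 'foo bar' | drop_last_field
```

This only uses field references and string concatenation, both of which the spec defines, so it should not depend on how a given awk treats a decremented NF.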


$ echo 'foo bar' | awk --lint 'END{print $0}'    # UB: using $0 in END
foo bar
Possibly unspecified by POSIX, but I don't remember. Again, cite
chapter and verse please.

Again, it's undefined in the standard, so there's nothing to cite there, but from the gawk manual:
The POSIX standard specifies that NF is available in an END rule. It contains the number of fields from the last input record. Most probably due to an oversight, the standard does not say that $0 is also preserved
With some awks the above code prints a null string (I've seen it but I don't remember which awk).
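
A sketch of the usual workaround, assuming the goal is simply to see the last input record inside END: copy $0 into an ordinary variable in the main rule, since variables are retained in END regardless of what happens to $0. The names last_record and last are hypothetical:

```shell
# Hypothetical helper: print the last input record from the END rule
# without referencing $0 there.
last_record() {
    awk '{ last = $0 } END { print last }'
}

printf 'first line\nfoo bar\n' | last_record
```

With no input at all, last is never assigned, so this prints an empty line.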

$ echo 'foo bar' | awk --lint '/\o/'    # UB: backslash before literal char
foo bar

$ echo 'foo bar' | awk --lint '{print > "file" 1}'    # UB: unparenthesized right side of I/O redirection
Also on these two.

For the escaped literal, from the gawk manual:
If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX awk purposely leaves what happens as undefined.

For the parens on redirection I couldn't find that specifically discussed in the POSIX spec or in the awk manual but I did find this in the getline section of POSIX about it:
Since in most cases such constructs are not (or at least should not) be used (because they have a natural ambiguity for which there is no conventional parsing), the meaning of these constructs has been made explicitly unspecified. (The effect is that a conforming application that runs into the problem must parenthesize to resolve the ambiguity.)
and if you try running the above print command in BSD/OSX awk then you'll get a syntax error.
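
A sketch of the parenthesized form, which is what the quoted text says a conforming application has to use. The function name write_to_file1 and the prefix variable are hypothetical, and the temporary directory is only there so the example does not write into the current directory:

```shell
# Hypothetical helper: write each input line to "<prefix>1".
# The parentheses make the redirection target unambiguous.
write_to_file1() {
    awk -v prefix="$1" '{ print > (prefix "1") }'
}

tmpdir=$(mktemp -d)
echo 'foo bar' | write_to_file1 "$tmpdir/file"
cat "$tmpdir/file1"
rm -rf "$tmpdir"
```

The whole concatenation is inside the parentheses, so no awk has to guess whether the trailing 1 belongs to the file name or to the print arguments.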


Interestingly enough, gawk warns about unknown escapes in strings:

$ ./gawk --lint '"\q"' /dev/null
gawk: cmd. line:1: warning: escape sequence `\q' treated as plain `q'

I will look into something similar for regexps.
Thanks.

Any chance of getting a warning when run with --lint (or some other
flag, e.g. --report-ub) if anything in a given gawk script is undefined
by POSIX, so that a user has something they can run to tell them whether
their script is portable to all POSIX awks?
Running with --posix will usually fatal on things that are gawk
extensions.  It's pretty strict.
It flags things that are defined by POSIX but not things that are undefined by POSIX. For example, it'll report a call to gensub(), since POSIX defines what to do when an undefined function is called, but it won't report a null string used as the 3rd arg to split(). So it'll tell you if your script WILL break on a POSIX-only awk, but not if it just MIGHT break on some POSIX awks.
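
For the split() case, a sketch of splitting a record into characters without passing a null string as the separator, assuming that is what the null-string split was being used for. The names split_chars and chars are hypothetical:

```shell
# Hypothetical helper: print each character of every input line on its
# own line, using substr() instead of split(s, a, "").
split_chars() {
    awk '{
        n = length($0)
        for (i = 1; i <= n; i++)
            chars[i] = substr($0, i, 1)   # one character per array slot
        for (i = 1; i <= n; i++)
            print chars[i]
    }'
}

echo 'abc' | split_chars
```

length() and substr() are both specified, so this form does not depend on how an awk handles a null separator.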


With respect to undefined / unspecified behaviors, if you would be kind
enough to collect a full list of them, with links / quotes from the POSIX
standard, I will look at adding warnings into gawk.
It's difficult to collect quotes from the POSIX spec for things that aren't defined by the POSIX spec (sometimes they explicitly state them but other times they simply don't - the quintessential "undefined"!) but I'll see what I can do. Let me know if you need any more info on the items above that I'm claiming are undefined.

Thanks for looking into this.

    Ed.

Thanks,

Arnold


