From: Ed Morton
Subject: Re: [bug-gawk] can we get a warning for undefined behavior?
Date: Tue, 24 Jul 2018 08:12:20 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

wrt this POSIX spec - http://pubs.opengroup.org/onlinepubs/9699919799/

On 7/24/2018 7:36 AM, address@hidden wrote:
> Ed Morton <address@hidden> wrote:
>
>> I notice that these:
>>
>>     $ echo 'foo bar' | awk --lint '{NF--}1'   # UB: decrementing NF
>>     foo
>
> Not undefined. Cite chapter and verse.

It's not defined so there's nothing to cite. What happens when you increment NF is defined:

    References to nonexistent fields (that is, fields after $NF), shall evaluate to the uninitialized value. Such references shall not create new fields. However, assigning to a nonexistent field (for example, $(NF+2)=5) shall increase the value of NF; create any intervening fields with the uninitialized value; and cause the value of $0 to be recomputed, with the fields being separated by the value of OFS.

and I'd expect if what happens when you decrement NF were defined it'd be in that section but it's not there nor is it anywhere else in the spec. Some other evidence that it's undefined and so different implementations behave differently with that code is that if you run the above script on BSD/OSX awk (which I'm using just because it's the only non-gawk awk I have handy) the output is "foo bar", i.e. decrementing NF does nothing.

>>     $ echo 'foo bar' | awk --lint 'END{print $0}'   # UB: using $0 in END
>>     foo bar
>
> Possibly unspecified by POSIX, but I don't remember. Again, cite chapter and verse please.

Again, it's undefined in the standard so there's nothing to cite in the standard but from the gawk manual:

    The POSIX standard specifies that

With some awks the above code prints a null string (I've seen it but I don't remember which awk).

>>     $ echo 'foo bar' | awk --lint '/\o/'   # UB: backslash before literal char
>>     foo bar
>>     $ echo 'foo bar' | awk --lint '{print > "file" 1}'   # UB: unparenthesized right side of I/O redirection
>
> Also on these two.

For the escaped literal, from the gawk manual:

    If you place a backslash in a string constant before something that is not one of the characters previously listed, POSIX

For the parens on redirection I couldn't find that specifically discussed in the POSIX spec or in the awk manual but I did find this in the getline section of POSIX about it:

    Since in most cases such constructs are not (or at least should not) be used (because they have a natural ambiguity for which there is no conventional parsing), the meaning of these constructs has been made explicitly unspecified. (The effect is that a conforming application that runs into the problem must parenthesize to resolve the ambiguity.)

and if you try running the above print command in BSD/OSX awk then you'll get a syntax error.

> Interestingly enough, gawk warns about unknown escapes in strings:
>
>     $ ./gawk --lint '"\q"' /dev/null
>     gawk: cmd. line:1: warning: escape sequence `\q' treated as plain `q'
>
> I will look into something similar for regexps.

Thanks.

>> Any chance of getting a warning when run with --lint (or some other --report-ub flag?) if anything in a given gawk script was undefined by POSIX so a user has something they can run to tell them if their script is portable to all POSIX awks or not.
>
> Running with --posix will usually fatal on things that are gawk extensions. It's pretty strict.

It flags things that are defined by POSIX but not things that are undefined by POSIX. So, for example, it'll report if you're calling gensub() since POSIX defines what to do when an undefined function is called but it won't report if you're using a null string as the 3rd arg to split(). So it'll tell you if your script WILL break on a POSIX-only awk, but not if it just might break on some POSIX awks.

>> Let me know if you need any more info on the items above that I'm claiming are undefined.
>
> With respect to undefined / unspecified behaviors, if you would be kind enough to collect a full list of them, with links / quotes from the POSIX standard, I will look at adding warnings into gawk.

It's difficult to collect quotes from the POSIX spec for things that aren't defined by the POSIX spec (sometimes they explicitly state them but other times they simply don't - the quintessential "undefined"!) but I'll see what I can do.

> Thanks,
>
> Arnold

Thanks for looking into this.

    Ed.
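
For reference, a minimal sketch of portable rewrites of the four constructs discussed above. It assumes a POSIX awk is installed as `awk`; the shell function names are hypothetical, chosen only for this illustration, and the field rebuild assumes a single space is an acceptable output separator.

```shell
#!/bin/sh
# Portable alternatives to the four constructs flagged in this thread.
# All function names here are hypothetical, for illustration only.

# Instead of NF--: rebuild the record from fields 1..NF-1,
# joined by a single space (assumption: space is the wanted separator).
drop_last_field() {
    awk '{
        out = ""
        for (i = 1; i < NF; i++)
            out = (i == 1 ? $i : out " " $i)
        print out
    }'
}

# Instead of reading $0 in END: save each record as it is read,
# then print the saved copy.
last_record() {
    awk '{ last = $0 } END { print last }'
}

# Instead of /\o/: no backslash before an ordinary character.
match_o() {
    awk '/o/'
}

# Instead of print > "file" 1: parenthesize the concatenation.
# "$1" is a file name prefix; output goes to that prefix followed by 1.
print_to_file() {
    awk -v prefix="$1" '{ print > (prefix 1) }'
}
```

With these definitions, `echo 'foo bar' | drop_last_field` prints `foo` whether or not the awk in use honors a decrement of NF.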