wrt this POSIX spec - http://pubs.opengroup.org/onlinepubs/9699919799/
On 7/24/2018 7:36 AM, address@hidden wrote:
Ed Morton <address@hidden> wrote:
I notice that these:
$ echo 'foo bar' | awk --lint '{NF--}1' # UB: decrementing NF
foo
Not undefined. Cite chapter and verse.
It's not defined so there's nothing to cite. What happens when you increment NF
is defined:
References to nonexistent fields (that is, fields after $*NF*), shall
evaluate to the uninitialized value. Such references shall not create new
fields. However, assigning to a nonexistent field (for example,
$(*NF*+2)=5) shall increase the value of *NF*; create any intervening
fields with the uninitialized value; and cause the value of $0 to be
recomputed, with the fields being separated by the value of *OFS*.
and I'd expect that, if the effect of decrementing NF were defined, it would be
in that section, but it's not there, nor is it anywhere else in the spec. Further
evidence that it's undefined, and that implementations therefore differ, is that
if you run the above script on BSD/OSX awk (which I'm using just because it's
the only non-gawk awk I have handy) the output is "foo bar", i.e. decrementing
NF does nothing.
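One way to sidestep the question is to not assign to NF at all. A minimal
sketch, assuming the goal is just to print each record without its last field;
the names rec, sep and out are illustrative, not from the thread. Note that it
rejoins the fields with OFS, so the original spacing between fields is not
preserved:

```shell
# Rebuild the record from fields 1..NF-1 instead of decrementing NF.
out=$(echo 'foo bar' | awk '{
    rec = ""; sep = ""
    for (i = 1; i < NF; i++) {
        rec = rec sep $i   # append the field, preceded by the separator
        sep = OFS          # from the second field on, join with OFS
    }
    print rec
}')
echo "$out"
```

This relies only on reading NF and existing fields, both of which the spec
defines.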
$ echo 'foo bar' | awk --lint 'END{print $0}' # UB: using $0 in END
foo bar
Possibly unspecified by POSIX, but I don't remember. Again, cite
chapter and verse please.
Again, it's undefined in the standard so there's nothing to cite in the standard
but from the gawk manual:
The POSIX standard specifies that |NF| is available in an |END| rule. It
contains the number of fields from the last input record. Most probably due
to an oversight, the standard does not say that |$0| is also preserved
With some awks the above code prints a null string (I've seen it but I don't
remember which awk).
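A sketch of the usual workaround, assuming all that's wanted in END is the last
input record: copy $0 into an ordinary variable in the main rule, since
variables keep their values in END. The names last and out are illustrative:

```shell
# Save each record as it is read; the variable survives into END
# whether or not the awk in use preserves $0 there.
out=$(echo 'foo bar' | awk '{ last = $0 } END { print last }')
echo "$out"
```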
$ echo 'foo bar' | awk --lint '/\o/' # UB: backslash before literal char
foo bar
$ echo 'foo bar' | awk --lint '{print > "file" 1}' # UB: unparenthesized right side of I/O redirection
Also on these two.
For the escaped literal, from the gawk manual:
If you place a backslash in a string constant before something that is not
one of the characters previously listed, POSIX |awk| purposely leaves what
happens as undefined.
For the parens on redirection, I couldn't find that specifically discussed in
the POSIX spec or in the awk manual, but I did find this about it in the getline
section of POSIX:
Since in most cases such constructs are not (or at least should not) be
used (because they have a natural ambiguity for which there is no
conventional parsing), the meaning of these constructs has been made
explicitly unspecified. (The effect is that a conforming application that
runs into the problem must parenthesize to resolve the ambiguity.)
and if you try running the above print command in BSD/OSX awk then you'll get a
syntax error.
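A sketch of the parenthesized form, which leaves nothing for the parser to
guess. The temporary directory, the dir variable and the use of mktemp -d
(widely available but not itself in POSIX) are assumptions made only to keep
the example self-contained:

```shell
# Parenthesize the concatenation so the redirection target is unambiguous.
tmpdir=$(mktemp -d)
echo 'foo bar' | awk -v dir="$tmpdir" '{ print > (dir "/file" 1) }'
out=$(cat "$tmpdir/file1")   # the record lands in .../file1
echo "$out"
rm -rf "$tmpdir"
```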
Interestingly enough, gawk warns about unknown escapes in strings:
$ ./gawk --lint '"\q"' /dev/null
gawk: cmd. line:1: warning: escape sequence `\q' treated as plain `q'
I will look into something similar for regexps.
Thanks.
Any chance of getting a warning when run with --lint (or some other
--report-ub flag?) if anything in a given gawk script is undefined by
POSIX, so a user has something they can run to tell them whether their
script is portable to all POSIX awks or not?
Running with --posix will usually fatal on things that are gawk
extensions. It's pretty strict.
It flags things that are defined by POSIX but not things that are undefined by
POSIX. So, for example, it'll report if you're calling gensub() since POSIX
defines what to do when an undefined function is called but it won't report if
you're using a null string as the 3rd arg to split(). So it'll tell you if your
script WILL break on a POSIX-only awk, but not if it just might break on some
POSIX awks.
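For the split() case, a sketch of a form that doesn't depend on what a null
separator does, assuming the intent is to split a string into individual
characters. The array name chars and the variables n and out are illustrative:

```shell
# Split a record into characters with substr() rather than split(s, a, "").
out=$(echo 'abc' | awk '{
    n = length($0)
    for (i = 1; i <= n; i++)
        chars[i] = substr($0, i, 1)   # one character per array element
    print n, chars[1], chars[3]
}')
echo "$out"
```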
With respect to undefined / unspecified behaviors, if you would be kind
enough to collect a full list of them, with links / quotes from the POSIX
standard, I will look at adding warnings into gawk.
It's difficult to collect quotes from the POSIX spec for things that aren't
defined by the POSIX spec (sometimes they explicitly state them but other times
they simply don't - the quintessential "undefined"!) but I'll see what I can do.
Let me know if you need any more info on the items above that I'm claiming are
undefined.
Thanks for looking into this.
Ed.
Thanks,
Arnold