bug-gawk

Re: [bug-gawk] can we get a warning for undefined behavior?


From: Ed Morton
Subject: Re: [bug-gawk] can we get a warning for undefined behavior?
Date: Tue, 24 Jul 2018 09:30:09 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

Arnold - attached are the ones I either already knew about or found in the POSIX spec by looking for "undefined". There are more but either gawk produces an error so the script can't be assumed to be portable, or they're far less common cases so I personally don't care if they get reported or not (and since I'm compiling the list ... :-).

Regards,

    Ed.

On 7/24/2018 8:47 AM, address@hidden wrote:
Hi Ed.

In "standards-ese" both "undefined" and "unspecified" are specific
technical terms.  I thought you were saying that the particular things
were *explicitly* undefined, and that is why I wanted chapter and verse.

Indeed, there are some things that have been simply left out of the
standard. They are thus "undefined", but de-facto and not de-jure. :-)

A full list of said things is by definition harder to create.  However,
using the index in the gawk manual for "dark corners" would probably be
a good way to start.

W.R.T. NF--, it is my humble opinion that the behavior should be
well defined; however Brian Kernighan's awk totally fumbles it. Just
about every other awk handles it properly.

If you can provide a list of issues to warn about, it will help; I don't
have the time to do that.  I will then figure out reasonable language
for --lint / --posix when such things are encountered.

As I mentioned earlier, --posix will catch things that are definitely
gawk extensions.

Thanks,

Arnold

Ed Morton <address@hidden> wrote:

wrt this POSIX spec - http://pubs.opengroup.org/onlinepubs/9699919799/

On 7/24/2018 7:36 AM, address@hidden wrote:
Ed Morton <address@hidden> wrote:

I notice that these:

$ echo 'foo bar' | awk --lint '{NF--}1'        # UB: decrementing NF
foo
Not undefined. Cite chapter and verse.
It's not defined so there's nothing to cite. What happens when you increment NF
is defined:

     References to nonexistent fields (that is, fields after $*NF*), shall
     evaluate to the uninitialized value. Such references shall not create new
     fields. However, assigning to a nonexistent field (for example, $(*NF*+2)=5)
     shall increase the value of *NF*; create any intervening fields with the
     uninitialized value; and cause the value of $0 to be recomputed, with the
     fields being separated by the value of *OFS*.

and I'd expect if what happens when you decrement NF were defined it'd be in
that section but it's not there nor is it anywhere else in the spec. Some other
evidence that it's undefined and so different implementations behave differently
with that code is that if you run the above script on BSD/OSX awk (which I'm
using just because it's the only non-gawk awk I have handy) the output is "foo
bar", i.e. decrementing NF does nothing.
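
For anyone who needs to drop the last field without relying on NF--, here is a minimal sketch that sticks to behavior POSIX does define. It assumes the default field separator (blanks and tabs) and edits $0 directly with sub(); the name drop_last_field is just a hypothetical wrapper for illustration. Unlike NF-- in gawk, it leaves the remaining separators untouched rather than rebuilding $0 with OFS.

```shell
# Hypothetical helper: remove the last blank-separated field from each line.
# Assumes the default FS; does not rebuild $0 with OFS.
drop_last_field() {
    # Strip "blanks + last field + optional trailing blanks" at end of record.
    awk '{ sub(/[ \t]+[^ \t]+[ \t]*$/, "") } 1'
}

echo 'foo bar' | drop_last_field    # prints: foo
```

A record with only one field and no leading blanks is left unchanged, since the regexp requires at least one separator before the last field.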

$ echo 'foo bar' | awk --lint 'END{print $0}'    # UB: using $0 in END
foo bar
Possibly unspecified by POSIX, but I don't remember. Again, cite
chapter and verse please.
Again, it's undefined in the standard so there's nothing to cite in the standard
but from the gawk manual:

     The POSIX standard specifies that |NF| is available in an |END| rule. It
     contains the number of fields from the last input record. Most probably due
     to an oversight, the standard does not say that |$0| is also preserved

With some awks the above code prints a null string (I've seen it but I don't
remember which awk).
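
A portable way around this, sketched under the assumption that all you need in END is the text of the last record, is to save $0 in a variable in the main rule. The names last and print_last_record are hypothetical, chosen only for this example.

```shell
# Hypothetical helper: print the last input record without touching $0 in END.
print_last_record() {
    # Save every record as it is read; only the final one survives to END.
    awk '{ last = $0 } END { print last }'
}

echo 'foo bar' | print_last_record    # prints: foo bar
```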
$ echo 'foo bar' | awk --lint '/\o/'    # UB: backslash before literal char
foo bar

$ echo 'foo bar' | awk --lint '{print > "file" 1}'    # UB: unparenthesized right side of I/O redirection
Also on these two.
For the escaped literal, from the gawk manual:

     If you place a backslash in a string constant before something that is not
     one of the characters previously listed, POSIX |awk| purposely leaves what
     happens as undefined.
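
As a small sketch of how to stay clear of this: an ordinary character such as "o" needs no backslash at all, and a regexp metacharacter can be made literal with a bracket expression, which avoids backslashes altogether. The function name match_literal_dot is hypothetical.

```shell
# Hypothetical helper: select lines containing a literal "a.b".
match_literal_dot() {
    # [.] matches only a period, so no backslash escape is needed.
    awk '/a[.]b/'
}

printf 'a.b\naxb\n' | match_literal_dot    # prints: a.b
```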


For the parens on redirection I couldn't find that specifically discussed in the
POSIX spec or in the gawk manual but I did find this in the getline section of
POSIX about it:

     Since in most cases such constructs are not (or at least should not) be used
     (because they have a natural ambiguity for which there is no conventional
     parsing), the meaning of these constructs has been made explicitly
     unspecified. (The effect is that a conforming application that runs into the
     problem must parenthesize to resolve the ambiguity.)

and if you try running the above print command in BSD/OSX awk then you'll get a
syntax error.
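
The parenthesized form that the quote calls for looks like the sketch below. The helper name write_numbered and the prefix variable are hypothetical; the only point is that the concatenation on the right side of ">" is wrapped in parentheses.

```shell
# Hypothetical helper: write each input line to the file named <prefix>1.
write_numbered() {
    # Parentheses make the redirection target unambiguous: (prefix 1).
    awk -v prefix="$1" '{ print > (prefix 1) }'
}

dir=$(mktemp -d)
echo 'foo bar' | write_numbered "$dir/file"
cat "$dir/file1"    # prints: foo bar
rm -rf "$dir"
```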

Interestingly enough, gawk warns about unknown escapes in strings:

$ ./gawk --lint '"\q"' /dev/null
gawk: cmd. line:1: warning: escape sequence `\q' treated as plain `q'

I will look into something similar for regexps.
Thanks.
Any chance of getting a warning when run with --lint (or some other flag,
say --report-ub) if anything in a given gawk script is undefined by
POSIX, so that users have something they can run to tell them whether
their script is portable to all POSIX awks?
Running with --posix will usually fatal on things that are gawk
extensions.  It's pretty strict.
It flags things that are defined by POSIX but not things that are undefined by
POSIX. So, for example, it'll report if you're calling gensub() since POSIX
defines what to do when an undefined function is called but it won't report if
you're using a null string as the 3rd arg to split(). So it'll tell you if your
script WILL break on a POSIX-only awk, but not if it just might break on some
POSIX awks.
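
To illustrate the split() case: if the goal of a null third argument is to get one array element per character, a sketch that relies only on length() and substr() avoids the question entirely. The helper name split_chars is hypothetical, and printing the characters separated by single spaces is just a choice made for this example.

```shell
# Hypothetical helper: print the characters of each line separated by spaces,
# without calling split() with a null separator.
split_chars() {
    awk '{
        n = length($0)
        # Walk the record one character at a time.
        for (i = 1; i <= n; i++)
            printf "%s%s", substr($0, i, 1), (i < n ? " " : "\n")
    }'
}

echo 'abc' | split_chars    # prints: a b c
```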

With respect to undefined / unspecified behaviors, if you would be kind
enough to collect a full list of them, with links / quotes from the POSIX
standard, I will look at adding warnings into gawk.
It's difficult to collect quotes from the POSIX spec for things that aren't
defined by the POSIX spec (sometimes they explicitly state them but other times
they simply don't - the quintessential "undefined"!) but I'll see what I can do.
Let me know if you need any more info on the items above that I'm claiming are
undefined.

Thanks for looking into this.

      Ed.
Thanks,

Arnold


Attachment: undefinedBehavior.xlsx
Description: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

