bug-gawk

Re: [bug-gawk] can we get a warning for undefined behavior?


From: arnold
Subject: Re: [bug-gawk] can we get a warning for undefined behavior?
Date: Tue, 24 Jul 2018 07:47:56 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi Ed.

In "standards-ese" both "undefined" and "unspecified" are specific 
technical terms.  I thought you were saying that the particular things
were *explicitly* undefined, and that is why I wanted chapter and verse.

Indeed, there are some things that have been simply left out of the
standard. They are thus "undefined", but de-facto and not de-jure. :-)

A full list of said things is by definition harder to create.  However,
using the index in the gawk manual for "dark corners" would probably be
a good way to start.

W.R.T. NF--, it is my humble opinion that the behavior should be
well defined; however Brian Kernighan's awk totally fumbles it. Just
about every other awk handles it properly.
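
For scripts that must also run on an awk that mishandles NF--, one option
is to rebuild the record instead of assigning to NF. The following is only
a minimal sketch; the helper name drop_last_field is made up for
illustration, and the fields are rejoined with OFS, so the original
spacing of the record is not preserved.

```shell
# Drop the last field of each record without assigning to NF.
# drop_last_field is a hypothetical helper name used only in this sketch.
drop_last_field() {
    awk '{
        out = ""
        # Join fields 1..NF-1 with OFS; the last field is simply skipped.
        for (i = 1; i < NF; i++)
            out = out (i > 1 ? OFS : "") $i
        print out
    }'
}

printf 'foo bar\n' | drop_last_field
```

With the sample input from this thread, the call prints "foo".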

If you can provide a list of issues to warn about, it will help; I don't
have the time to do that.  I will then figure out reasonable language
for --lint / --posix when such things are encountered.

As I mentioned earlier, --posix will catch things that are definitely
gawk extensions.

Thanks,

Arnold

Ed Morton <address@hidden> wrote:

> wrt this POSIX spec - http://pubs.opengroup.org/onlinepubs/9699919799/
>
> On 7/24/2018 7:36 AM, address@hidden wrote:
> > Ed Morton <address@hidden> wrote:
> >
> >> I notice that these:
> >>
> >> $ echo 'foo bar' | awk --lint '{NF--}1'        # UB: decrementing NF
> >> foo
> > Not undefined. Cite chapter and verse.
>
> It's not defined so there's nothing to cite. What happens when you increment
> NF is defined:
>
>     References to nonexistent fields (that is, fields after $NF), shall
>     evaluate to the uninitialized value. Such references shall not create new
>     fields. However, assigning to a nonexistent field (for example,
>     $(NF+2)=5) shall increase the value of NF; create any intervening
>     fields with the uninitialized value; and cause the value of $0 to be
>     recomputed, with the fields being separated by the value of OFS.
>
> and I'd expect if what happens when you decrement NF were defined it'd be in
> that section but it's not there nor is it anywhere else in the spec. Some
> other evidence that it's undefined and so different implementations behave
> differently with that code is that if you run the above script on BSD/OSX awk
> (which I'm using just because it's the only non-gawk awk I have handy) the
> output is "foo bar", i.e. decrementing NF does nothing.
>
> >
> >> $ echo 'foo bar' | awk --lint 'END{print $0}'    # UB: using $0 in END
> >> foo bar
> > Possibly unspecified by POSIX, but I don't remember. Again, cite
> > chapter and verse please.
>
> Again, it's undefined in the standard so there's nothing to cite in the
> standard but from the gawk manual:
>
>     The POSIX standard specifies that NF is available in an END rule. It
>     contains the number of fields from the last input record. Most probably
>     due to an oversight, the standard does not say that $0 is also preserved
>
> With some awks the above code prints a null string (I've seen it but I don't
> remember which awk).
> >
> >> $ echo 'foo bar' | awk --lint '/\o/'    # UB: backslash before literal char
> >> foo bar
> >>
> >> $ echo 'foo bar' | awk --lint '{print > "file" 1}'    # UB: unparenthesized right side of I/O redirection
> > Also on these two.
>
> For the escaped literal, from the gawk manual:
>
>     If you place a backslash in a string constant before something that is not
>     one of the characters previously listed, POSIX awk purposely leaves what
>     happens as undefined.
>
>
> For the parens on redirection I couldn't find that specifically discussed in
> the POSIX spec or in the awk manual but I did find this in the getline section
> of POSIX about it:
>
>     Since in most cases such constructs are not (or at least should not) be
>     used (because they have a natural ambiguity for which there is no
>     conventional parsing), the meaning of these constructs has been made
>     explicitly unspecified. (The effect is that a conforming application that
>     runs into the problem must parenthesize to resolve the ambiguity.)
>
> and if you try running the above print command in BSD/OSX awk then you'll get
> a syntax error.
>
> >
> > Interestingly enough, gawk warns about unknown escapes in strings:
> >
> > $ ./gawk --lint '"\q"' /dev/null
> > gawk: cmd. line:1: warning: escape sequence `\q' treated as plain `q'
> >
> > I will look into something similar for regexps.
> Thanks.
> >
> >> Any chance of getting a warning when run with --lint (or some other
> >> --report-ub flag?) if anything in a given gawk script was undefined by
> >> POSIX so a user has something they can run to tell them if their script
> >> is portable to all POSIX awks or not.
> > Running with --posix will usually fatal on things that are gawk
> > extensions.  It's pretty strict.
> It flags things that are defined by POSIX but not things that are undefined
> by POSIX. So, for example, it'll report if you're calling gensub() since POSIX
> defines what to do when an undefined function is called but it won't report
> if you're using a null string as the 3rd arg to split(). So it'll tell you if
> your script WILL break on a POSIX-only awk, but not if it just might break on
> some POSIX awks.
>
> >
> > With respect to undefined / unspecified behaviors, if you would be kind
> > enough to collect a full list of them, with links / quotes from the POSIX
> > standard, I will look at adding warnings into gawk.
> It's difficult to collect quotes from the POSIX spec for things that aren't
> defined by the POSIX spec (sometimes they explicitly state them but other
> times they simply don't - the quintessential "undefined"!) but I'll see what I
> can do. Let me know if you need any more info on the items above that I'm
> claiming are undefined.
>
> Thanks for looking into this.
>
>      Ed.
> >
> > Thanks,
> >
> > Arnold
> >
>
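
The other three constructs questioned in the quoted message each have a
rewrite that avoids the unspecified behavior. The sketch below is only
illustrative: the helper names and the "prefix" variable are made up for
this example and appear nowhere in gawk or POSIX.

```shell
# Hypothetical helper names; each one avoids a construct from the thread.

# Keep the last record in a variable instead of reading $0 in END.
print_last_record() {
    awk '{ last = $0 } END { print last }'
}

# Match a literal "o" with no backslash before an ordinary character.
match_o() {
    awk '/o/'
}

# Parenthesize the concatenation on the right side of the redirection.
# "$1" is an output file name prefix (an assumption of this sketch);
# the record is written to the file named by that prefix followed by 1.
write_numbered() {
    awk -v prefix="$1" '{ print > (prefix 1) }'
}
```

For example, `echo 'foo bar' | write_numbered file` writes the record to
a file named file1 in the current directory.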


