Re: [bug-gawk] can we get a warning for undefined behavior?
From: arnold
Subject: Re: [bug-gawk] can we get a warning for undefined behavior?
Date: Tue, 24 Jul 2018 07:47:56 -0600
User-agent: Heirloom mailx 12.4 7/29/08

Hi Ed.
In "standards-ese" both "undefined" and "unspecified" are specific
technical terms. I thought you were saying that the particular things
were *explicitly* undefined, and that is why I wanted chapter and verse.
Indeed, there are some things that have been simply left out of the
standard. They are thus "undefined", but de-facto and not de-jure. :-)
A full list of said things is by definition harder to create. However,
using the index in the gawk manual for "dark corners" would probably be
a good way to start.
W.R.T. NF--, it is my humble opinion that the behavior should be
well defined; however Brian Kernighan's awk totally fumbles it. Just
about every other awk handles it properly.
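
A minimal sketch of one way a script can sidestep the question when it
has to run on every awk: rebuild the record from the fields it wants
instead of decrementing NF. The function name drop_last_field is made up
for illustration, and this is only one possible workaround, not something
the standard or the gawk manual prescribes.

```shell
# Hypothetical helper: print each input record without its last field,
# using only field references and concatenation (no assignment to NF).
drop_last_field() {
    awk '{
        # Start with the first field only if something will remain.
        out = (NF > 1 ? $1 : "")
        # Append fields 2 .. NF-1, joined by the output field separator.
        for (i = 2; i < NF; i++)
            out = out OFS $i
        print out
    }'
}

echo 'foo bar baz' | drop_last_field    # prints "foo bar"
```

As with assigning to a field, the surviving fields come out joined by OFS,
so runs of blanks in the input are not preserved.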
If you can provide a list of issues to warn about, it will help; I don't
have the time to do that. I will then figure out reasonable language
for --lint / --posix when such things are encountered.
As I mentioned earlier, --posix will catch things that are definitely
gawk extensions.
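
Until such warnings exist, the other constructs in the quoted message
below can be written so that the question never comes up. The following
is a rough sketch under that assumption; the function names (last_record,
match_o, write_numbered) and the file prefix argument are invented for
the example and do not come from gawk or POSIX.

```shell
# Save the record in a variable rather than relying on $0 in END.
last_record() {
    awk '{ last = $0 } END { print last }'
}

# Match a literal "o" without putting a backslash before an ordinary
# character in the regexp.
match_o() {
    awk '/o/'
}

# Parenthesize the concatenation on the right side of the redirection.
# $1 is a hypothetical file name prefix; output goes to "<prefix>1".
write_numbered() {
    awk -v prefix="$1" '{ print > (prefix 1) }'
}
```

Each function reads standard input, for example
echo 'foo bar' | write_numbered file, which writes to a file named file1.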
Thanks,
Arnold
Ed Morton <address@hidden> wrote:
> wrt this POSIX spec - http://pubs.opengroup.org/onlinepubs/9699919799/
>
> On 7/24/2018 7:36 AM, address@hidden wrote:
> > Ed Morton <address@hidden> wrote:
> >
> >> I notice that these:
> >>
> >> $ echo 'foo bar' | awk --lint '{NF--}1' # UB: decrementing NF
> >> foo
> > Not undefined. Cite chapter and verse.
>
> It's not defined so there's nothing to cite. What happens when you
> increment NF is defined:
>
> References to nonexistent fields (that is, fields after $*NF*), shall
> evaluate to the uninitialized value. Such references shall not create new
> fields. However, assigning to a nonexistent field (for example,
> $(*NF*+2)=5) shall increase the value of *NF*; create any intervening
> fields with the uninitialized value; and cause the value of $0 to be
> recomputed, with the fields being separated by the value of *OFS*.
>
> and I'd expect if what happens when you decrement NF were defined it'd be
> in that section but it's not there nor is it anywhere else in the spec.
> Some other evidence that it's undefined and so different implementations
> behave differently with that code is that if you run the above script on
> BSD/OSX awk (which I'm using just because it's the only non-gawk awk I
> have handy) the output is "foo bar", i.e. decrementing NF does nothing.
>
> >
> >> $ echo 'foo bar' | awk --lint 'END{print $0}' # UB: using $0 in END
> >> foo bar
> > Possibly unspecified by POSIX, but I don't remember. Again, cite
> > chapter and verse please.
>
> Again, it's undefined in the standard so there's nothing to cite in the
> standard but from the gawk manual:
>
> The POSIX standard specifies that |NF| is available in an |END| rule. It
> contains the number of fields from the last input record. Most probably
> due to an oversight, the standard does not say that |$0| is also preserved
>
> With some awks the above code prints a null string (I've seen it but I don't
> remember which awk).
> >
> >> $ echo 'foo bar' | awk --lint '/\o/' # UB: backslash before literal char
> >> foo bar
> >>
> >> $ echo 'foo bar' | awk --lint '{print > "file" 1}' # UB: unparenthesized right side of I/O redirection
> > Also on these two.
>
> For the escaped literal, from the gawk manual:
>
> If you place a backslash in a string constant before something that is not
> one of the characters previously listed, POSIX |awk| purposely leaves what
> happens as undefined.
>
>
> For the parens on redirection I couldn't find that specifically discussed
> in the POSIX spec or in the awk manual but I did find this in the getline
> section of POSIX about it:
>
> Since in most cases such constructs are not (or at least should not) be
> used (because they have a natural ambiguity for which there is no
> conventional parsing), the meaning of these constructs has been made
> explicitly unspecified. (The effect is that a conforming application that
> runs into the problem must parenthesize to resolve the ambiguity.)
>
> and if you try running the above print command in BSD/OSX awk then you'll
> get a syntax error.
>
> >
> > Interestingly enough, gawk warns about unknown escapes in strings:
> >
> > $ ./gawk --lint '"\q"' /dev/null
> > gawk: cmd. line:1: warning: escape sequence `\q' treated as plain `q'
> >
> > I will look into something similar for regexps.
> Thanks.
> >
> >> Any chance of getting a warning when run with --lint (or some other
> >> --report-ub flag?) if anything in a given gawk script was undefined by
> >> POSIX so a user has something they can run to tell them if their script
> >> is portable to all POSIX awks or not.
> > Running with --posix will usually fatal on things that are gawk
> > extensions. It's pretty strict.
> It flags things that are defined by POSIX but not things that are
> undefined by POSIX. So, for example, it'll report if you're calling
> gensub() since POSIX defines what to do when an undefined function is
> called but it won't report if you're using a null string as the 3rd arg to
> split(). So it'll tell you if your script WILL break on a POSIX-only awk,
> but not if it just might break on some POSIX awks.
>
> >
> > With respect to undefined / unspecified behaviors, if you would be kind
> > enough to collect a full list of them, with links / quotes from the POSIX
> > standard, I will look at adding warnings into gawk.
> It's difficult to collect quotes from the POSIX spec for things that
> aren't defined by the POSIX spec (sometimes they explicitly state them but
> other times they simply don't - the quintessential "undefined"!) but I'll
> see what I can do.
> Let me know if you need any more info on the items above that I'm claiming
> are undefined.
>
> Thanks for looking into this.
>
> Ed.
> >
> > Thanks,
> >
> > Arnold
> >
>