From: arnold
Subject: Re: variable assignment with -v behavior changed for string that looks like a strongly typed regexp
Date: Fri, 16 Apr 2021 01:34:58 -0600
User-agent: Heirloom mailx 12.5 7/5/10
$ gawk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.0.1, GNU MP 6.1.2)
$ gawk -v s='\x40/str/' 'BEGIN {
> print s, typeof(s)
> }'
@/str/ string
'nuff said.
Ed Morton <mortoneccc@comcast.net> wrote:
> I just came across this behavior which I think is odd if not plain
> wrong. Let's say I have input data separated by `/<tab>/` and I want to
> find lines that contain one such separator (I know there are other ways
> to do that, that's not the point):
>
> printf 'foo/\t/bar\n'
> foo/ /bar
>
> printf 'foo/\t/bar\n' | awk -v str='/\t/' 'BEGIN{print typeof(str)
> ": <" str ">"} index($0,str)'
> string: </ />
> foo/ /bar
>
> Now let's say the separator is `@/<tab>/`:
>
> printf 'foo@/\t/bar\n'
> foo@/ /bar
>
> printf 'foo@/\t/bar\n' | awk -v str='@/\t/' 'BEGIN{print
> typeof(str), str} index($0,str)'
> regexp \t
>
> printf 'foo@/\t/bar\n' | awk -v str='@/\\t/' 'BEGIN{print
> typeof(str), str} index($0,str)'
> regexp \\t
>
> printf 'foo@/\t/bar\n' | awk -v str=$'@/\t/' 'BEGIN{print
> typeof(str), str} index($0,str)'
> regexp
> foo@/ /bar
>
> None of that is intuitive, especially if you're not even aware of the
> strongly typed regexp gawk extension. None of it functions as you'd
> expect if you simply wanted to use a string that happened to be `@/\t/`,
> and it would break existing code that relied on a string simply being a
> string and `-v` interpreting escape sequences.
>
> There are existing constructs that don't interpret escape sequences
> (e.g. populating a variable from ARGV[] or ENVIRON[]), so it's not clear
> why the behavior of `-v` changed to NOT interpret them when awk thinks
> the string being passed is a strongly typed regexp. I also don't see an
> obvious way to turn off that behavior, e.g. by escaping the `@` (one
> escape works but gives a warning, while two escapes give no warning
> but don't work):
>
> printf 'foo@/\t/bar\n' | awk -v str='\@/\t/' 'BEGIN{print
> typeof(str), str} index($0,str)'
> awk: warning: escape sequence `\@' treated as plain `@'
> string @/ /
> foo@/ /bar
>
> printf 'foo@/\t/bar\n' | awk -v str='\\@/\t/' 'BEGIN{print
> typeof(str), str} index($0,str)'
> string \@/ /
>
> I would think that if you wanted to allow assignment of variables to be
> strongly typed regexp constants using `-v` then using `-v
> str='\@/.../'`, i.e. starting with an escaped `@`, would be a better
> way to go since there won't be any existing scripts that start with `\@`
> (because that would have produced the usual escape sequence warning) so
> the extension is something people can turn on if they want it rather
> than something that's on by default, has surprising effects like
> disabling escape sequence interpretation, and breaks existing behavior.
>
> Ed.
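
For anyone who only needs the literal-string search from the original
report, one possible workaround (a sketch, not something proposed in this
thread) is to pass the separator through the environment instead of `-v`.
The quoted message notes that ENVIRON[] values are not subject to escape
sequence processing, so the value reaches the script exactly as the shell
built it. The function name `find_sep_lines` and the variable name `sep`
are made up for illustration, and the sketch assumes a POSIX shell and awk.

```shell
# Hypothetical helper: print the input lines that contain the literal
# separator given as $1. The separator travels through the environment
# rather than -v, so awk does no escape-sequence processing on it.
find_sep_lines() {
    sep="$1" awk 'BEGIN { sep = ENVIRON["sep"] } index($0, sep)'
}

# The separator is "@/", a real tab, then "/"; printf builds the tab
# because awk will not interpret "\t" in an ENVIRON value.
printf 'foo@/\t/bar\nfoo/bar\n' | find_sep_lines "$(printf '@/\t/')"
```

The trade-off is that any escape sequences have to be expanded by the
shell (here with printf) before awk sees the value.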