From: Ed Morton
Subject: variable assignment with -v behavior changed for string that looks like a strongly typed regexp
Date: Thu, 15 Apr 2021 11:23:12 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.9.1
I just came across this behavior, which I think is odd if not plain
wrong. Let's say I have input data separated by `/<tab>/` and I want to
find lines that contain one such separator (I know there are other ways
to do that; that's not the point):
    $ printf 'foo/\t/bar\n'
    foo/ /bar

    $ printf 'foo/\t/bar\n' | awk -v str='/\t/' 'BEGIN{print typeof(str) ": <" str ">"} index($0,str)'
    string: </ />
    foo/ /bar
Now let's say the separator is `@/<tab>/`:
    $ printf 'foo@/\t/bar\n'
    foo@/ /bar

    $ printf 'foo@/\t/bar\n' | awk -v str='@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
    regexp \t

    $ printf 'foo@/\t/bar\n' | awk -v str='@/\\t/' 'BEGIN{print typeof(str), str} index($0,str)'
    regexp \\t

    $ printf 'foo@/\t/bar\n' | awk -v str=$'@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
    regexp
    foo@/ /bar
None of that is intuitive, especially if you're not even aware of the
strongly typed regexp gawk extension. None of it works as you'd
expect if you simply wanted to use a string that happened to be `@/\t/`,
and it would break existing code that relied on a string simply being a
string and on `-v` interpreting escape sequences.
There are existing constructs that don't interpret escape sequences
(e.g. populating a variable from ARGV[] or ENVIRON[]), so it's not clear
why the behavior of `-v` changed to NOT interpret them when awk thinks
the string being passed is a strongly typed regexp. I also don't see an
obvious way to turn off that behavior, e.g. by escaping the `@` (one
escape works but gives a warning, while two escapes give no warning
but don't work):
    $ printf 'foo@/\t/bar\n' | awk -v str='\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
    awk: warning: escape sequence `\@' treated as plain `@'
    string @/ /
    foo@/ /bar

    $ printf 'foo@/\t/bar\n' | awk -v str='\\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
    string \@/ /
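For anyone hitting this in the meantime, here is a rough sketch of one way to sidestep it: pass the value through ENVIRON[] instead of `-v`. This assumes gawk applies the strongly typed regexp check only to assignment values and not to environment variables, which I have not checked against every version. The variable names `sep` and `out` are just illustrative.

```shell
# Build the separator with a real tab in it; printf is used rather than
# $'...' so this also runs in shells without ANSI-C quoting.
sep=$(printf '@/\t/')

# Hand the separator to awk through the environment instead of -v, so awk
# never parses it as an assignment value. ENVIRON[] does no escape
# processing, which is why the tab has to be a literal tab already.
out=$(printf 'foo@/\t/bar\nfoo-bar\n' |
    sep="$sep" awk 'BEGIN { str = ENVIRON["sep"] } index($0, str)')

# Only the line containing the separator should come through.
printf '%s\n' "$out"
```

The trade-off is that you lose `-v`'s escape sequence interpretation, so this is a workaround for the string case and not a fix for the changed behavior.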
I would think that if you wanted to allow variables to be assigned
strongly typed regexp constants using `-v`, then `-v
str='\@/.../'`, i.e. starting with an escaped `@`, would be a better
way to go. There won't be any existing scripts that pass a value
starting with `\@` (because that would have produced the usual escape
sequence warning), so the extension would be something people can turn
on if they want it rather than something that's on by default, has
surprising effects like disabling escape sequence interpretation, and
breaks existing behavior.
Ed.