bug-gawk
From: Ed Morton
Subject: variable assignment with -v behavior changed for string that looks like a strongly typed regexp
Date: Thu, 15 Apr 2021 11:23:12 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.9.1

I just came across this behavior, which I think is odd if not plain wrong. Let's say I have input data separated by `/<tab>/` and I want to find lines that contain one such separator (I know there are other ways to do that; that's not the point):

   printf 'foo/\t/bar\n'
   foo/    /bar

   printf 'foo/\t/bar\n' | awk -v str='/\t/' 'BEGIN{print typeof(str) ": <" str ">"} index($0,str)'
   string: </      />
   foo/    /bar

Now let's say the separator is `@/<tab>/`:

   printf 'foo@/\t/bar\n'
   foo@/   /bar

   printf 'foo@/\t/bar\n' | awk -v str='@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
   regexp \t

   printf 'foo@/\t/bar\n' | awk -v str='@/\\t/' 'BEGIN{print typeof(str), str} index($0,str)'
   regexp \\t

   printf 'foo@/\t/bar\n' | awk -v str=$'@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
   regexp
   foo@/   /bar

None of that is intuitive, especially if you're not even aware of gawk's strongly typed regexp extension. None of it behaves as you'd expect if you simply wanted to use a string that happened to be `@/\t/`, and it would break existing code that relied on a string simply being a string and on `-v` interpreting escape sequences.

There are existing constructs that don't interpret escape sequences (e.g. populating a variable from ARGV[] or ENVIRON[]), so it's not clear why the behavior of `-v` changed to NOT interpret them when awk decides the string being passed is a strongly typed regexp. I also don't see an obvious way to turn off that behavior, e.g. by escaping the `@` (one backslash works but gives a warning, while two backslashes give no warning but don't work):

   printf 'foo@/\t/bar\n' | awk -v str='\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
   awk: warning: escape sequence `\@' treated as plain `@'
   string @/       /
   foo@/   /bar

   printf 'foo@/\t/bar\n' | awk -v str='\\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
   string \@/      /
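
For comparison, here is a minimal sketch of the ENVIRON[] route mentioned above, assuming a POSIX shell and any POSIX awk. The variable names `sep` and `out` are made up for illustration. Since ENVIRON[] values get no escape sequence processing, the tab has to be a real tab before it reaches awk, so the shell's `printf` builds it:

```shell
# Assumption: POSIX sh and a POSIX awk; "sep" and "out" are hypothetical names.
# Build the separator with a literal tab, because awk will not expand \t
# in a value read from ENVIRON[].
sep=$(printf '@/\t/')

# Pass the value through the environment rather than -v, so awk treats it
# as plain data and index() does a literal substring search.
out=$(printf 'foo@/\t/bar\nfoo/bar\n' | sep="$sep" awk 'index($0, ENVIRON["sep"])')

printf '%s\n' "$out"
```

This should print only the first input line, the one containing `@/<tab>/`. It sidesteps the `-v` question entirely rather than answering it, at the cost of giving up `-v`'s escape sequence handling.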

I would think that if you wanted to allow variables to be assigned strongly typed regexp constants using `-v`, then `-v str='\@/.../'`, i.e. starting with an escaped `@`, would be a better way to go. There won't be any existing scripts that pass a value starting with `\@` (because that would have produced the usual escape sequence warning), so the extension becomes something people can turn on if they want it, rather than something that's on by default, has surprising effects like disabling escape sequence interpretation, and breaks existing behavior.

    Ed.


