bug-gawk



From: arnold
Subject: Re: variable assignment with -v behavior changed for string that looks like a strongly typed regexp
Date: Fri, 16 Apr 2021 01:34:58 -0600
User-agent: Heirloom mailx 12.5 7/5/10

$ gawk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.0.1, GNU MP 6.1.2)

$ gawk -v s='\x40/str/' 'BEGIN {
> print s, typeof(s)
> }'
@/str/ string

'nuff said.

Ed Morton <mortoneccc@comcast.net> wrote:

> I just came across this behavior which I think is odd if not plain
> wrong. Let's say I have input data separated by `/<tab>/` and I want to
> find lines that contain one such separator (I know there are other ways
> to do that; that's not the point):
>
>     printf 'foo/\t/bar\n'
>     foo/    /bar
>
>     printf 'foo/\t/bar\n' |
>     awk -v str='/\t/' 'BEGIN{print typeof(str) ": <" str ">"} index($0,str)'
>     string: </      />
>     foo/    /bar
>
> Now let's say the separator is `@/<tab>/`:
>
>     printf 'foo@/\t/bar\n'
>     foo@/   /bar
>
>     printf 'foo@/\t/bar\n' |
>     awk -v str='@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
>     regexp \t
>
>     printf 'foo@/\t/bar\n' |
>     awk -v str='@/\\t/' 'BEGIN{print typeof(str), str} index($0,str)'
>     regexp \\t
>
>     printf 'foo@/\t/bar\n' |
>     awk -v str=$'@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
>     regexp
>     foo@/   /bar
>
> None of that is intuitive, especially if you're not even aware of the
> strongly typed regexp gawk extension. None of it functions as you'd
> expect if you simply wanted to use a string that happened to be `@/\t/`,
> and it would break existing code that relied on a string simply being a
> string and on `-v` interpreting escape sequences.
>
> There are existing constructs that don't interpret escape sequences
> (e.g. populating a variable from ARGV[] or ENVIRON[]), so it's not clear
> why the behavior of `-v` changed to NOT interpret them when awk thinks
> the string being passed is a strongly typed regexp. I also don't see an
> obvious way to turn off that behavior, e.g. by escaping the `@` (one
> escape works but gives a warning, while two escapes give no warning
> but don't work):
>
>     printf 'foo@/\t/bar\n' |
>     awk -v str='\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
>     awk: warning: escape sequence `\@' treated as plain `@'
>     string @/       /
>     foo@/   /bar
>
>     printf 'foo@/\t/bar\n' |
>     awk -v str='\\@/\t/' 'BEGIN{print typeof(str), str} index($0,str)'
>     string \@/      /
>
> I would think that if you wanted to allow assignment of variables to be
> strongly typed regexp constants using `-v`, then using
> `-v str='\@/.../'`, i.e. starting with an escaped `@`, would be a better
> way to go, since there won't be any existing scripts that pass a value
> starting with `\@` (because that would have produced the usual escape
> sequence warning). That way the extension is something people can turn
> on if they want it, rather than something that's on by default, has
> surprising effects like disabling escape sequence interpretation, and
> breaks existing behavior.
>
>      Ed.
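
Since ENVIRON[] values are not run through escape processing, one way to get a string that is simply a string is to expand the tab in the shell and pass the result through the environment. A minimal sketch, assuming a POSIX shell and awk; `match_literal_sep` and the `sep` environment variable are hypothetical names, not anything gawk defines:

```shell
# Hypothetical helper: print the input lines that contain the literal
# string given as $1. The value reaches awk through ENVIRON, so it is
# neither escape-processed nor inspected for a leading "@/".
match_literal_sep() {
    sep="$1" awk 'BEGIN { str = ENVIRON["sep"] } index($0, str)'
}

# The shell's printf expands \t here, so awk receives a real tab.
printf 'foo@/\t/bar\nfoo/bar\n' | match_literal_sep "$(printf '@/\t/')"
```

The trade-off is that escape sequences have to be expanded before awk sees the value, which is why the separator is built with `printf` first.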


