bug-gawk

Re: RS='.^' apparently ignores the RS setting


From: arnold
Subject: Re: RS='.^' apparently ignores the RS setting
Date: Tue, 13 Jul 2021 07:09:06 -0600
User-agent: Heirloom mailx 12.5 7/5/10

Ed Morton <mortoneccc@comcast.net> wrote:

> The question is: why, given `RS='.^'` (a regexp that cannot match
> anywhere in the input), does gawk seem to ignore the RS and act as if I
> had `RS='\n'`, when, given `RS='x^'` (a different but similar regexp that
> also cannot match anywhere in the input), awk just reads the whole input
> in at once, as you'd expect it to given a regexp that doesn't match the
> input?
>
> $ printf 'ax^b\nax^b\n' | gawk 'BEGIN{RS="x^"}{print NR, $0}'
> 1 ax^b
> ax^b
>
> $ printf 'a.^b\na.^b\n' | gawk 'BEGIN{RS=".^"}{print NR, $0}'
> 1 a.^b
> 2 a.^b

Good question.

$ cat data
a.^b
a.^b

$ ./gawk 'BEGIN { RS = ".^" } ; { gsub(/.^/, ">&<") ; print NR, $0
> print "RT=<" RT ">" }' < data
1 a.^b
RT=<
>
2 a.^b
RT=<
>

The character matched as the record separator is the newline: `.`
matches the newline itself, and `^` then matches at the beginning of
the line that follows it.
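
The explanation above can be illustrated outside gawk. The sketch below
uses Python's `re` module purely as an analogy (an assumption: this is not
gawk's matcher, and `split_records` is a hypothetical helper). With
`re.DOTALL`, `.` can match a newline; with `re.MULTILINE`, `^` matches
just after a newline. Under those semantics `.^` matches every newline,
while `x^` still never matches, which reproduces both outputs from the
question.

```python
import re

# Same contents as the "data" file used in the message.
DATA = "a.^b\na.^b\n"

# Assumption: line-anchored semantics, used only as an analogy for what
# gawk appears to do when matching RS. DOTALL lets "." match a newline;
# MULTILINE lets "^" match right after a newline.
LINE_ANCHORED = re.MULTILINE | re.DOTALL


def split_records(text, rs, flags=0):
    """Hypothetical helper: split text on a regexp RS, awk-style (simplified).

    Returns (record, terminator) pairs; the terminator plays the role of RT.
    An empty final record at end of input is dropped, as awk does.
    """
    records = []
    start = 0
    for m in re.finditer(rs, text, flags):
        records.append((text[start:m.start()], m.group()))
        start = m.end()
    if start < len(text):
        # No separator after the last chunk: it becomes the final record.
        records.append((text[start:], ""))
    return records


if __name__ == "__main__":
    # Line-anchored semantics: ".^" matches each newline, giving two records.
    for nr, (rec, rt) in enumerate(split_records(DATA, r".^", LINE_ANCHORED), 1):
        print(nr, rec, "RT=" + repr(rt))  # prints: 1 a.^b RT='\n' (then record 2)

    # "x^" can never match, even with line anchoring: one record, whole input.
    print(len(split_records("ax^b\nax^b\n", r"x^", LINE_ANCHORED)))  # prints: 1

    # Without line anchoring, ".^" cannot match either: one record.
    print(len(split_records(DATA, r".^")))  # prints: 1
```

The difference is that `.` can consume the newline itself, so the
position after it counts as the beginning of a line, whereas `x` never
can.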

This is likely a bug. I need to think about how to deal with it.
It's a truly weird corner case.

Arnold


