
Re: inconstancy with RS = "(\r?\n){2}"

From: Ed Morton
Subject: Re: inconstancy with RS = "(\r?\n){2}"
Date: Sun, 25 Jul 2021 06:49:09 -0500
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.12.0

On 7/25/2021 4:47 AM, arnold@skeeve.com wrote:

Thank you for taking the time to make a bug report. In the future please
send a concise description of the problem with a test program and data.
It was hard for me to determine what you really think is the bug.

It looks like your concern is with the need to enter EOF more than
once from the terminal.

Gawk is designed mainly for batch processing (from files or a pipe).
Reading from a terminal with a complicated regexp as RS isn't the
normal use case.  When RS is a regexp gawk may have to do lookahead in
the input stream to be sure that the regexp has matched, and thus
the need for multiple EOFs.
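
As a rough illustration of the lookahead idea described above, here is a toy Python model of a record reader that splits on a regexp separator. It is only a sketch under stated assumptions, not gawk's actual implementation: the function name `split_records`, the helper `_find`, and the rule "do not trust a match that ends exactly at the end of the buffered input" are all hypothetical choices made for this example. It does not try to reproduce gawk's exact terminal behavior, and it says nothing about why two particular RS values might be treated differently.

```python
import re


def _find(pattern, buf):
    """Return the first non-empty match of pattern in buf, or None."""
    m = pattern.search(buf)
    return m if m is not None and m.end() > m.start() else None


def split_records(chunks, rs):
    """Yield (record, terminator) pairs from an iterable of text chunks.

    Toy rule (an assumption of this sketch): a separator match that ends
    exactly at the end of the buffered text is not committed until more
    input arrives or the input is exhausted.
    """
    pattern = re.compile(rs)
    buf = ""
    for chunk in chunks:
        buf += chunk
        while True:
            m = _find(pattern, buf)
            if m is None or m.end() == len(buf):
                break  # undecided: wait for more input (lookahead)
            yield buf[:m.start()], m.group(0)
            buf = buf[m.end():]
    # End of input: nothing more can arrive, so remaining matches are final.
    while buf:
        m = _find(pattern, buf)
        if m is None:
            yield buf, ""  # trailing record with no terminator
            break
        yield buf[:m.start()], m.group(0)
        buf = buf[m.end():]
```

Feeding this model the chunks `"a\n"`, `"\n"`, `"\n"` one at a time (as a terminal would deliver lines) with the separator `(\r?\n){2}` yields the first record only after the third chunk arrives, whereas a single chunk followed by end of input is split immediately at EOF. That is the general shape of the batch-versus-interactive difference, under the assumptions above.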

In any case, I don't think there is an actual bug:

$ od -c data
0000000   a  \n  \n  \n   b  \n  \n  \n  \n   c  \n  \n  \n  \n   d  \n
$ ./gawk -v RS='(\r?\n){2}' -v ORS='|\n' '{ print }' < data


This looks right to me.



The problem occurs when reading from a terminal:

Good (no \r? in RS), every pair of `\n`s is recognized:
$ gawk -v RS='(\n){2}' '{print "<"$0":"RT">"}'







Bad (with \r? in RS), no RS is ever recognized:
$ gawk -v RS='(\r?\n){2}' '{print "<"$0":"RT">"}'


Meanwhile, if the input comes from a pipe, the RS that includes `\r?` is recognized:
$ printf '\n\n\n\n\n' | gawk -v RS='(\r?\n){2}' '{print "<"$0":"RT">"}'




