Re: inconstancy with RS = "(\r?\n){2}"
From: Alex fxmbsw7 Ratchev
Subject: Re: inconstancy with RS = "(\r?\n){2}"
Date: Sun, 25 Jul 2021 15:06:44 +0200
Hm, thank you to all of you.
It is sad that it cannot work where it should.
I mean, the general way I'd do it is to stack up chars one by one and check
on every run for RS matches. No problem there, so I have no idea what gawk does.
:))
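
A rough sketch of the line-by-line FSM idea suggested in the quoted message below, under some assumptions: it keeps the default RS, so awk hands over each line as soon as it is complete, and a blank line (optionally holding just a CR) closes the current record. The function name split_records and the <...> output markers are made up for illustration, and this is not what gawk does internally.

```shell
# Hypothetical helper: split stdin into records separated by (\r?\n){2},
# reading line by line with the default RS instead of a regex RS.
split_records() {
    awk '
    function emit() {
        sub(/\r$/, "", buf)        # a trailing CR belongs to the separator
        printf "<%s>\n", buf       # show the record between < and >
        buf = ""; n = 0            # back to the "start of record" state
    }
    # State n > 0: at least one line is buffered, so a blank line ends the record.
    n > 0 && ($0 == "" || $0 == "\r") { emit(); next }
    # Otherwise append the line to the record being built.
    { buf = (n++ ? buf "\n" : "") $0 }
    # Flush whatever is left at end of input.
    END { if (n > 0) emit() }
    '
}

# Same bytes as the "od -c data" dump quoted below.
printf 'a\n\n\nb\n\n\n\nc\n\n\n\nd\n' | split_records
```

On that data the sketch should give the same six records as the gawk run quoted below ("a", "\nb", "", "c", "", "d"), except that the last record loses its trailing newline and RT is not reproduced.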
On Sun, Jul 25, 2021 at 3:04 PM Wolfgang Laun <wolfgang.laun@gmail.com> wrote:
>
> I have been looking at the code in io.c and re.c.
>
> gawk lets you specify an arbitrary regex as RS, the record separator. But in
> an environment (terminal, socket) where the input data is not yet available
> to the gawk code looking for a match with RS, it is in general impossible to
> decide whether the full RS has been encountered or not unless some more input
> has been entered. Of course, there are regexes where you can tell, e.g.
> /ab?c/. But this becomes more and more difficult, e.g., when you have
> parentheses and repetitions making the analysis rather complex. So, to be on
> the safe side, gawk reads yet another line from the input source and then
> passes another record to the user's code.
>
> gawk is not a (soft) real time program and cannot react to all RS immediately
> after they have been typed in on a TTY or sent over a line.
>
> If you need this behavior, leave the default RS and implement a simple FSM
> which is better equipped to handle RS like /(\r?\n){2}/.
>
> The GAWK user manual might contain a paragraph describing what I have tried
> to say in a previous paragraph, perhaps better formulated.
>
> -W
>
>
>
> On Sun, 25 Jul 2021 at 13:55, Alex fxmbsw7 Ratchev <fxmbsw7@gmail.com> wrote:
>>
>> thank you for the accurate and detailed analysis
>>
>> On Sun, Jul 25, 2021, 13:49 Ed Morton <mortoneccc@comcast.net> wrote:
>>>
>>>
>>>
>>> On 7/25/2021 4:47 AM, arnold@skeeve.com wrote:
>>>
>>> Greetings.
>>>
>>> Thank you for taking the time to make a bug report. In the future please
>>> send a concise description of the problem with a test program and data.
>>> It was hard for me to determine what you really think is the bug.
>>>
>>> It looks like your concern is with the need to enter EOF more than
>>> once from the terminal.
>>>
>>> Gawk is designed mainly for batch processing (from files or a pipe).
>>> Reading from a terminal with a complicated regexp as RS isn't the
>>> normal use case. When RS is a regexp gawk may have to do lookahead in
>>> the input stream to be sure that the regexp has matched, and thus
>>> the need for multiple EOFs.
>>>
>>> In any case, I don't think there is an actual bug:
>>>
>>> $ od -c data
>>> 0000000 a \n \n \n b \n \n \n \n c \n \n \n \n d \n
>>> 0000020
>>> $ ./gawk -v RS='(\r?\n){2}' -v ORS='|\n' '{ print }' < data
>>> a|
>>>
>>> b|
>>> |
>>> c|
>>> |
>>> d
>>> |
>>>
>>> This looks right to me.
>>>
>>> Thanks,
>>>
>>> Arnold
>>>
>>>
>>> The problem occurs when reading from a terminal:
>>>
>>> Good (no \r? in RS), every pair of `\n`s is recognized:
>>> ------------
>>> $ gawk -v RS='(\n){2}' '{print "<"$0":"RT">"}'
>>>
>>>
>>>
>>> <:
>>>
>>> >
>>>
>>>
>>> <:
>>>
>>> >
>>>
>>>
>>> <:
>>>
>>> >
>>> -----------------
>>>
>>> Bad (with \r? in RS), no RS is ever recognized:
>>> --------------
>>> $ gawk -v RS='(\r?\n){2}' '{print "<"$0":"RT">"}'
>>>
>>>
>>>
>>>
>>>
>>>
>>> -------------------
>>>
>>> Meanwhile if the input was coming from a pipe the RS including `\r?` would
>>> be recognized:
>>> ---------
>>> $ printf '\n\n\n\n\n' | gawk -v RS='(\r?\n){2}' '{print "<"$0":"RT">"}'
>>> <:
>>>
>>> >
>>> <:
>>>
>>> >
>>> <
>>> :>
>>> -----------
>>>
>>> Regards,
>>>
>>> Ed.
>
>
>
> --
> Wolfgang Laun
>