bug-gawk

Re: [bug-gawk] Computed regex and getline bug / issue


From: Aharon Robbins
Subject: Re: [bug-gawk] Computed regex and getline bug / issue
Date: Mon, 05 May 2014 09:04:23 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Thanks, Andy, for figuring out where to look.

> I don't understand the interaction between io.c:get_a_record and 
> io.c:rsrescan well
> enough to see what's going wrong.  If I comment out the line in rsrescan that
> returns TERMNEAREND, it seems to fix this problem.  But I assume it must
> break something else.
>
> The potential patch is attached.
>
> To my surprise, "make check" passes with this patch.

Surprises me too.

> But there must be some
> reason for returning TERMNEAREND.  Does anybody have any insight into 
> the logic here?  Why is TERMNEAREND useful?

It is a heuristic. Consider an RS like the one in this report, RS = ",+".
Here we want to slurp up as many consecutive commas as possible. Now
consider a file like this one, where the | marks a file block boundary:

        .... ,,, | ,, ...

rsrescan has seen the first three commas, but it doesn't know if the
next block starts with a comma, or with something else.  So it tells
get_a_record, "read some more data and retry", in case there's more
stuff that could be matched.
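To make the heuristic concrete, here is a minimal C sketch for the RS = ",+" case only. The function scan_commas and the SCAN_* result codes are hypothetical names, loosely modeled on the idea behind rsrescan's TERMNEAREND; this is an illustration under those assumptions, not gawk's io.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes, loosely modeled on the idea behind
   rsrescan's TERMNEAREND; these are NOT gawk's actual definitions. */
enum scan_result {
    SCAN_NONE,      /* no separator found in the buffer */
    SCAN_FOUND,     /* separator found and known to be complete */
    SCAN_NEAR_END   /* separator reaches the end of the buffer */
};

/* Look for a run of commas (the RS ",+") in buf[0..len).
   On SCAN_FOUND or SCAN_NEAR_END, *start and *end delimit the run. */
static enum scan_result scan_commas(const char *buf, size_t len,
                                    size_t *start, size_t *end)
{
    size_t i = 0;

    assert(buf != NULL && start != NULL && end != NULL);
    while (i < len && buf[i] != ',')
        i++;
    if (i == len)
        return SCAN_NONE;
    *start = i;
    while (i < len && buf[i] == ',')
        i++;
    *end = i;
    /* If the run touches the end of the buffer, the next block may
       begin with more commas, so the caller should read more and retry. */
    return (i == len) ? SCAN_NEAR_END : SCAN_FOUND;
}
```

With "abc,,," the comma run ends exactly at the block boundary, so the sketch reports SCAN_NEAR_END instead of committing to a three-comma separator; with "abc,,,,,def" the full five-comma run is visible and it reports SCAN_FOUND.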

This was done to solve a real problem I encountered, where the RS was
something like foo(bar)* and the "foo" fell exactly at the end of a
block; even though there was a "bar" at the beginning of the next
block, gawk wasn't picking it up.
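The foo(bar)* case can be reproduced with the POSIX <regex.h> API: a match that ends exactly where the buffered data ends is only provisional, since the next block might extend it. The helpers find_rs and match_is_provisional below are hypothetical, written for illustration; they are not part of gawk.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Returns 1 if `pattern` (a POSIX ERE) matches in `buf`, storing the
   match bounds in *so and *eo; 0 on no match, -1 on a bad pattern. */
static int find_rs(const char *pattern, const char *buf,
                   size_t *so, size_t *eo)
{
    regex_t re;
    regmatch_t m;
    int rc;

    assert(pattern != NULL && buf != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, buf, 1, &m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;
    *so = (size_t) m.rm_so;
    *eo = (size_t) m.rm_eo;
    return 1;
}

/* A match ending exactly at the end of the data read so far might be
   extended by more input, so it should not be treated as final yet. */
static int match_is_provisional(const char *buf, size_t eo)
{
    return eo == strlen(buf);
}
```

Against "xxfoo" the match is "foo" and ends at the end of the data, so it is provisional; against "xxfoobaryy" the leftmost-longest match is "foobar" and is followed by more text.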

This is code that I had considered to be pretty golden. :-( I will spend
some quality time in a debugger on it. I suspect the code needs to be a
little smarter somewhere: it should take into account not only that the
match was near the end of the buffer, but also whether we've seen EOF.
Or else the EOF is too aggressive and shouldn't be picked up yet.
(Feel free to beat me to it - it will be educational. :-)
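One way to picture the extra EOF awareness suggested above is a record loop that asks for more data only while input remains, and otherwise accepts a separator touching the end of the buffer as final. This is a hypothetical sketch: struct source, struct reader, fill, take and get_record are invented names, RS is hard-coded as ",+", and it is not a patch to io.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical input source delivering data in blocks,
   imitating block-at-a-time reads. */
struct source {
    const char *const *blocks;
    size_t nblocks;
    size_t next;
};

/* Hypothetical reader state (not gawk's IOBUF). */
struct reader {
    struct source *src;
    char buf[256];
    size_t len;   /* bytes currently buffered */
    int at_eof;   /* set once the source has no more blocks */
};

/* Append the next block to the buffer, or note that EOF was reached. */
static void fill(struct reader *r)
{
    if (r->src->next == r->src->nblocks) {
        r->at_eof = 1;
        return;
    }
    const char *b = r->src->blocks[r->src->next++];
    size_t n = strlen(b);
    assert(r->len + n <= sizeof r->buf);
    memcpy(r->buf + r->len, b, n);
    r->len += n;
}

/* Copy the first reclen bytes to out as a record, then drop the record
   and the seplen separator bytes from the front of the buffer. */
static void take(struct reader *r, size_t reclen, size_t seplen,
                 char *out, size_t outsz)
{
    size_t n = reclen < outsz - 1 ? reclen : outsz - 1;
    memcpy(out, r->buf, n);
    out[n] = '\0';
    memmove(r->buf, r->buf + reclen + seplen, r->len - reclen - seplen);
    r->len -= reclen + seplen;
}

/* Return 1 and store the next record in out, or 0 at end of input. */
static int get_record(struct reader *r, char *out, size_t outsz)
{
    assert(outsz > 0);
    for (;;) {
        size_t i = 0;
        while (i < r->len && r->buf[i] != ',')
            i++;
        if (i == r->len) {                  /* no separator buffered yet */
            if (!r->at_eof) {
                fill(r);
                continue;
            }
            if (r->len == 0)
                return 0;                   /* input exhausted */
            take(r, r->len, 0, out, outsz); /* last record, no trailing RS */
            return 1;
        }
        size_t j = i;
        while (j < r->len && r->buf[j] == ',')
            j++;
        if (j == r->len && !r->at_eof) {
            /* Run touches the end of the buffer and more input may
               follow: read more and rescan. */
            fill(r);
            continue;
        }
        /* Run is followed by a non-comma, or EOF makes it final. */
        take(r, i, j - i, out, outsz);
        return 1;
    }
}
```

With the blocks "aa,,,", ",,bb," and "cc" this yields the records aa, bb and cc. Without the near-end check, the first call would stop after "aa,,," and the next record would start with the two leftover commas, producing a spurious empty record.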

Thanks,

Arnold

