bug in gawk 3.1.0 regex code
From: laura_fairhead
Subject: bug in gawk 3.1.0 regex code
Date: Fri, 10 May 2002 03:38:42 GMT+01:00
I believe I've just found a bug in gawk 3.1.0's implementation of
extended regular expressions. It seems to be down to the alternation
operator: when an end anchor '$' is used as a subexpression in an
alternation and the entire matched RE is the null string, it fails
to match at the end of the string. For example:
gsub(/$|2/,"x")
print
input = 12345
expected output = 1x345x
actual output = 1x345
The start anchor '^' always works as expected:
gsub(/^|2/,"x")
print
input = 12345
expected output = x1x345
actual output = x1x345
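To make the "expected output" lines concrete, here is a minimal Python sketch of awk-style gsub. It assumes the usual awk rule: matches are found left to right, each match is replaced, and an empty match that starts exactly where the previous non-empty match ended is skipped. It uses Python's `re` engine, not gawk's, so it is only a reference model: Python alternation is leftmost-first rather than POSIX leftmost-longest, which does not matter for the single-character alternatives tested here. The name `awk_gsub` is hypothetical.

```python
import re


def awk_gsub(pattern: str, repl: str, text: str) -> str:
    """Hypothetical reference model of awk's gsub(pattern, repl) on text.

    Assumption: an empty match starting exactly where a previous non-empty
    match ended is skipped, as in awk. repl is inserted literally here
    ('&' and backslashes get no special treatment).
    """
    pieces = []
    copied_up_to = 0        # text before this index is already in pieces
    last_nonempty_end = -1  # end of the most recent non-empty match
    for m in re.finditer(pattern, text):
        start, end = m.span()
        if start == end and start == last_nonempty_end:
            continue        # empty match abutting the previous match
        pieces.append(text[copied_up_to:start])
        pieces.append(repl)
        copied_up_to = end
        last_nonempty_end = end if end > start else -1
    pieces.append(text[copied_up_to:])
    return "".join(pieces)


if __name__ == "__main__":
    print(awk_gsub(r"$|2", "x", "12345"))  # 1x345x
    print(awk_gsub(r"^|2", "x", "12345"))  # x1x345
```

Under this model the '$' alternative still matches the empty string at the end of the input after '2' has matched earlier, which is the trailing 'x' that gawk 3.1.0 omits.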
This was with POSIX compliance enabled, although that doesn't
affect the result.
I checked gawk 3.0.6 and got exactly the same results; however,
gawk 2.15.6 gives the expected results.
All of the following platforms produced the same results:
gawk 3.0.6 / Win98 / i386
gawk 3.1.0 / Win98 / i386
gawk 3.0.5 / Linux 2.2.16 / i386
Complete test results were as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex               input     expected  actual    bug?
-------------------------------------------------------------
(^)                 12345     x12345    x12345
($)                 12345     12345x    12345x
(^)|($)             12345     x12345x   x12345x
($)|(^)             12345     x12345x   x12345x
2                   12345     1x345     1x345
(^)|2               12345     x1x345    x1x345
2|(^)               12345     x1x345    x1x345
($)|2               12345     1x345x    1x345     **BUG**
2|($)               12345     1x345x    1x345     **BUG**
(2)|(^)             12345     x1x345    x1x345
(^)|(2)             12345     x1x345    x1x345
(2)|($)             12345     1x345x    1x345     **BUG**
($)|(2)             12345     1x345x    1x345     **BUG**
((2)|(^)).          12345     xx45      xx45
((^)|(2)).          12345     xx45      xx45
.((2)|($))          12345     x34x      x34x
.(($)|(2))          12345     x34x      x34x
(^)|6               12345     x12345    x12345
6|(^)               12345     x12345    x12345
($)|6               12345     12345x    12345x
6|($)               12345     12345x    12345x
2|6|(^)             12345     x1x345    x1x345
2|(^)|6             12345     x1x345    x1x345
6|2|(^)             12345     x1x345    x1x345
6|(^)|2             12345     x1x345    x1x345
(^)|6|2             12345     x1x345    x1x345
(^)|2|6             12345     x1x345    x1x345
2|6|($)             12345     1x345x    1x345     **BUG**
2|($)|6             12345     1x345x    1x345     **BUG**
6|2|($)             12345     1x345x    1x345     **BUG**
6|($)|2             12345     1x345x    1x345     **BUG**
($)|6|2             12345     1x345x    1x345     **BUG**
($)|2|6             12345     1x345x    1x345     **BUG**
2|4|(^)             12345     x1x3x5    x1x3x5
2|(^)|4             12345     x1x3x5    x1x3x5
4|2|(^)             12345     x1x3x5    x1x3x5
4|(^)|2             12345     x1x3x5    x1x3x5
(^)|4|2             12345     x1x3x5    x1x3x5
(^)|2|4             12345     x1x3x5    x1x3x5
2|4|($)             12345     1x3x5x    1x3x5     **BUG**
2|($)|4             12345     1x3x5x    1x3x5     **BUG**
4|2|($)             12345     1x3x5x    1x3x5     **BUG**
4|($)|2             12345     1x3x5x    1x3x5     **BUG**
($)|4|2             12345     1x3x5x    1x3x5     **BUG**
($)|2|4             12345     1x3x5x    1x3x5     **BUG**
x{0}((2)|(^))       12345     x1x345    x1x345
x{0}((^)|(2))       12345     x1x345    x1x345
x{0}((2)|($))       12345     1x345x    1x345     **BUG**
x{0}(($)|(2))       12345     1x345x    1x345     **BUG**
x*((2)|(^))         12345     x1x345    x1x345
x*((^)|(2))         12345     x1x345    x1x345
x*((2)|($))         12345     1x345x    1x345     **BUG**
x*(($)|(2))         12345     1x345x    1x345     **BUG**
x{0}^               12345     x12345    x12345
x{0}$               12345     12345x    12345x
(x{0}^)|2           12345     x1x345    x1x345
(x{0}$)|2           12345     1x345x    1x345     **BUG**
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
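The pattern in the results can be checked mechanically. The Python sketch below parses rows in the whitespace-separated layout of the table above (only a handful of rows, copied from the table, are embedded; the rest would be pasted the same way) and tests whether a row's actual output equals its expected output minus the final 'x', i.e. whether only the end-of-string match was lost. The function names and the `SAMPLE` variable are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, str, str, str, bool]  # regex, input, expected, actual, flagged


def parse_results(table: str) -> List[Row]:
    """Split result rows into fields, skipping the header and underline.

    Assumes whitespace-separated columns (regexes contain no spaces) and an
    optional fifth column '**BUG**'.
    """
    rows = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) not in (4, 5) or fields[0] == "regex":
            continue  # header, row of dashes, or blank line
        regex, inp, expected, actual = fields[:4]
        flagged = len(fields) == 5 and fields[4] == "**BUG**"
        rows.append((regex, inp, expected, actual, flagged))
    return rows


def end_anchor_dropped(row: Row) -> bool:
    """True if actual is expected minus its trailing 'x' (the '$' match)."""
    _, _, expected, actual, _ = row
    return expected.endswith("x") and actual == expected[:-1]


SAMPLE = """\
regex input expected actual bug?
-------------------------------------------------------------
($)|6 12345 12345x 12345x
($)|2 12345 1x345x 1x345 **BUG**
2|4|($) 12345 1x3x5x 1x3x5 **BUG**
.((2)|($)) 12345 x34x x34x
(x{0}$)|2 12345 1x345x 1x345 **BUG**
"""

if __name__ == "__main__":
    for row in parse_results(SAMPLE):
        if row[4]:
            # regex, whether it has a '$' alternative, whether only the end match was lost
            print(row[0], "$" in row[0] and "|" in row[0], end_anchor_dropped(row))
```

In the full table every flagged row has a '$' inside an alternation and loses exactly the trailing 'x'; rows where '$' is the only thing that matches (such as `($)|6`) are unaffected.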
Here's the test program I used. A few of the cases use the ERE interval
operator {n[,[m]]}, so they need '-W posix' (without POSIX compliance
enabled, the remaining tests gave the same results).
[Invocation was 'gawk -W posix -f tregex.awk']
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
tregex.awk
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
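BEGIN{
    print
    # Header; the format string ends in "\n", so print it with printf.
    _=sprintf("%-20s%-10s%-10s%-10s%-10s\n","regex","input","expected","actual","bug?")
    printf "%s", _
    # Underline: setting field length(_)+1 of the empty record rebuilds $0
    # as length(_) copies of OFS, i.e. a row of dashes.
    OFS="-"
    $(length(_)+1)=""
    print $0
    # Each data line is: regex input expected. Compare getline's result
    # with 0 so a missing file (getline returns -1) cannot loop forever.
    while((getline <"testre.dat") > 0)
    {
        RE=$1;IN=$2;OUT=$3
        $0=IN
        gsub(RE,"x")    # RE is a dynamic regex read from the data file
        printf "%-20s%-10s%-10s%-10s%-10s\n",RE,IN,OUT,$0,$0==OUT?"":"**BUG**"
    }
    close("testre.dat")
}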
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
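For readers without gawk at hand, the harness logic ports directly. This is a hedged Python sketch, not the author's program: it takes lines in the testre.dat format (regex, input, expected), applies a caller-supplied substitution function, and formats rows with the same column widths and **BUG** flag as the printf in tregex.awk. `run_tests` and `ROW_FMT` are hypothetical names; the substitution function is a parameter so that any engine can be compared against the expected column.

```python
import re
from typing import Callable, Iterable, List

# Same column widths as the printf format in tregex.awk.
ROW_FMT = "{:<20}{:<10}{:<10}{:<10}{:<10}"


def run_tests(lines: Iterable[str], gsub: Callable[[str, str], str]) -> List[str]:
    """Return report lines for testre.dat-style input.

    Each non-blank line holds: regex input expected (whitespace-separated).
    gsub(regex, text) must return text with every match replaced by 'x'.
    """
    header = ROW_FMT.format("regex", "input", "expected", "actual", "bug?")
    report = [header, "-" * len(header)]
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        regex, inp, expected = fields[:3]
        actual = gsub(regex, inp)
        flag = "" if actual == expected else "**BUG**"
        # Trailing padding trimmed for tidier output.
        report.append(ROW_FMT.format(regex, inp, expected, actual, flag).rstrip())
    return report


if __name__ == "__main__":
    data = ["($)|2 12345 1x345x", "2|4|($) 12345 1x3x5x"]
    # Python's re.sub as the engine under test.
    for out in run_tests(data, lambda rx, s: re.sub(rx, "x", s)):
        print(out)
```

Python's `re.sub` can serve as the engine for these two rows, but its handling of empty matches next to a previous match is not awk's in general, so treat it as a cross-check rather than a drop-in replacement for gawk.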
This is the test data file I used:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
testre.dat
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(^) 12345 x12345
($) 12345 12345x
(^)|($) 12345 x12345x
($)|(^) 12345 x12345x
2 12345 1x345
(^)|2 12345 x1x345
2|(^) 12345 x1x345
($)|2 12345 1x345x
2|($) 12345 1x345x
(2)|(^) 12345 x1x345
(^)|(2) 12345 x1x345
(2)|($) 12345 1x345x
($)|(2) 12345 1x345x
((2)|(^)). 12345 xx45
((^)|(2)). 12345 xx45
.((2)|($)) 12345 x34x
.(($)|(2)) 12345 x34x
(^)|6 12345 x12345
6|(^) 12345 x12345
($)|6 12345 12345x
6|($) 12345 12345x
2|6|(^) 12345 x1x345
2|(^)|6 12345 x1x345
6|2|(^) 12345 x1x345
6|(^)|2 12345 x1x345
(^)|6|2 12345 x1x345
(^)|2|6 12345 x1x345
2|6|($) 12345 1x345x
2|($)|6 12345 1x345x
6|2|($) 12345 1x345x
6|($)|2 12345 1x345x
($)|6|2 12345 1x345x
($)|2|6 12345 1x345x
2|4|(^) 12345 x1x3x5
2|(^)|4 12345 x1x3x5
4|2|(^) 12345 x1x3x5
4|(^)|2 12345 x1x3x5
(^)|4|2 12345 x1x3x5
(^)|2|4 12345 x1x3x5
2|4|($) 12345 1x3x5x
2|($)|4 12345 1x3x5x
4|2|($) 12345 1x3x5x
4|($)|2 12345 1x3x5x
($)|4|2 12345 1x3x5x
($)|2|4 12345 1x3x5x
x{0}((2)|(^)) 12345 x1x345
x{0}((^)|(2)) 12345 x1x345
x{0}((2)|($)) 12345 1x345x
x{0}(($)|(2)) 12345 1x345x
x*((2)|(^)) 12345 x1x345
x*((^)|(2)) 12345 x1x345
x*((2)|($)) 12345 1x345x
x*(($)|(2)) 12345 1x345x
x{0}^ 12345 x12345
x{0}$ 12345 12345x
(x{0}^)|2 12345 x1x345
(x{0}$)|2 12345 1x345x
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I've attached a full copy of this e-mail in ZIP format
in case of e-mail transport errors corrupting the data.
I've posted the same bug report to gnu.utils.bug, and
it's being discussed in this thread on comp.lang.awk:
From: address@hidden (laura fairhead)
Newsgroups: comp.lang.awk
Subject: bug in gawk3.1.0 regex code
Date: Wed, 08 May 2002 23:31:40 GMT
Message-ID: <address@hidden>
byefrom
Laura Fairhead
Attachment: COPY.ZIP