[bug-grep] dfa.c sync with gawk, progress report


From: Stepan Kasal
Subject: [bug-grep] dfa.c sync with gawk, progress report
Date: Fri, 17 Dec 2004 14:06:47 +0100
User-agent: Mutt/1.4.1i

Hi,
   I'd like to catch up with gawk/dfa.c, so that we can serve as
a primary source of dfa for both.  After the i18n patch by Isamu
Hasegawa, I checked in a few simple changes imported from gawk.

The following patches remain in the queue:

1) dfa-newline.patch -- adds ability to match embedded newlines;
   grep doesn't use it but gawk needs it.
2) dfa_backslash_s.patch  -- support \s and \S
3) dfa-arnold-icase.patch
4) http://lists.gnu.org/archive/html/bug-gnu-utils/2004-11/msg00096.html
5) http://lists.gnu.org/archive/html/bug-gnu-utils/2004-11/msg00098.html
6) http://lists.gnu.org/archive/html/bug-gnu-utils/2004-11/msg00097.html

The first three patches are attached to this mail.
The last three are URLs to gawk bug reports, but I believe they apply to
grep, too.  The fixes submitted there are attached to this mail as a
combined patch, named dfa-deb-jp.patch.

Once we process these, we will be in sync with gawk's dfa.c.

Comments:
1) Matching embedded newlines is needed by awk.  We could #ifdef it if
it is worth the work.
I think the interface is probably the same as it was in dfa.c in
grep-2.4.2 and older.  I think we could modify it.  No one wants the
COUNT parameter, for example.

2) looks OK

3) fixes the bug demonstrated by:
   echo C |LC_ALL=en_GB.UTF8 src/grep -i '[c]'

4)--6) fix the bugs described in the bug reports.  I have reproduced them all
and verified that each of them is fixed by the patch in question.
(There is a gotcha now: the configure script doesn't match the needs of
mbsupport.h, so you have to force it: I added a #define at the end of my
copy of mbsupport.h.)

To sum up: 1) may need more work/review, but 2)--6) should go in soon.

3)--6) really deserve a test case.  Before doing that, I'd like to see some
cleanup of the test suite.  I think we should have only one awk script, and
the top of each *.tests file would describe which command to call and other
details.
I also think we need a new test collection to test ``grep -i''.
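
As a rough idea of what such a collection could look like, here is a
minimal sketch built around the reproducer from comment 3).  The function
names, the GREP variable and the small case table are assumptions made up
for illustration; they are not part of the existing test suite, and the
script only checks the exit status of whichever grep it is pointed at.

```shell
#!/bin/sh
# Sketch of a table-driven check for `grep -i'.
# GREP, check_icase, run_icase_tests and the case table are
# illustrative assumptions, not part of the existing test suite.
GREP=${GREP:-grep}

# check_icase PATTERN INPUT [LOCALE]
# Prints "pass" if INPUT matches PATTERN under -i, "FAIL" otherwise.
# LOCALE defaults to C when omitted or empty.
check_icase () {
  pattern=$1 input=$2 locale=${3:-C}
  if printf '%s\n' "$input" |
     LC_ALL=$locale "$GREP" -i -e "$pattern" >/dev/null 2>&1
  then echo pass
  else echo FAIL
  fi
}

# run_icase_tests [LOCALE]
# Each table line is "PATTERN INPUT"; returns non-zero if any case fails.
run_icase_tests () {
  status=0
  while read -r pattern input; do
    result=$(check_icase "$pattern" "$input" "$1")
    echo "$result: echo $input | grep -i '$pattern'"
    [ "$result" = pass ] || status=1
  done <<'EOF'
[c] C
[C] c
c C
EOF
  return $status
}
```

To exercise the freshly built binary in the locale from comment 3), one
could run something like `GREP=src/grep run_icase_tests en_GB.UTF8',
provided that locale is installed on the machine.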

The gawk testing procedure:
1) unpack gawk-3.1.4.tar.bz2, cd gawk-3.1.4
2) sed -n '/hard-locale/,$p' dfa.c >hard-locale.h
3) cp ~/grep-cvs/grep/src/dfa.[ch] .
4) configure && make
5) make check
6) GAWKLOCALE=en_US.utf8 make check

With all the patches attached to this mail, `make check' succeeds in 5),
and in 6) only `gsubtst5' fails (this is expected).

If any of you decide to work on the newline patch, you can use this gawk test,
if you feel like it.

That's it for now, have a nice weekend,
        Stepan

Attachment: dfa-newline.patch
Description: Text document

Attachment: dfa_backslash_s.patch
Description: Text document

Attachment: dfa-arnold-icase.patch
Description: Text document

Attachment: dfa-deb-jp.patch
Description: Text document
