From: | Stepan Kasal |
Subject: | [bug-grep] dfa.c sync with gawk, progress report |
Date: | Fri, 17 Dec 2004 14:06:47 +0100 |
User-agent: | Mutt/1.4.1i |
Hi,

I'd like to catch up with gawk/dfa.c, so that we can serve as the primary source of dfa for both projects. After the i18n patch by Isamu Hasegawa, I checked in a few simple changes imported from gawk.

The following patches remain in the queue:

1) dfa-newline.patch -- adds the ability to match embedded newlines; grep doesn't use it but gawk needs it.
2) dfa_backslash_s.patch -- support for \s and \S
3) dfa-arnold-icase.patch
4) http://lists.gnu.org/archive/html/bug-gnu-utils/2004-11/msg00096.html
5) http://lists.gnu.org/archive/html/bug-gnu-utils/2004-11/msg00098.html
6) http://lists.gnu.org/archive/html/bug-gnu-utils/2004-11/msg00097.html

The first three patches are attached to this mail. The last three are URLs of gawk bug reports, but I believe they apply to grep, too. The fixes submitted there are attached to this mail as a combined patch, named dfa-deb-jp.patch.

Once we have processed these, we will be in sync with gawk's dfa.c.

Comments:

1) Matching embedded newlines is needed by awk. We could #ifdef it if that is worth the work. I think the interface is probably the same as it was in dfa.c in grep-2.4.2 and older. I think we could modify it. No one wants the COUNT parameter, for example.
2) looks OK
3) fixes the bug demonstrated by:
     echo C |LC_ALL=en_GB.UTF8 src/grep -i '[c]'
4)--6) fix the bugs described in the bug reports. I have reproduced them all and verified that each of them is fixed by the patch in question.

(There is a gotcha now: the configure script doesn't match the needs of mbsupport.h, so you have to force it: I added a #define at the end of my copy of mbsupport.h.)

To sum up: 1) may need more work/review, but 2)--6) should go in soon.

3)--6) really deserve a test case. Before doing that, I'd like to see some cleanup of the test suite. I think we should have only one awk script, and the top of each *.tests file would describe which command to call and other details. I also think we need a new test collection to test ``grep -i''.
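As a starting point for such a ``grep -i'' collection, here is a rough sketch of a check built around the command from 3). The GREP variable and the check_icase helper are made-up names for illustration, not part of the existing test suite, and the locale list is only an assumption about what is worth covering. If a locale is not installed, the run silently falls back to the C locale, so a real test would have to detect that first.

```shell
#!/bin/sh
# Sketch of a `grep -i' regression check.
# GREP and check_icase are hypothetical names, not from the grep test suite.
GREP=${GREP:-grep}

# check_icase LOCALE PATTERN INPUT
# Succeeds if INPUT matches PATTERN case-insensitively under LOCALE.
check_icase () {
  printf '%s\n' "$3" | LC_ALL=$1 "$GREP" -i -- "$2" >/dev/null
}

failures=0
for locale in C en_GB.UTF8; do
  # Same case as the bug report: upper-case input, lower-case bracket.
  if check_icase "$locale" '[c]' C; then
    echo "PASS: $locale"
  else
    echo "FAIL: $locale"
    failures=$((failures + 1))
  fi
done
echo "failures: $failures"
```

Run it as `GREP=src/grep sh icase-check.sh' (the file name is arbitrary) to exercise the freshly built binary rather than the installed one.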
The gawk testing procedure:

1) unpack gawk-3.1.4.tar.bz2, cd gawk-3.1.4
2) sed -n '/hard-locale/,$p' dfa.c >hard-locale.h
3) cp ~/grep-cvs/grep/src/dfa.[ch] .
4) configure && make
5) make check
6) GAWKLOCALE=en_US.utf8 make check

With all the patches attached to this mail, `make check' succeeds in 5), and in 6) only `gsubtst5' fails (this is expected).

If any of you decide to work on the newline patch, you can use this gawk test, if you feel like it.

That's it for now, have a nice weekend,
Stepan
dfa-newline.patch
Description: Text document
dfa_backslash_s.patch
Description: Text document
dfa-arnold-icase.patch
Description: Text document
dfa-deb-jp.patch
Description: Text document