bug-grep
bug#17376: [PATCH] grep: fix the different behaviour for a invalid seque


From: Paul Eggert
Subject: bug#17376: [PATCH] grep: fix the different behaviour for a invalid sequence between KWset and DFA
Date: Mon, 05 May 2014 20:26:37 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0

While thinking about Bug#17376 I noticed some related bugs, which appear to have been in 'grep' since at least grep 2.0. For example:

$ encode() { echo "$1" | tr ABC '\357\274\241'; }
$ encode ABCABC >exp3
$ encode _____________________ABCABC___ >exp4
$ bca=$(encode BCA)
$ grep "$bca" exp3
$ grep -F "$bca" exp3
$ grep "\\(\\)\\1$bca" exp3
ＡＡ

Here the regexp code disagrees with KWset and with the DFA, which is a bug: KWset and DFA should affect only performance, not behavior.
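To see why "$bca" is an invalid sequence, it can help to dump the bytes that encode produces. The following is a minimal sketch, assuming a UTF-8 locale for the grep runs. The hexbytes helper is hypothetical and not part of the message, and LC_ALL=C is added here only so that tr treats its operands as raw bytes on every platform.

```shell
# Same mapping as the encode helper in the message: A, B, C become the
# bytes 0xEF, 0xBC, 0xA1. LC_ALL=C is an addition for portability.
encode() { echo "$1" | LC_ALL=C tr ABC '\357\274\241'; }

# Hypothetical helper: print stdin as space-separated hex bytes.
# od -An drops the offset column, -tx1 prints one hex byte per field.
hexbytes() { od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'; }

encode ABC | hexbytes   # ef bc a1 0a
encode BCA | hexbytes   # bc a1 ef 0a
```

The bytes ef bc a1 are the UTF-8 encoding of U+FF21 (FULLWIDTH LATIN CAPITAL LETTER A), so exp3 holds that character twice. The BCA pattern starts with the two continuation bytes bc a1 and ends with the lead byte ef, so it is not a valid character on its own, and it occurs in exp3 only as a byte string straddling the boundary between the two characters.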

$ grep "$bca" exp4
_____________________ＡＡ___
$ grep -F "$bca" exp4
_____________________ＡＡ___
$ grep "\\(\\)\\1$bca" exp4
_____________________ＡＡ___

Here they agree, but only because there's a bug in is_mb_middle!
Fixing that will cause them to disagree again.
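The invariant stated above, that the choice of matcher should affect only performance, can be checked mechanically by running one pattern through all three invocations and comparing the output. This is only a sketch: the same_output helper is hypothetical, and which internal matcher grep picks for each invocation is assumed from the message rather than verified here.

```shell
# Hypothetical helper: run the three invocations used in the message on one
# pattern and one file, and report whether their outputs are identical.
same_output() {
  pat=$1 file=$2
  plain=$(grep "$pat" "$file")
  fixed=$(grep -F "$pat" "$file")
  # The empty group plus back-reference is the message's own trick; it is
  # assumed here to steer grep toward the regexp code.
  regex=$(grep "\\(\\)\\1$pat" "$file")
  if [ "$plain" = "$fixed" ] && [ "$fixed" = "$regex" ]; then
    echo agree
  else
    echo disagree
  fi
}

same_output "$bca" exp3
same_output "$bca" exp4
```

The comparison is only meaningful for patterns without regexp metacharacters, since -F treats those literally and would differ from the other two invocations by design. The result for "$bca" depends on the grep version and locale, so no expected output is shown.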

I installed the attached patch to fix the bugs I found, and to adjust the test cases accordingly.

Attachment: 0001-grep-fix-encoding-error-incompatibilities-among-rege.patch
Description: Text document

