bug#17376: [PATCH] grep: fix the different behaviour for a invalid seque
From: Paul Eggert
Subject: bug#17376: [PATCH] grep: fix the different behaviour for a invalid sequence between KWset and DFA
Date: Mon, 05 May 2014 20:26:37 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
While thinking about Bug#17376 I noticed some related bugs, which appear
to have been in 'grep' since at least grep 2.0. For example:
$ encode() { echo "$1" | tr ABC '\357\274\241'; }
$ encode ABCABC >exp3
$ encode _____________________ABCABC___ >exp4
$ bca=$(encode BCA)
$ grep "$bca" exp3
$ grep -F "$bca" exp3
$ grep "\\(\\)\\1$bca" exp3
AA
Here the regexp code disagrees with KWset and with the DFA, which is a
bug: KWset and DFA should affect only performance, not behavior.
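To see why this pattern is an encoding error, it may help to dump the bytes involved. The sketch below assumes a UTF-8 locale and reuses the encode helper from the transcript; hexbytes is a hypothetical helper added only for illustration. The data is the three-byte sequence EF BC A1 repeated twice, while the pattern BC A1 EF starts with two continuation bytes and ends with a lead byte that has no continuation, so it is not valid UTF-8 even though those raw bytes do occur in the data, straddling the two characters.

```shell
# Same helper as in the transcript: map A, B, C to the octal bytes
# 357 274 241 (hex EF BC A1).
encode() { echo "$1" | tr ABC '\357\274\241'; }

# Hypothetical helper (not part of grep or of the patch): print stdin
# as one contiguous run of hex digits.
hexbytes() { od -An -tx1 | tr -d ' \n'; echo; }

encode ABCABC | hexbytes   # data:    efbca1efbca10a
encode BCA | hexbytes      # pattern: bca1ef0a
```

The trailing 0a in each dump is the newline that echo appends; the command substitution in bca=$(encode BCA) strips it, so the pattern itself is just the three bytes BC A1 EF.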
$ grep "$bca" exp4
_____________________AA___
$ grep -F "$bca" exp4
_____________________AA___
$ grep "\\(\\)\\1$bca" exp4
_____________________AA___
Here they agree, but only because there's a bug in is_mb_middle!
Fixing that will cause them to disagree again.
I installed the attached patch to fix the bugs I found, and to adjust
the test cases accordingly.
0001-grep-fix-encoding-error-incompatibilities-among-rege.patch
Description: Text document