bug#17376: [PATCH] grep: fix the different behaviour for a invalid seque

From:	Norihiro Tanaka
Subject:	bug#17376: [PATCH] grep: fix the different behaviour for a invalid sequence between KWset and DFA
Date:	Thu, 01 May 2014 07:16:31 +0900

Sorry, two of the test cases were wrong.  The correct ones are as follows.

  encode() { echo "$1" | tr ABC '\357\274\241'; }
  encode ABC | env LC_ALL=en_US.utf8 src/grep "$(encode A)\|q"
  encode ABC | env LC_ALL=en_US.utf8 src/grep -F "$(encode A)"
  encode aABC | env LC_ALL=en_US.utf8 src/grep "a$(encode A)\|q"
         ^
  encode aABC | env LC_ALL=en_US.utf8 src/grep -F "a$(encode A)"
         ^

We expect none of these commands to output anything, but the 4th one
prints one line.  To correct this, we need to fix the last line of
is_mb_middle in searchutils.c:

  return 0 < match_len && match_len < mbrlen (p, end - p, &cur_state);

We must check validity not at the start of the matched pattern but at
its last part.  Currently the check is made here:

  a$(encode A)
  ^

but it should be made here:

  a$(encode A)
   ^^^^^^^^^^^

However, checking there may slow down typical cases that contain no
invalid sequences, and many users would not want that.
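
To make the difference between the two check positions concrete, here is a
minimal sketch.  It is not grep's code: utf8_len is a simplified, hypothetical
stand-in for mbrlen in a UTF-8 locale (no overlong or surrogate checks), and
middle_at_start and ends_mid_char are names invented for this illustration.
The first function applies the comparison quoted above at the start of the
match; the second walks the match and tests its last character instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for mbrlen() in a UTF-8 locale (simplified).
   Returns the character length, (size_t) -1 for an invalid sequence,
   or (size_t) -2 for a truncated one. */
static size_t utf8_len(const unsigned char *p, size_t n)
{
    size_t need;
    if (n == 0)
        return (size_t) -2;
    if (p[0] < 0x80) need = 1;
    else if ((p[0] & 0xE0) == 0xC0) need = 2;
    else if ((p[0] & 0xF0) == 0xE0) need = 3;
    else if ((p[0] & 0xF8) == 0xF0) need = 4;
    else return (size_t) -1;          /* stray continuation or bad lead byte */
    for (size_t i = 1; i < need; i++) {
        if (i >= n)
            return (size_t) -2;       /* buffer ends inside the character */
        if ((p[i] & 0xC0) != 0x80)
            return (size_t) -1;       /* expected a continuation byte */
    }
    return need;
}

/* The quoted comparison, applied to the character at the match start. */
static bool middle_at_start(const unsigned char *match,
                            const unsigned char *end, size_t match_len)
{
    return 0 < match_len && match_len < utf8_len(match, (size_t) (end - match));
}

/* Assumed alternative: step through the match one character at a time and
   report whether the match stops before its last character does. */
static bool ends_mid_char(const unsigned char *match,
                          const unsigned char *end, size_t match_len)
{
    size_t avail = (size_t) (end - match);
    size_t off = 0;
    while (off < match_len) {
        size_t len = utf8_len(match + off, avail - off);
        if (len == (size_t) -1 || len == (size_t) -2)
            len = 1;                  /* treat a bad byte as one character */
        if (match_len - off < len)
            return true;              /* match ends inside this character */
        off += len;
    }
    return false;
}
```

For the subject bytes 61 ef bc a1 and a 2-byte match at offset 0 (the 4th
command), middle_at_start compares 2 against the length of `a' and accepts the
match, while ends_mid_char sees that the match covers only the first byte of
the 3-byte character and rejects it.  The loop in ends_mid_char visits every
character of the match, which is the kind of extra per-match work the
slowdown concern refers to.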

Furthermore, it seems that DFA does not treat invalid sequences correctly.
I checked with debugging enabled.  The tokens are already broken in the
1st case.

  encode ABC | env LC_ALL=en_US.utf8 src/grep "$(encode A)\|q"

  dfaanalyze:
   0:c3 1:af 2:CAT 3:71 4:OR 5:END 6:CAT

I expect the following, because `encode ABC' is `ef bc a1'.

  dfaanalyze:
    0:ef 1:71 2:OR 3:END 4:CAT
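
As a side note on the first dump: the bytes c3 af are the UTF-8 encoding of
U+00EF, the code point numerically equal to the lone byte ef.  Whether the
parser actually performs such a conversion is an assumption, not something
stated in this message; the sketch below only shows the arithmetic, and
utf8_encode_small is a hypothetical helper name.

```c
#include <stddef.h>

/* Encode a code point below U+0800 as UTF-8.
   Returns the number of bytes written to out (1 or 2). */
static size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;                 /* ASCII: one byte */
        return 1;
    }
    out[0] = (unsigned char) (0xC0 | (cp >> 6));     /* 110xxxxx lead byte */
    out[1] = (unsigned char) (0x80 | (cp & 0x3F));   /* 10xxxxxx continuation */
    return 2;
}
```

Calling utf8_encode_small (0xEF, out) writes c3 af, matching tokens 0 and 1 of
the observed dump.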

However, it would also be difficult to fix this correctly.  Therefore,
I propose the simple fix in the patch.

Thanks,
Norihiro
