bug-grep

bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine


From: Paolo Bonzini
Subject: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Wed, 05 Mar 2014 15:31:54 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 05/03/2014 14:41, Norihiro Tanaka wrote:
Paolo Bonzini wrote:

What about these two commands:

    grep [a]
    grep -i A

Would they match \x82\x61 ("Ｂ", U+FF22) with your patch?  And without it?

Neither command matches, with or without the patch.

Right, it's handled by SKIP_REMAINS_MB_IF_INITIAL_STATE.
dfa.c never stops surprising me.  Great catch!

Paolo

--
Before the patch:

$ locale -a | grep sjis
ja_JP.sjis
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
dfaanalyze:
 0:A 1:a 2:OR 3:END 4:CAT
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
dfaanalyze:
 0:MBCSET 1:END 2:CAT

After the patch:

$ locale -a | grep sjis
ja_JP.sjis
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
dfaanalyze:
 0:CSET 1:END 2:CAT
$ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
dfaanalyze:
 0:CSET 1:END 2:CAT
--

Norihiro