From: Paolo Bonzini
Subject: bug#16912: [PATCH] no longer use CSET for non-UTF8 locale in DFA engine
Date: Wed, 05 Mar 2014 15:31:54 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 05/03/2014 14:41, Norihiro Tanaka wrote:
> Paolo Bonzini wrote:
>> What about these two commands:
>>
>>     grep [a]
>>     grep -i A
>>
>> Would they match \x82\x61 ("B", U+0FF22) with your patch? And without it?
>
> No match for all.

Right, it's handled by SKIP_REMAINS_MB_IF_INITIAL_STATE. dfa.c never stops surprising me.

Great catch!

Paolo

> --
> Before the patch:
>
> $ locale -a | grep sjis
> ja_JP.sjis
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
> dfaanalyze: 0:A 1:a 2:OR 3:END 4:CAT
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
> dfaanalyze: 0:MBCSET 1:END 2:CAT
>
> After the patch:
>
> $ locale -a | grep sjis
> ja_JP.sjis
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep -i 'A'
> dfaanalyze: 0:CSET 1:END 2:CAT
> $ printf "\x82\x61\n" | env LC_ALL=ja_JP.sjis src/grep '[a]'
> dfaanalyze: 0:CSET 1:END 2:CAT
>
> --
> Norihiro
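
For readers following the thread: the trail byte of \x82\x61 is 0x61, the same value as ASCII 'a', so a matcher that starts comparing in the middle of a multibyte character could report a false match. The sketch below is not the dfa.c code and does not reproduce SKIP_REMAINS_MB_IF_INITIAL_STATE; it only illustrates the general idea of skipping the remaining bytes of a multibyte character before trying a single-byte match. The function names are hypothetical, and the lead-byte ranges (0x81-0x9F, 0xE0-0xFC) are a simplified assumption about Shift-JIS rather than a query of the real locale.

```c
#include <stddef.h>

/* Assumption: simplified Shift-JIS lead-byte test (0x81-0x9F, 0xE0-0xFC).
   A real implementation would ask the locale, e.g. via mbrlen(). */
static int is_sjis_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Naive search: compares every byte, including trail bytes of
   two-byte characters, so it can match in the middle of a character. */
int match_bytewise(const unsigned char *s, size_t n, unsigned char c)
{
    for (size_t i = 0; i < n; i++)
        if (s[i] == c)
            return 1;
    return 0;
}

/* Boundary-aware search: when a lead byte is seen, skip the whole
   two-byte character so its trail byte is never compared against c.
   c is expected to be a single-byte (ASCII) character. */
int match_charwise(const unsigned char *s, size_t n, unsigned char c)
{
    size_t i = 0;
    while (i < n) {
        if (is_sjis_lead(s[i]) && i + 1 < n) {
            i += 2;   /* skip lead byte and trail byte together */
            continue;
        }
        if (s[i] == c)
            return 1;
        i++;
    }
    return 0;
}
```

With the three input bytes 0x82 0x61 0x0A and the pattern byte 'a', the byte-wise search reports a match while the boundary-aware one does not, which corresponds to the "No match" result reported for both grep commands above.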