bug-grep



From: Paolo Bonzini
Subject: Re: [PATCH 16/17] grep: remove check_multibyte_string, fix non-UTF8 missed match
Date: Sun, 14 Mar 2010 13:33:00 +0100
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.7) Gecko/20100120 Fedora/3.0.1-1.fc12 Lightning/1.0b2pre Thunderbird/3.0.1

On 03/14/2010 02:16 AM, Norihiro Tanaka wrote:
> Hi,
>
> By this patch, even when multibyte-check failed for a simple pattern
> that doesn't contain the wild-card and the repetition expression, `dfaexec'
> will have called.
>
> Do you intend it?

Yes; see bug 23814 for an example. There, I'm searching for \xAA\xBB. kwset could give an exact match, but in \xBB\xAA\xBB\xAA it only finds an unaligned match, that is, one that does not start on a character boundary. Note that the DFA search runs only on the line that kwset selected anyway. Also, for UTF-8 the is_mb_middle test should always succeed, unless an invalid UTF-8 character gets into the DFA's "must" kwset.
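
To make the "unaligned match" concrete, here is a minimal sketch in Python (not grep's actual C code). It assumes a hypothetical charset in which every character is exactly two bytes; the charset involved in the bug report is not stated here, and the helper names byte_matches, is_aligned_fixed2 and is_aligned_utf8 are invented for illustration. It also shows why a bytewise search for a valid UTF-8 needle in valid UTF-8 text only ever lands on character boundaries.

```python
def byte_matches(haystack: bytes, needle: bytes) -> list:
    """Return every byte offset where needle occurs.

    This is a plain bytewise search that knows nothing about
    character boundaries, similar in spirit to what kwset does.
    """
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def is_aligned_fixed2(offset: int) -> bool:
    """Boundary check for a HYPOTHETICAL charset where every
    character is exactly two bytes (an assumption for this sketch)."""
    return offset % 2 == 0


def is_aligned_utf8(haystack: bytes, offset: int) -> bool:
    """In UTF-8, continuation bytes have the form 0b10xxxxxx, so an
    offset is a character boundary if it does not point at one."""
    return offset == len(haystack) or (haystack[offset] & 0xC0) != 0x80


# The bytes from the message above: the needle occurs only at
# offset 1, which straddles two characters of the assumed charset.
line = b"\xBB\xAA\xBB\xAA"
needle = b"\xAA\xBB"
hits = byte_matches(line, needle)
aligned_hits = [o for o in hits if is_aligned_fixed2(o)]

# Valid UTF-8 text and a valid UTF-8 needle: every bytewise hit
# falls on a character boundary.
utf8_line = "a\u00e9\u00e9".encode("utf-8")
utf8_needle = "\u00e9".encode("utf-8")
utf8_hits = byte_matches(utf8_line, utf8_needle)
utf8_all_aligned = all(is_aligned_utf8(utf8_line, o) for o in utf8_hits)
```

Under the two-byte assumption the only bytewise hit is discarded by the boundary check, so a bytewise matcher alone cannot be trusted to settle the question for that line. In the UTF-8 case the boundary check never rejects a hit as long as both inputs are valid UTF-8.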

The alternative is to make kwset multibyte-aware, which is probably possible but not easy. I would know how to do it only by specializing kwset with knowledge of the particular charsets, which is not a good design.

Paolo



