--- Begin Message ---
Subject: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression
Date: Sat, 18 Oct 2014 21:39:37 +0900
RE_DOT_NEWLINE and RE_DOT_NOT_NULL affect only '.' in regex. OTOH, in DFA
they affect MBCSET in addition to '.'. This patch makes the behavior of
DFA match that of regex.
BTW, at the moment, grep and gawk never use the match_mb_charset function
that this patch fixes.
0001-dfa-don-t-consider-RE_DOT_NEWLINE-and-RE_DOT_NOT_NUL.patch
Description: Text document
--- End Message ---
--- Begin Message ---
Subject: Re: bug#18762: [PATCH] dfa: don't consider RE_DOT_NEWLINE and RE_DOT_NOT_NULL in matching with a bracket expression
Date: Sun, 19 Oct 2014 18:24:56 -0700
On Sat, Oct 18, 2014 at 7:07 PM, Norihiro Tanaka <address@hidden> wrote:
> Jim Meyering <address@hidden> wrote:
>> dfa.c's match_mb_charset function *is* used, e.g., in a
>> command like this one:
>>
>> printf '\0' |src/grep -aE '^\s?$'
>
> Wow, that isn't good. I think the behavior of `fails' should be the
> same as that of `trans', except that `fails' checks accepting conditions,
> including the following part. match_mb_charset() should be avoided as far
> as possible, as it doesn't support collating symbols and equivalence classes.
>
>> /* Falling back to the glibc matcher in this case gives
>>    better performance (up to 25% better on [a-z], for
>>    example) and enables support for collating symbols and
>>    equivalence classes. */
>> if (d->states[s].has_mbcset && backref)
>>   {
>>     *backref = 1;
>>     goto done;
>>   }
Nice change. I've adjusted the commit log and added the test
above, since no other code even exercised the
now-inaccessible function. I will push it tomorrow.
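
For readers unfamiliar with the fallback pattern in the quoted hunk, here is a
minimal, self-contained sketch of the idea. It is not the dfa.c code: the
toy_state, toy_dfa, toy_result and toy_check_state names are hypothetical
stand-ins, and the real structures are far richer. It only illustrates that a
state containing a multibyte bracket expression sets *backref and defers to
the slower matcher, while a NULL backref means the caller cannot defer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for dfa.c's structures.  */
struct toy_state
{
  bool has_mbcset;   /* state involves a multibyte bracket expression */
  bool accepting;    /* state is an accepting state */
};

struct toy_dfa
{
  const struct toy_state *states;
  size_t nstates;
};

enum toy_result { TOY_NO_MATCH, TOY_MATCH, TOY_DEFER };

/* Inspect state S of D.  If it involves a multibyte bracket expression
   and the caller passed a non-NULL BACKREF, set *BACKREF to 1 and return
   TOY_DEFER so the caller can rerun the search with the full regex
   matcher.  Otherwise report whether S is accepting.  */
enum toy_result
toy_check_state (const struct toy_dfa *d, size_t s, int *backref)
{
  assert (d != NULL && s < d->nstates);

  if (d->states[s].has_mbcset && backref)
    {
      *backref = 1;
      return TOY_DEFER;
    }
  return d->states[s].accepting ? TOY_MATCH : TOY_NO_MATCH;
}
```

A caller that sees TOY_DEFER would hand the same input to the regex matcher,
which is what gives support for collating symbols and equivalence classes.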
0001-dfa-process-all-MBCSET-constructs-via-glibc-s-matche.patch
Description: Binary data
--- End Message ---