bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x)
From: Jim Meyering
Subject: bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
Date: Fri, 10 Jan 2014 21:40:24 -0800
On Fri, Jan 10, 2014 at 8:52 PM, Jim Meyering <address@hidden> wrote:
>> I wonder might this faster path be restricted to a safer but very common
>> input subset of:
>>
>> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))
>
> That sounds like a good approach.
> Now I need another test case, to demonstrate that the current code can
> cause trouble.
Hmm... after thinking about this for a while and actually trying to
break the current code (did not find a way to demonstrate a regression),
I have concluded that the current approach is no worse than the prior
one of matching a case-mapped regexp vs. each case-mapped input line.
That's not to say that it's perfect, of course.
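For reference, the restriction quoted above can be written as a small
predicate. This is only a sketch: fast_path_ok is a hypothetical name, and
mb_cur_max and in_utf8 are passed as parameters (standing in for MB_CUR_MAX
and for grep's "locale is UTF-8" flag) so the check can be exercised
without switching locales.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not taken from grep's source.
   Returns true when the byte at *c may take the fast path under the
   proposed restriction: either the locale is single-byte, or the locale
   is UTF-8 and the byte is plain ASCII.  */
static bool
fast_path_ok (size_t mb_cur_max, bool in_utf8, const unsigned char *c)
{
  assert (c != NULL);
  return mb_cur_max == 1 || (in_utf8 && *c < 0x80);
}
```

With this predicate, an ASCII byte such as 'j' qualifies in a UTF-8 locale,
while a lead byte such as 0xC7 would fall back to the slower path.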
The "LATIN SMALL LETTER J WITH CARON, COMBINING DOT BELOW" case
from gnulib's test-ulc-casecmp.c illustrates the limitation. This matches:
printf '\x6A\xCC\x8C\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')"
but this does not, yet probably should:
printf '\xC7\xB0\xCC\xA3\n'|src/grep -i "$(printf '\x6A\xCC\x8C\xCC\xA3')"
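To see why the two inputs differ, it helps to decode the bytes. The pattern
is U+006A U+030C U+0323 (j, combining caron, combining dot below), whereas
the second input is U+01F0 U+0323, where U+01F0 is the precomposed
LATIN SMALL LETTER J WITH CARON, whose canonical decomposition is
U+006A U+030C. The sequences are canonically equivalent but differ code
point by code point, which is presumably why a comparison without
normalization misses the match. The decoder below is a minimal sketch
(utf8_decode is a hypothetical helper that assumes well-formed input and
does no validation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only.
   Decode well-formed UTF-8 in s[0..len) into code points, storing at
   most cap of them in out.  Returns the number of code points stored.
   Malformed input is not detected.  */
static size_t
utf8_decode (const unsigned char *s, size_t len, uint32_t *out, size_t cap)
{
  assert (s != NULL && out != NULL);
  size_t n = 0;
  size_t i = 0;
  while (i < len && n < cap)
    {
      unsigned char b = s[i++];
      uint32_t cp;
      size_t extra;   /* number of continuation bytes expected */
      if (b < 0x80)      { cp = b;        extra = 0; }
      else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
      else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
      else               { cp = b & 0x07; extra = 3; }
      /* Each continuation byte contributes its low six bits.  */
      for (; extra > 0 && i < len; extra--, i++)
        cp = (cp << 6) | (s[i] & 0x3F);
      out[n++] = cp;
    }
  return n;
}
```

Decoding 6A CC 8C CC A3 yields three code points, and C7 B0 CC A3 yields
two, so the two lines can only match if one side is normalized first.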
Can you see a way to demonstrate a regression?
Thanks again,
Jim