bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x)

bug-grep

From:	Jim Meyering
Subject:	bug#16232: [PATCH] grep: make --ignore-case (-i) faster (sometimes 10x) in multibyte locales
Date:	Fri, 10 Jan 2014 20:52:00 -0800

On Fri, Jan 10, 2014 at 5:49 PM, Pádraig Brady <address@hidden> wrote:
> Cool so it does this transformation:
>
>   sed 's/./[\L&\U&]/g'
>
> Though multibyte case handling has all sorts of edge cases (pardon the pun),
> and it may not always be valid to treat each character independently?
> For example see some of the tests in:
> http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob;f=tests/unicase/test-ulc-casecmp.c;hb=HEAD

It seems you're right.  Since case conversion is a many-to-one mapping
in some cases, a bracket expression holding just one lower case and one
upper case version of each character won't cover all possibilities.
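
As a hedged illustration of that point (not code from the patch), the
Python sketch below mimics the per-character bracket rewrite and tries it
on Greek sigma, which has two lower case forms.  The helper name
per_char_brackets is hypothetical, and Python's str.lower/str.upper stand
in for whatever case tables the locale provides.

```python
import re

SIGMA = "\u03c3"        # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"  # GREEK SMALL LETTER FINAL SIGMA; also upper-cases to U+03A3


def per_char_brackets(pattern: str) -> str:
    # Hypothetical helper: wrap each character in a bracket expression that
    # holds its lower case and upper case forms, like the sed command quoted
    # above.  Assumes the pattern is plain literal text.
    return "".join(
        "[" + re.escape(ch.lower()) + re.escape(ch.upper()) + "]"
        for ch in pattern
    )


bracketed = per_char_brackets(SIGMA)

# The rewritten pattern only knows about two of the three sigma forms.
naive_match = re.fullmatch(bracketed, FINAL_SIGMA) is not None

# Full case folding treats the final form as equal to the plain one.
folded_match = FINAL_SIGMA.casefold() == SIGMA.casefold()

print(bracketed, naive_match, folded_match)
```

Here naive_match is False while folded_match is True: a line containing
only the final form would be missed by the rewritten pattern even though a
case-insensitive comparison accepts it.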

> I wonder whether this faster path might be restricted to a safer but
> very common input subset of:
>
> (MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80))

That sounds like a good approach.
Now I need another test case, to demonstrate that the current code can
cause trouble.
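
For reference, here is a minimal Python sketch of the guard suggested
above.  The function name fast_path_ok and its parameters are
hypothetical; in grep itself the check would be C code working from
MB_CUR_MAX and a UTF-8 flag.  Applying the test to every byte of the
pattern, rather than to a single character, is an assumption made for
this sketch.

```python
def fast_path_ok(pattern: bytes, mb_cur_max: int, in_utf8: bool) -> bool:
    # Mirrors: MB_CUR_MAX == 1 || (in_utf8 && *c < 0x80)
    if mb_cur_max == 1:
        # Single-byte locale: every character is exactly one byte.
        return True
    # In UTF-8, bytes below 0x80 are always complete ASCII characters.
    return in_utf8 and all(b < 0x80 for b in pattern)


# Example inputs (assumed values, for illustration only).
ascii_ok = fast_path_ok(b"hello", 6, True)
sigma_ok = fast_path_ok("\u03c3".encode("utf-8"), 6, True)
print(ascii_ok, sigma_ok)
```

With these inputs an ASCII-only pattern takes the fast path in a UTF-8
locale, while a pattern containing a multibyte character falls back to the
existing code.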

> Also are the following printfs in the test redundant?
>
>> +data=$(      printf "I:$I $i:i")
>> +search_str=$(printf "$i:i I:$I")

Good catch.  Those were vestiges of pre-factoring code, where they
were needed.  Here's the patch to fix that part, in your name:

Attachment: k.txt
Description: Text document
