[bug #36567] grep -i (case-insensitive) is broken with UTF8


From: Strahinja Kustudic
Subject: [bug #36567] grep -i (case-insensitive) is broken with UTF8
Date: Thu, 31 May 2012 11:18:30 +0000
User-agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:11.0) Gecko/20100101 Firefox/11.0

URL:
  <http://savannah.gnu.org/bugs/?36567>

                 Summary: grep -i (case-insensitive) is broken with UTF8
                 Project: grep
            Submitted by: kustodian
            Submitted on: Thu 31 May 2012 11:18:30 AM GMT
                Category: None
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

Since version 2.6.1, grep does not work correctly when a case-insensitive
search is run in a UTF-8 locale and the input contains a multibyte UTF-8
character. Here is an example:

# Without -i switch everything works correctly
$ echo -e 'AA UTF8 char İ 12345\nAA 12345' | grep 'AA'
AA UTF8 char İ 12345
AA 12345


# With -i it breaks
$ echo -e 'AA UTF8 char İ 12345\nAA 12345' | grep -i 'AA'
AA UTF8 char İ 12345AA 12345


As you can see, grep drops the newline character at the end of the line that
contains the UTF-8 character 'İ', so the two matching lines are printed as one.
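
The symptom above can be checked mechanically by counting the lines grep
prints. The following is only a minimal sketch, not part of the original
report: the function name check_grep_i is hypothetical, the script assumes
the current locale is a UTF-8 one, and it uses printf with octal escapes
(\304\260 is the UTF-8 encoding of 'İ', U+0130) instead of echo -e for
portability.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): prints "ok" when
# grep -i emits the two matching lines separately, "broken" otherwise.
# Assumes the current locale is a UTF-8 locale.
check_grep_i() {
    # \304\260 are the two bytes of U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE).
    # $(( ... )) strips any padding that some wc implementations add.
    count=$(( $(printf 'AA UTF8 char \304\260 12345\nAA 12345\n' | grep -i 'AA' | wc -l) ))
    if [ "$count" -eq 2 ]; then
        echo ok
    else
        echo broken
    fi
}

check_grep_i
```

On an affected grep the two input lines would be joined into one, so the
count would not be 2 and the sketch would report "broken".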

Everything works correctly in versions 2.5.4 and below; it is broken from 2.6.1
through the latest version (which is currently 2.6.12).

This is a big concern, since it can break scripts that filter UTF-8 input
using the -i switch.




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?36567>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



