bug-findutils

[bug #48055] Regex ranges and locales in gnu-awk regextype


From: Piotr Jurkiewicz
Subject: [bug #48055] Regex ranges and locales in gnu-awk regextype
Date: Mon, 30 May 2016 06:12:43 +0000 (UTC)
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

URL:
  <http://savannah.gnu.org/bugs/?48055>

                 Summary: Regex ranges and locales in gnu-awk regextype
                 Project: findutils
            Submitted by: piotrjurkiewicz
            Submitted on: Mon 30 May 2016 08:12:40 AM CEST
                Category: find
                Severity: 3 - Normal
              Item Group: Wrong result
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 4.6.0
           Fixed Release: None

    _______________________________________________________

Details:

Starting with gawk 4.0, the traditional behaviour of regex ranges has been
restored. This means that [a-z] matches only lowercase letters and [A-Z]
matches only uppercase letters, regardless of the locale and collation settings.

See more:
https://www.gnu.org/software/gawk/manual/html_node/Ranges-and-Locales.html

This can be tested with the following command:

$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk pre-4.0
ABC 

$ echo ABC | LC_COLLATE=pl_PL.utf8 gawk '$0 ~ /^[a-b]/' # gawk 4.0+
[nothing]

Findutils, however, still emulates the old behaviour of gawk in gnu-awk mode.
That is, in certain locales, the [a-z] and [A-Z] ranges each match both
lowercase and uppercase letters.

Test:

Prepare:

mkdir test
cd test
touch a.lower
touch b.UPPER

Then both commands:

LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[a-z]{5}$'
LC_COLLATE=pl_PL.utf8 find -regextype gnu-awk -regex '.*[A-Z]{5}$'

return:

./a.lower
./b.UPPER

instead of just the one file with the matching case.
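
For comparison, the same check can be run in the C locale, where bracket
ranges follow byte order and each pattern should match exactly one file.
The following is only a sketch: the scratch directory created with mktemp and
the variable names lower_c and upper_c are illustrative and not part of the
original report, and the last command assumes the pl_PL.utf8 locale is
installed.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.lower b.UPPER

# Baseline: in the C locale, [a-z] and [A-Z] are plain byte ranges,
# so each pattern is expected to match a single file.
lower_c=$(LC_ALL=C find . -regextype gnu-awk -regex '.*[a-z]{5}$')
upper_c=$(LC_ALL=C find . -regextype gnu-awk -regex '.*[A-Z]{5}$')
echo "C locale, [a-z]: $lower_c"
echo "C locale, [A-Z]: $upper_c"

# Affected case from the report. Assumes pl_PL.utf8 is installed.
# LC_ALL is unset in a subshell because it would override LC_COLLATE.
echo "pl_PL.utf8, [a-z]:"
( unset LC_ALL
  LC_COLLATE=pl_PL.utf8 find . -regextype gnu-awk -regex '.*[a-z]{5}$' )

# Remove the scratch directory again.
cd / && rm -rf "$workdir"
```

If the pl_PL.utf8 run lists both files while the C locale run lists only
./a.lower, the difference comes from the collation-dependent range handling
described above.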




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?48055>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



