bug#38503: Locale can cause incorrect number parsing in binary files

bug-grep

From:	jan h
Subject:	bug#38503: Locale can cause incorrect number parsing in binary files
Date:	Thu, 5 Dec 2019 18:40:21 +0000

On another machine with grep 3.1 this does not appear to happen,
so is this a regression?

On Thu, 5 Dec 2019 at 18:30, jan h (<address@hidden>) wrote:
>
> grep 3.3
>
> I get a few weird symbols (they seem to be valid UTF-8) mixed in with
> the normal digits from the following simple snippet (.UTF-8 and .utf8
> give the same result, and even .UtF---8 does):
> LC_ALL=en_US.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"
> Piping that into wc -c counts 1047, 1033, 1036, etc., so the extra
> symbols are multi-byte characters.
> Meanwhile, with LC_ALL set to C.UTF-8 this is not the case:
> LC_ALL=C.UTF-8 grep -o "[0-9]" -a /dev/urandom|head -n 1024|tr -d "\n"|wc -c
> consistently results in 1024 characters/bytes, as it is supposed to.
> It's not just en_US: it seems that ANY UTF-8 locale other than C
> triggers this bug, whereas non-UTF-8 locales are fine. Bare en_US
> doesn't show this bug, nor does en_US.iso88591.
>
> Worth noting: [[:digit:]] works correctly, while [0-9] does not
> ([1-9] shows the same bug as [0-9], if you were wondering), and
> setting -E doesn't change anything either.
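
Because /dev/urandom gives different input on every run, the report above is hard to compare across machines. The following is a minimal sketch of a repeatable variant, not something taken from the thread: the helper name count_matched_bytes and the sample string are assumptions chosen for illustration. It feeds grep a fixed line containing three ASCII digits plus one non-ASCII digit (U+0663, ARABIC-INDIC DIGIT THREE) and prints how many bytes each pattern emits under each locale. In the C locale both patterns should emit 3 bytes. A larger count for [0-9] under a UTF-8 locale would mean a non-ASCII character fell inside the range; whether that happens depends on the grep and libc versions and on which locales are installed.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): count the bytes that
# `grep -o` emits for a pattern under a given locale, using a fixed sample
# instead of /dev/urandom so that the result is repeatable.
count_matched_bytes() {
    locale_name=$1
    pattern=$2
    # Sample: ASCII digits 1, 2, 3 plus U+0663 (ARABIC-INDIC DIGIT THREE),
    # which UTF-8 encodes as the two bytes \331\243 (octal).
    printf 'a1b2\331\243c3\n' |
        LC_ALL=$locale_name grep -a -o "$pattern" |
        tr -d '\n' | wc -c | tr -d ' '
}

# Compare the range expression with the character class in two locales.
# If en_US.UTF-8 is not installed, the C library typically falls back to
# the C locale, so both rows would then look the same.
for loc in C en_US.UTF-8; do
    for pat in '[0-9]' '[[:digit:]]'; do
        printf '%s %s -> %s bytes\n' "$loc" "$pat" \
            "$(count_matched_bytes "$loc" "$pat")"
    done
done
```

Running the same script on the grep 3.1 and grep 3.3 machines would show whether the two versions disagree on identical input.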
