From: Michael Klement
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Sun, 7 Feb 2016 10:19:56 -0500
Test string is 'hätă', comprising 4 characters, 2 of them non-ASCII: ä (U+00E4) and ă (U+0103).

With no regex option (basic regex) and with -E (extended regex), \xhh is apparently not recognized at all (the same applies to BSD grep on OSX 10.11.3):

$ grep -o '[\x00-\x7f]' <<<'hätă'    # !! NO output
$ grep -Eo '[\x00-\x7f]' <<<'hätă'   # !! NO output

With -P (PCRE), you get Unicode-aware range support, based on \x{…} with a variable number of hex digits:

$ grep -Po '[\x00-\x7f]' <<<'hätă'          # OK - matches only ASCII chars.
h
t
$ grep -Po '[^\x80-\xFF]' <<<'hätă'         # OK - only excludes the 0x80-0xFF range, so ă (0x103) is retained.
h
t
ă
$ grep -Po '[^\x80-\x{10ffff}]' <<<'hätă'   # OK - excludes all non-ASCII chars (U+10FFFF is the highest code point).
h
t
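
For comparison, here is a minimal sketch in Python (Python's `re` module, not grep) that prints the code point of each character in the test string and applies the same three ranges as the grep -P commands. It assumes Python 3.9+; note that `re` only accepts exactly two hex digits after \x and spells larger code points as \uXXXX or \UXXXXXXXX rather than PCRE's \x{…}. The names `code_points`, `ASCII_ONLY`, `NOT_0X80_TO_0XFF` and `NOT_NON_ASCII` are hypothetical, chosen for this illustration.

```python
import re

# Test string from the post, spelled with escapes so the code points are
# unambiguous: 'h' and 't' are ASCII, 'ä' is U+00E4 (inside 0x80-0xFF),
# and 'ă' is U+0103 (outside 0x80-0xFF).
TEST = "h\u00e4t\u0103"


def code_points(s: str) -> list[str]:
    """Return each character of s paired with its code point, e.g. 'ä U+00E4'."""
    return [f"{ch} U+{ord(ch):04X}" for ch in s]


# Python's re module, not grep -P: \xhh takes exactly two hex digits, and
# code points above 0xFF are written \uXXXX or \UXXXXXXXX instead of \x{...}.
ASCII_ONLY = re.compile(r"[\x00-\x7f]")             # ASCII characters only
NOT_0X80_TO_0XFF = re.compile(r"[^\x80-\xff]")      # anything outside 0x80-0xFF
NOT_NON_ASCII = re.compile(r"[^\x80-\U0010ffff]")   # anything outside 0x80-U+10FFFF

if __name__ == "__main__":
    print(code_points(TEST))
    for name, pattern in [
        ("ASCII_ONLY", ASCII_ONLY),
        ("NOT_0X80_TO_0XFF", NOT_0X80_TO_0XFF),
        ("NOT_NON_ASCII", NOT_NON_ASCII),
    ]:
        print(name, pattern.findall(TEST))
```

The code point listing makes the second result easy to see: ă is U+0103, above 0xFF, so a negated 0x80-0xFF class does not exclude it, while ä (U+00E4) falls inside the range and is dropped.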