
Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug


From: Hermann Peifer
Subject: Re: [bug-gawk] v4.1.3 (run on OSX 10.11.3): potential gsub() bug
Date: Fri, 29 Jan 2016 09:19:37 +0100
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:38.0) Gecko/20100101 Thunderbird/38.5.1

Below is what I get on Mac OS X 10.10.5, using gawk/master.

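This seems to be related to the UTF-8 locale: none of the bytes in the
given range (0x80..0xFF) is a valid character on its own in UTF-8. Each
is either a continuation byte, the lead byte of a multibyte sequence, or
never valid at all, so the range endpoints cannot be decoded as
characters.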
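
To make the byte-level picture concrete, here is a small sketch that
dumps the bytes of the test string. It assumes only POSIX printf, od and
tr; the variable name `bytes` is mine, not anything from gawk. The 'ä' is
written as octal escapes so the result does not depend on the terminal's
encoding.

```shell
# Hex-dump the UTF-8 encoding of 'hät'.
# \303\244 are the two bytes of 'ä' (0xC3 0xA4) written as octal escapes.
bytes=$(printf 'h\303\244t' | od -An -tx1 | tr -d ' \n')

# Prints: 68c3a474
echo "$bytes"
```

Both bytes of 'ä' (c3 and a4) fall inside 0x80..0xFF, but neither is a
character by itself: c3 is a lead byte and a4 is a continuation byte.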

Hermann

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"
$
$ echo 'hät' | gawk '{ gsub(/[\x80-\xFF]/, ""); print }'
gawk: cmd. line:1: error: Invalid collation character: /[�-�]/
$
$ echo 'hät' | LC_ALL=C gawk '{ gsub(/[^\x80-\xFF]/, ""); print }'
ä



On 2016-01-29 1:29, Michael Klement wrote:
> The following, which should return 'ht', crashes:
> 
> $ echo 'hät' | gawk '{ gsub(/[\x80-\xFF]/, ""); print }' 
> gawk: cmd. line:1: fatal error: internal error
> Abort trap: 6
> 
> Its inverse, which should return 'ä', does not:
> 
> $ echo 'hät' | gawk '{ gsub(/[^\x80-\xFF]/, ""); print }' 
> ä
> 
> 
> Regards,
> 
> Michael



