Re: [bug-gawk] characters-as-bytes switch


From: Aharon Robbins
Subject: Re: [bug-gawk] characters-as-bytes switch
Date: Mon, 18 Jun 2012 23:15:48 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Hi. Thanks for the bug report. I have reproduced this here under GNU/Linux.

I will work on a fix.

Thanks,

Arnold

> From: "SP" <address@hidden>
> To: <address@hidden>
> Date: Sun, 17 Jun 2012 00:55:16 +0200
> Subject: [bug-gawk] characters-as-bytes switch
>
> Hello,
>
> Sorry for my approximate English, I'm French ;-)
>
> Well, I've just installed the latest Cygwin binaries under Windows 7, in
> order to have a gawk with the "characters-as-bytes" switch. Unfortunately,
> this switch doesn't seem to work correctly within a pattern. Here is a full
> log demonstrating the problem. Note that \xE2\x80\x93 is a valid UTF-8
> character whereas \xE2\x80\x42 is not, and note the period in the gensub
> pattern.
>
> ==========
>
> C:\>ver
> Microsoft Windows [Version 6.1.7601]
>
> C:\>gawk.exe --version
> GNU Awk 4.0.1
> ...
> blah blah
>
> C:\>gawk.exe 'BEGIN { print "\xE2\x80\x93"; exit }' | gawk.exe
> --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od
> -c -t x1
>
> 0000000 342 200 223  \n
>          e2  80  93  0a
> 0000004
>
> C:\>gawk.exe 'BEGIN { print "\xE2\x80\x42"; exit }' | gawk.exe
> --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od
> -c -t x1
>
> 0000000   Z   Z   Z  \n
>          5a  5a  5a  0a
> 0000004
>
> ==========
>
> If I inject a real UTF-8 char, /\xE2\x80./ doesn't match despite
> --characters-as-bytes. And if I inject an invalid UTF-8 char, /\xE2\x80./
> matches.
>
> Thanks in advance for your help with a workaround and/or a fix for this
> problem!
>
> Stéphane
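
The two byte sequences in the report can be checked independently of gawk. What follows is only a minimal Python sketch, not gawk's implementation, and it says nothing about the cause of the bug. The helper names `is_valid_utf8` and `replace_bytes` are hypothetical and chosen for illustration. It confirms that \xE2\x80\x93 decodes as UTF-8 while \xE2\x80\x42 does not, and shows what a purely byte-oriented match of /\xE2\x80./ would be expected to do with both inputs.

```python
import re

# The two inputs from the report.
valid = b"\xE2\x80\x93"    # UTF-8 encoding of U+2013 (EN DASH)
invalid = b"\xE2\x80\x42"  # 0x42 ('B') is not a UTF-8 continuation byte


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A bytes pattern: '.' matches any single byte except a newline,
# with no notion of multibyte characters.
pattern = re.compile(rb"\xE2\x80.")


def replace_bytes(data: bytes) -> bytes:
    """Replace every byte-wise match of the pattern with b'ZZZ'."""
    return pattern.sub(b"ZZZ", data)


if __name__ == "__main__":
    for name, data in (("valid", valid), ("invalid", invalid)):
        print(name, is_valid_utf8(data), replace_bytes(data))
```

Under byte semantics both inputs are rewritten to ZZZ, which is the behaviour the report expects from --characters-as-bytes for the valid sequence as well.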


