[bug-gawk] characters-as-bytes switch
From: SP
Subject: [bug-gawk] characters-as-bytes switch
Date: Sun, 17 Jun 2012 00:55:16 +0200
Hello,
Sorry for my approximate English, I'm French ;-)
Well, I've just installed the latest Cygwin binaries under Windows 7 in
order to get a gawk with the "--characters-as-bytes" switch. Unfortunately, this
switch doesn't seem to work correctly inside regex patterns. Here is a full log
demonstrating the problem. Note that \xE2\x80\x93 is a valid UTF-8
character (U+2013, EN DASH) whereas \xE2\x80\x42 is not, and note the period in the gensub pattern.
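The claim about the two byte sequences can be checked with a short Python sketch. It uses Python's own UTF-8 decoder, not anything from gawk, and the helper name `utf8_status` is hypothetical:

```python
def utf8_status(data: bytes) -> str:
    """Return the decoded code point(s) as U+XXXX, or 'invalid' if not valid UTF-8."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # E2 announces a 3-byte sequence, so the third byte must be 80..BF;
        # 0x42 ('B') is not a continuation byte, so decoding fails.
        return "invalid"
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

print(utf8_status(b"\xe2\x80\x93"))  # prints U+2013 (EN DASH)
print(utf8_status(b"\xe2\x80\x42"))  # prints invalid
```

The first sequence decodes to a single character, and that is exactly why the result of the regex depends on whether gawk matches bytes or characters.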
==========
C:\>ver
Microsoft Windows [Version 6.1.7601]
C:\>gawk.exe --version
GNU Awk 4.0.1
...
blah blah
C:\>gawk.exe 'BEGIN { print "\xE2\x80\x93"; exit }' | gawk.exe --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od -c -t x1
0000000 342 200 223 \n
e2 80 93 0a
0000004
C:\>gawk.exe 'BEGIN { print "\xE2\x80\x42"; exit }' | gawk.exe --characters-as-bytes "{ print gensub(/\xE2\x80./,""ZZZ"",""g"",$0)}" | od -c -t x1
0000000 Z Z Z \n
5a 5a 5a 0a
0000004
==========
If I feed in a real UTF-8 character, /\xE2\x80./ doesn't match despite
--characters-as-bytes, whereas if I feed in an invalid UTF-8 sequence,
/\xE2\x80./ does match.
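To make the expected behaviour concrete, here is a small Python sketch using Python's `re` module (not gawk's regex engine). It contrasts byte semantics, where "." consumes exactly one byte (what --characters-as-bytes is expected to give), with UTF-8 character semantics, where E2 80 93 is a single character. The function names are hypothetical:

```python
import re

# The pattern from the report: bytes E2 and 80, then "." (any single unit).
PATTERN_BYTES = rb"\xe2\x80."  # in a bytes pattern, \xe2 matches the byte 0xE2
PATTERN_TEXT = r"\xe2\x80."    # in a str pattern, \xe2 matches the character U+00E2

def byte_mode(data: bytes) -> bytes:
    # Byte semantics: "." consumes exactly one byte, so E2 80 <anything> matches.
    return re.sub(PATTERN_BYTES, b"ZZZ", data)

def char_mode(data: bytes) -> bytes:
    # Character semantics: decode UTF-8 first, so E2 80 93 becomes the single
    # character U+2013, which cannot match a three-character pattern.
    text = data.decode("utf-8")
    return re.sub(PATTERN_TEXT, "ZZZ", text).encode("utf-8")

print(byte_mode(b"\xe2\x80\x93"))  # b'ZZZ'
print(byte_mode(b"\xe2\x80\x42"))  # b'ZZZ'
print(char_mode(b"\xe2\x80\x93"))  # b'\xe2\x80\x93' (unchanged)
```

Under byte semantics both inputs are replaced. The first gawk run leaving E2 80 93 untouched is consistent with the pattern still being matched character by character, although this sketch does not show how gawk handles the invalid sequence internally.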
Thanks in advance for your help in working around and/or fixing this
problem!
Stéphane