
[bug-gawk] escaped pipe char in FS mistreated


From: Nat Brown
Subject: [bug-gawk] escaped pipe char in FS mistreated
Date: Wed, 13 Mar 2013 06:14:25 -0700

Background: if I have comma-separated data, such as

  12345 , , , , , 12 , Data Street,   Command Deck   , Enterprise, Space, 17094

and I want to use awk/gawk to split it into fields around the commas while also stripping leading and trailing whitespace, I can use the following FS and script:

 BEGIN { FS="[ \t]*,[ \t]*"; }
 {
   for (i=1; i <= NF; i++) {
       printf "%2d '%s'\n", i, $i;
   }
 }

to receive the following output:

 1 '12345'
 2 ''
 3 ''
 4 ''
 5 ''
 6 '12'
 7 'Data Street'
 8 'Command Deck'
 9 'Enterprise'
10 'Space'
11 '17094'

The same approach works for any other ordinary separator character. Except…

Issue: if my data is instead pipe-separated, such as

  12345 | | | | | 12 | Data Street|   Command Deck   | Enterprise| Space| 17094

Using FS="|" splits fields around the pipe character as expected, but including an escaped pipe in a regexp FS does not work. AWK fails silently. GAWK prints the nonsensical warning "warning: escape sequence `\|' treated as plain `|'" and then splits the record incorrectly. For example, this script:

 BEGIN { FS="[ \t]*\|[ \t]*"; }
 {
   for (i=1; i <= NF; i++) {
       printf "%2d '%s'\n", i, $i;
   }
 }

yields:

 1 '12345'
 2 '|'
 3 '|'
 4 '|'
 5 '|'
 6 '|'
 7 '12'
 8 '|'
 9 'Data'
10 'Street|'
11 'Command'
12 'Deck'
13 '|'
14 'Enterprise|'
15 'Space|'
16 '17094'
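The warning suggests what is going on. The following minimal Python sketch is not from the original report and models two stages under a stated assumption. First, a hypothetical helper `awk_string_unescape` processes the FS string literal. It is a simplified stand-in for awk's string lexer and handles only a few escapes. Like the warning says, it drops the backslash from `\|`. Second, only the resulting value is compiled as a regexp, approximated here with Python's `re`.

```python
import re

def awk_string_unescape(src):
    # Hypothetical, simplified model of awk string-literal escapes:
    # only backslash-t, backslash-n, double backslash and backslash-quote
    # are translated; any other escaped character is kept without its
    # backslash, as the gawk warning "treated as plain `|'" describes.
    known = {"t": "\t", "n": "\n", "\\": "\\", '"': '"'}
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            out.append(known.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

# The awk string literal exactly as typed in the report
# (a raw string so Python leaves the backslashes alone).
broken_src = r"[ \t]*\|[ \t]*"

broken_fs = awk_string_unescape(broken_src)
print(repr(broken_fs))  # the backslash before '|' is gone

# Compiled as a regexp, the bare '|' is alternation between two
# blank-run patterns, so neither alternative matches a literal pipe.
print(re.fullmatch(broken_fs, "|"))
```

Under this model, the FS that gawk compiles contains no literal pipe at all. That would explain why the output above splits only on runs of blanks and leaves the `|` characters inside the fields.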

The expected behavior would be to treat '\|' as a literal '|' character, just like ',' or any other character, rather than stripping the escape and incorporating a bare '|' into the FS regexp as an operator.
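Two workarounds are commonly used. Both assume that gawk processes string escapes before compiling a dynamic regexp, which the warning quoted above points to. The first doubles the backslash, `FS="[ \t]*\\|[ \t]*"`. The second puts the pipe in a bracket expression, `FS="[ \t]*[|][ \t]*"`. This Python sketch is not from the original thread. It approximates awk's regexps with Python's `re` and checks that either resulting regexp splits the pipe-delimited record into the same 11 fields as the comma example.

```python
import re

# Assumed regexp value of FS after awk processes the string escapes:
#   FS = "[ \t]*\\|[ \t]*"  ->  [ <TAB>]*\|[ <TAB>]*   (escaped pipe)
#   FS = "[ \t]*[|][ \t]*"  ->  [ <TAB>]*[|][ <TAB>]*  (pipe in brackets)
DOUBLED_BACKSLASH_FS = "[ \t]*\\|[ \t]*"
BRACKET_FS = "[ \t]*[|][ \t]*"

record = "12345 | | | | | 12 | Data Street|   Command Deck   | Enterprise| Space| 17094"

# Split the record with each candidate separator.
results = {fs: re.split(fs, record) for fs in (DOUBLED_BACKSLASH_FS, BRACKET_FS)}

# Print each result the way the awk loop in the report does.
for fs, fields in results.items():
    for i, field in enumerate(fields, start=1):
        print("%2d '%s'" % (i, field))
    print()
```

The bracket form avoids the double-escaping question entirely, because '|' has no special meaning inside a bracket expression.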

thx, n@
