Re: [bug-gawk] escaped pipe char in FS mistreated


From: Davide Brini
Subject: Re: [bug-gawk] escaped pipe char in FS mistreated
Date: Wed, 13 Mar 2013 16:01:36 +0100

On Wed, 13 Mar 2013 06:14:25 -0700, Nat Brown <address@hidden> wrote:

> issue: if my data is instead pipe-separated, such as
> 
>   12345 | | | | | 12 | Data Street|   Command Deck   | Enterprise| Space| 17094
> 
> using FS="|" works to split fields around the pipe character, but
> including the pipe in a regexp FS results in silent failure by AWK, non
> sensible warning "warning: escape sequence `\|' treated as plain `|'" and
> failure by GAWK:
> 
>  BEGIN { FS="[ \t]*\|[ \t]*"; }
>  {
>    for (i=1; i <= NF; i++) {
>        printf "%2d '%s'\n", i, $i;
>    }
>  }
> 
> yields:
> 
>  1 '12345'
>  2 '|'
>  3 '|'
>  4 '|'
>  5 '|'
>  6 '|'
>  7 '12'
>  8 '|'
>  9 'Data'
> 10 'Street|'
> 11 'Command'
> 12 'Deck'
> 13 '|'
> 14 'Enterprise|'
> 15 'Space|'
> 16 '17094'
> 
> expected behavior would be to treat '\|' as the character '|', identically
> to ',' or other characters, rather than stripping the escape and
> incorporating it into the FS regexp.

The warning you get is sensible, and tells you exactly what gawk is doing.

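Gawk processes escape sequences in string constants (that's why you can
write var = "abc\ndef\tfoo"), and "\|" is not a sequence it recognizes, so
the backslash is dropped. The string which ends up in FS (or in any variable
you'd assign that string to, for that matter) is

"[ \t]*|[ \t]*"

In that regexp the bare | is the alternation operator, so it matches runs of
blanks rather than a pipe, which is why your record was split on whitespace.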

So to do what you're trying to do you have to use

FS="[ \t]*\\|[ \t]*"

After interpolation, this becomes

"[ \t]*\|[ \t]*"

which, when used as a regexp for FS, means what you want.
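
As a rough check of the doubled-backslash form, here is a minimal sketch run
from a POSIX shell against the sample record from the report (joined onto one
line). The helper name split_record is made up for this example, and the awk
program is kept in single quotes so the shell does not touch the backslashes.

```shell
# Sample record from the original report, joined onto one line.
record='12345 | | | | | 12 | Data Street|   Command Deck   | Enterprise| Space| 17094'

# split_record is a hypothetical helper used only for this sketch.
# The shell passes both backslashes through unchanged; awk's string
# processing then reduces "\\|" to \|, a literal pipe in the regexp.
split_record() {
    printf '%s\n' "$1" | awk '
        BEGIN { FS = "[ \t]*\\|[ \t]*" }
        {
            # \047 is the octal escape for a single quote.
            for (i = 1; i <= NF; i++)
                printf "%2d \047%s\047\n", i, $i
        }'
}

split_record "$record"
```

With this FS the record should come out as 11 fields: the empty columns are
empty strings, and "Data Street" and "Command Deck" each stay in one field
with the surrounding blanks trimmed.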

When you used FS="|", it worked because a single-character FS (other than
a space) is special-cased: it is treated as a literal character and not as
a regular expression.
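
To make the difference concrete, a small sketch comparing the two cases on a
made-up record 'a | b | c'. The bracket expression [|] in the second command
is not from the original message; it is just another way to get a literal
pipe in a regexp FS without needing any backslashes. The variable names are
only for illustration.

```shell
# Single-character FS: the pipe is used literally, so the blanks
# around it stay inside the fields.
literal=$(printf '%s\n' 'a | b | c' |
    awk 'BEGIN { FS = "|" } { print NF ":" $2 ":" }')

# A longer FS is a regexp. Inside a bracket expression the pipe is
# literal, so this also splits on the pipe and trims the blanks.
bracket=$(printf '%s\n' 'a | b | c' |
    awk 'BEGIN { FS = "[ \t]*[|][ \t]*" } { print NF ":" $2 ":" }')

printf '%s\n%s\n' "$literal" "$bracket"
```

Both commands see three fields; only the regexp form strips the blanks from
the second one.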

More information:
http://www.gnu.org/software/gawk/manual/gawk.html#Computed-Regexps

-- 
D.


