Re: [bug-gawk] example tweak in documentations


From: Aharon Robbins
Subject: Re: [bug-gawk] example tweak in documentations
Date: Tue, 07 Apr 2015 10:48:38 +0300
User-agent: Heirloom mailx 12.5 6/20/10

Hi Ed.

The doc discusses replacing + with *; the main thing you seem to be
pointing out is the use of * to allow empty quoted fields.

I don't know if this is worth the trouble, but I've made a note
in the doc to revisit this at some point.

Thanks,

Arnold

> Date: Fri, 20 Mar 2015 18:49:07 +0000 (UTC)
> From: Ed Morton <address@hidden>
> To: address@hidden
> Subject: [bug-gawk] example tweak in documentations
>
> The FPAT example used in: 
>
> http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content 
>
> is, I'm sure, used as the starting point for many people working on CSV
> files. It doesn't support empty fields, however, and with a small tweak
> it could. For example:
>
> $ cat file 
> Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA 
> Smith,John,"314 Pi Ave, IL",HisTown,HisState,,USA 
>
> Notice that in the 2nd line the ZIP code (6th field) is not populated
> and here's what the FPAT value from the documentation does with that:
>
> $ cat tst1.awk 
> BEGIN {
>     FPAT = "([^,]+)|(\"[^\"]+\")"
> }
>
> {
>     print "\nNF = ", NF
>     for (i = 1; i <= NF; i++) {
>         printf("$%d = <%s>\n", i, $i)
>     }
> }
> $ awk -f tst1.awk file 
>
> NF = 7 
> $1 = <Robbins> 
> $2 = <Arnold> 
> $3 = <"1234 A Pretty Street, NE"> 
> $4 = <MyTown> 
> $5 = <MyState> 
> $6 = <12345-6789> 
> $7 = <USA> 
>
> NF = 6 
> $1 = <Smith> 
> $2 = <John> 
> $3 = <"314 Pi Ave, IL"> 
> $4 = <HisTown> 
> $5 = <HisState> 
> $6 = <USA> 
>
> i.e. the empty field is discarded completely and USA shifts into $6.
> Now if we tweak the FPAT to use `*` instead of `+` as the repetition
> metacharacter:
>
> $ cat tst2.awk 
> BEGIN {
>     FPAT = "([^,]*)|(\"[^\"]*\")"
> }
>
> {
>     print "\nNF = ", NF
>     for (i = 1; i <= NF; i++) {
>         printf("$%d = <%s>\n", i, $i)
>     }
> }
> $ 
> $ awk -f tst2.awk file 
>
> NF = 7 
> $1 = <Robbins> 
> $2 = <Arnold> 
> $3 = <"1234 A Pretty Street, NE"> 
> $4 = <MyTown> 
> $5 = <MyState> 
> $6 = <12345-6789> 
> $7 = <USA> 
>
> NF = 7 
> $1 = <Smith> 
> $2 = <John> 
> $3 = <"314 Pi Ave, IL"> 
> $4 = <HisTown> 
> $5 = <HisState> 
> $6 = <> 
> $7 = <USA> 
>
> it handles it correctly. I know this is just an FPAT example and as
> such doesn't need to be perfect or handle all cases, but I think that,
> given this is probably being copy/pasted into a lot of scripts and it's
> a trivial tweak to fix it, it might be worth doing.
>
> Ed. 
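
For anyone who wants to try the two patterns side by side without creating
the tst1.awk and tst2.awk files, here is a minimal sketch. It assumes GNU awk
is installed under the name gawk (FPAT is a gawk extension and other awks
ignore it), and count_fields is just a made-up helper name for this
illustration, not anything from the manual.

```shell
#!/bin/sh
# Sketch only: needs GNU awk (gawk), because FPAT is a gawk extension.
# count_fields is a hypothetical helper: it prints NF for one record.
#   $1 = regexp to use as FPAT
#   $2 = a single CSV record
count_fields() {
    printf '%s\n' "$2" | gawk -v FPAT="$1" '{ print NF }'
}

# Second sample record from the message: the ZIP code field is empty.
line='Smith,John,"314 Pi Ave, IL",HisTown,HisState,,USA'

count_fields '([^,]+)|("[^"]+")' "$line"   # pattern from the documentation
count_fields '([^,]*)|("[^"]*")' "$line"   # same pattern with * instead of +
```

On that record this should reproduce the NF values in the transcript above:
6 with the `+` pattern and 7 with the `*` pattern.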
