[bug-gawk] example tweak in documentations


From: Ed Morton
Subject: [bug-gawk] example tweak in documentations
Date: Fri, 20 Mar 2015 18:49:07 +0000 (UTC)

The FPAT example used in:

http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content

is, I'm sure, used as the starting point for many people working on CSV files. It doesn't support empty fields, however, and with a small tweak it could. For example:

$ cat file
Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
Smith,John,"314 Pi Ave, IL",HisTown,HisState,,USA

Notice that in the 2nd line the ZIP code (6th field) is empty. Here's what the FPAT value from the documentation does with that:
                   
$ cat tst1.awk
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}

{
    print "\nNF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$ awk -f tst1.awk file

NF =  7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>

NF =  6
$1 = <Smith>
$2 = <John>
$3 = <"314 Pi Ave, IL">
$4 = <HisTown>
$5 = <HisState>
$6 = <USA>

i.e. the empty field is discarded completely, so NF is 6 and USA ends up in $6. Now if we tweak the FPAT to just use `*` instead of `+` as the repetition metacharacter:

$ cat tst2.awk
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]*\")"
}

{
    print "\nNF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$ awk -f tst2.awk file

NF =  7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>

NF =  7
$1 = <Smith>
$2 = <John>
$3 = <"314 Pi Ave, IL">
$4 = <HisTown>
$5 = <HisState>
$6 = <>
$7 = <USA>

the empty field is handled correctly. I know this is just an FPAT example and as such doesn't need to be perfect or handle all cases, but given that it is probably being copy/pasted into a lot of scripts and the fix is a trivial tweak, I think it might be worth doing.

     Ed.
