The FPAT example used in:
is, I'm sure, used as the starting point for many people working on CSV files. It doesn't support empty fields, however, and with a small tweak it could. For example:
$ cat file
Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
Smith,John,"314 Pi Ave, IL",HisTown,HisState,,USA
Notice that in the 2nd line the ZIP code (6th field) is empty. Here's what the FPAT value from the documentation does with that:
$ cat tst1.awk
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}
{
    print "\nNF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$ awk -f tst1.awk file
NF = 7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>
NF = 6
$1 = <Smith>
$2 = <John>
$3 = <"314 Pi Ave, IL">
$4 = <HisTown>
$5 = <HisState>
$6 = <USA>
i.e. it discards the empty field completely, so NF is 6 and USA ends up in $6 instead of $7. Now if we tweak the FPAT to just use `*` instead of `+` as the repetition metacharacter:
$ cat tst2.awk
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]*\")"
}
{
    print "\nNF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$
$ awk -f tst2.awk file
NF = 7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>
NF = 7
$1 = <Smith>
$2 = <John>
$3 = <"314 Pi Ave, IL">
$4 = <HisTown>
$5 = <HisState>
$6 = <>
$7 = <USA>
it handles it correctly. I know this is just an FPAT example and as such doesn't need to be perfect or handle all cases, but given that it is probably being copy/pasted into a lot of scripts and the fix is a trivial tweak, I think it might be worth doing.
Ed.