Re: [bug-gawk] A CSV Standard


From: David Jordan
Subject: Re: [bug-gawk] A CSV Standard
Date: Tue, 18 Nov 2014 22:38:10 -0000

I would be happy to volunteer to write it. I have been wanting to
contribute to a free software project for a while, and this seems a simple
enough task (always a dangerous thing to say). Do you think it would be
better as a standalone extension or as part of gawkextlib?

-----Original Message-----
From: Andrew J. Schorr [mailto:address@hidden]
Sent: 18 November 2014 19:49
To: Aharon Robbins
Cc: address@hidden; address@hidden
Subject: Re: [bug-gawk] A CSV Standard

On Tue, Nov 18, 2014 at 09:28:49PM +0200, Aharon Robbins wrote:
> Thanks for the note. I wasn't aware of this RFC. I'll update the 
> manual in the next day or two.

I never noticed this section of the manual before.  Doesn't this FPAT
solution break for fields that contain a mix of embedded quotes and commas?
For example:

bash-4.2$ cat /tmp/bad.csv
f1,f2,f3,f4,f5
"a","b","c","this one has a quote "" inside, and also a comma","d"
bash-4.2$ cat /tmp/simple-csv.awk 
     BEGIN {
         FPAT = "([^,]+)|(\"[^\"]+\")"
     }

     {
         print "NF = ", NF
         for (i = 1; i <= NF; i++) {
             printf("$%d = <%s>\n", i, $i)
         }
     }
bash-4.2$ gawk -f /tmp/simple-csv.awk /tmp/bad.csv
NF =  5
$1 = <f1>
$2 = <f2>
$3 = <f3>
$4 = <f4>
$5 = <f5>
NF =  6
$1 = <"a">
$2 = <"b">
$3 = <"c">
$4 = <"this one has a quote "" inside>
$5 = < and also a comma">
$6 = <"d">

I wonder if we might need an extension to provide a CSV input parser to
handle this properly.

Regards,
Andy
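
As an illustration of what a quote-aware CSV splitter has to do for the
record above, here is a minimal sketch in portable awk wrapped in a shell
function. The names csv_fields and parse_csv are hypothetical and are not
part of gawk or gawkextlib. The sketch assumes one record per line, so it
does not handle quoted fields that contain newlines. It also strips the
enclosing quotes and collapses "" to a single quote, which the FPAT
approach in the manual does not do.

```shell
#!/bin/sh
# Hypothetical helper (not part of gawk): read CSV records on stdin and
# print each field, honouring quoted fields with embedded commas and
# doubled ("") quotes. Assumes one record per input line.
csv_fields() {
    awk '
    # parse_csv(line, out): fill out[1..n] with the fields of line; return n.
    function parse_csv(line, out,    n, i, c, inq, cur, len) {
        n = 0; cur = ""; inq = 0; len = length(line)
        for (i = 1; i <= len; i++) {
            c = substr(line, i, 1)
            if (inq) {
                if (c == "\"" && substr(line, i + 1, 1) == "\"") {
                    cur = cur "\""; i++      # "" inside quotes is one literal quote
                } else if (c == "\"") {
                    inq = 0                  # closing quote
                } else {
                    cur = cur c              # commas inside quotes are data
                }
            } else if (c == "\"") {
                inq = 1                      # opening quote
            } else if (c == ",") {
                out[++n] = cur; cur = ""     # unquoted comma ends the field
            } else {
                cur = cur c
            }
        }
        out[++n] = cur                       # last field has no trailing comma
        return n
    }
    {
        nf = parse_csv($0, f)
        print "NF = " nf
        for (i = 1; i <= nf; i++)
            printf("$%d = <%s>\n", i, f[i])
    }'
}

# Feed it the problem record from /tmp/bad.csv above.
printf '%s\n' '"a","b","c","this one has a quote "" inside, and also a comma","d"' | csv_fields
```

Run on that record, this reports NF = 5, with the fourth field kept whole
as <this one has a quote " inside, and also a comma>. Scanning character by
character with an in-quotes flag is slower than a single FPAT regexp, but
it makes the handling of doubled quotes explicit.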



