bug-gawk

Re: manual section 4.7.1


From: cph1968
Subject: Re: manual section 4.7.1
Date: Tue, 04 Apr 2023 15:04:04 +0000

   Thanks Arnold,

   I was not aware that the --csv option had not been officially released
   yet, but it works well for me all the same.

   /Jimmy

   On Tue, Apr 4, 2023 at 16:28, <[1]arnold@skeeve.com> wrote:

     Thank you for the note.
     As the documentation notes, FPAT is only a partial solution for
     dealing with CSV data.
     The --csv option is not yet released, although of course folks can
     build from git and use the result if they wish to.
     That section of the manual will be rewritten before gawk 5.3.0 is
     released.
     Thanks,
     Arnold
     cph1968@proton.me wrote:
     > the regex fp[2] in section 4.7.1 (below) doesn't quite cut it if the
     > CSV file records end in both CR and NL [0x0D 0x0A]. I believe this
     > is a common feature of Windows files.
     > A simple fix, however, is to use the gawk --csv option.
     >
     > ❯ head -n 2 TSCAINV_022023.csv| gawk -f print-fields.awk
     > >ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
     > >F = 1 <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY
     > >1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
     > >F = 1 <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE
     >
     > note here that the last '>' is the first character on the next line.
     >
     > output using the --csv option:
     > ❯ head -n 2 TSCAINV_022023.csv| gawk --csv -f print-fields.awk
     > <ID,CASRN,casregno,UID,EXP,ChemName,DEF,UVCB,FLAG,ACTIVITY>
     > NF = 10 <ID><CASRN><casregno><UID><EXP><ChemName><DEF><UVCB><FLAG><ACTIVITY>
     > <1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE>
     > NF = 10 <1><50-00-0><50000><><><Formaldehyde><><><><ACTIVE>
     >
     > much better :-)
     >
     > ❯ cat print-fields.awk
     > {
     >     print "<" $0 ">"
     >     printf("NF = %s ", NF)
     >     for (i = 1; i <= NF; i++) {
     >         printf("<%s>", $i)
     >     }
     >     print ""
     > }
     >
     >
     > from section 4.7.1:
     > BEGIN {
     >     fp[0] = "([^,]+)|(\"[^\"]+\")"
     >     fp[1] = "([^,]*)|(\"[^\"]+\")"
     >     fp[2] = "([^,]*)|(\"([^\"]|\"\")+\")"
     >     FPAT = fp[fpat+0]
     > }
     >
     >
     >
     > kind regards,
     >
     > cph1968
     >
     > Sent with Proton Mail secure email.
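
   For readers on a released gawk without --csv, one possible workaround
   for the CR/NL record endings described above is to strip the trailing
   CR before the fields are examined. The following is a minimal sketch,
   not the manual's method: the function name strip_cr_fields is
   hypothetical, and plain -F, splitting stands in for FPAT only to keep
   the example portable, so quoted fields are not handled.

```shell
# Hypothetical helper: strip a trailing CR from each record, then print
# the field count and each field in angle brackets.
# Assumption: simple comma splitting (-F,), so quoted CSV fields are NOT handled.
strip_cr_fields() {
    awk -F, '
    { sub(/\r$/, "") }      # assigning to $0 makes awk split the record again
    {
        printf("NF = %s ", NF)
        for (i = 1; i <= NF; i++)
            printf("<%s>", $i)
        print ""
    }'
}

# Sample record taken from the thread, with a Windows-style CR/NL ending.
printf '1,50-00-0,50000,,,Formaldehyde,,,,ACTIVE\r\n' | strip_cr_fields
```

   In a gawk script that sets FPAT, the same sub() rule can be placed
   ahead of the printing rule, because assigning to $0 causes the record
   to be split again with the current field-splitting settings.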

References

   1. mailto:arnold@skeeve.com

Attachment: signature.asc
Description: OpenPGP digital signature