Re: [bug-gawk] example tweak in documentations

From:	Ed Morton
Subject:	Re: [bug-gawk] example tweak in documentations
Date:	Tue, 07 Apr 2015 06:50:33 -0500
User-agent:	Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0

Sounds good. Yeah, I hadn't noticed that further down in the section it does discuss using `*` instead of `+`, I just used the first example and fell over an empty field in my data. Just if you're changing something around there some day anyway and you happen to think of it...


Thanks,

    Ed.

On 4/7/2015 2:48 AM, Aharon Robbins wrote:

Hi Ed.

The doc discusses replacing + with *; the main thing you seem to be
pointing out is the use of * to allow empty quoted fields.

I don't know if this is worth the trouble, but I've made a note
in the doc to revisit this at some point.

Thanks,

Arnold

Date: Fri, 20 Mar 2015 18:49:07 +0000 (UTC)
From: Ed Morton <address@hidden>
To: address@hidden
Subject: [bug-gawk] example tweak in documentations

The FPAT example used in:

http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content

is, I'm sure, used as the starting point for many people working on CSV
files. It doesn't support empty fields, however, and with a small tweak
it could. For example:

$ cat file
Robbins,Arnold,"1234 A Pretty Street, NE",MyTown,MyState,12345-6789,USA
Smith,John,"314 Pi Ave, IL",HisTown,HisState,,USA

Notice that in the 2nd line the ZIP code (6th field) is not populated
and here's what the FPAT value from the documentation does with that:

$ cat tst1.awk
BEGIN {
    FPAT = "([^,]+)|(\"[^\"]+\")"
}

{
    print "\nNF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$ awk -f tst1.awk file

NF = 7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>

NF = 6
$1 = <Smith>
$2 = <John>
$3 = <"314 Pi Ave, IL">
$4 = <HisTown>
$5 = <HisState>
$6 = <USA>

i.e. it discards it completely. Now if we tweak the FPAT to just use
`*` instead of `+` as the repetition metacharacter:

$ cat tst2.awk
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]*\")"
}

{
    print "\nNF = ", NF
    for (i = 1; i <= NF; i++) {
        printf("$%d = <%s>\n", i, $i)
    }
}
$
$ awk -f tst2.awk file

NF = 7
$1 = <Robbins>
$2 = <Arnold>
$3 = <"1234 A Pretty Street, NE">
$4 = <MyTown>
$5 = <MyState>
$6 = <12345-6789>
$7 = <USA>

NF = 7
$1 = <Smith>
$2 = <John>
$3 = <"314 Pi Ave, IL">
$4 = <HisTown>
$5 = <HisState>
$6 = <>
$7 = <USA>

it handles it correctly. I know this is just an FPAT example and as
such doesn't need to be perfect or handle all cases, but given that it
is probably being copy/pasted into a lot of scripts and the fix is a
trivial tweak, I think it might be worth doing.

Ed.
