bug-gawk

Re: CSV extension status


From: Manuel Collado
Subject: Re: CSV extension status
Date: Tue, 18 May 2021 16:41:24 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 18/05/2021 at 14:56, Andrew J. Schorr wrote:
> ...
> I think I understand the conceptual problem, but I feel as if maybe we're
> letting the perfect be the enemy of the good.

Agreed.

> In 99.9% of the cases where I use CSV files, I simply want to have read-only
> access to the fields. Actually, if I'm being honest, it's 100%. In other
> words, I want to be able to say something like:
>
> gawk -lcsv '
> NR == 1 {
>         for (i = 1; i <= NF; i++)
>                 m[$i] = i
>         next
> }
>
> $m["age"] > 30 {
>         sum += $m["weight"]
>         n++
> }
>
> END {
>         printf "found %d people over 30 with an average weight of %.3f\n",
>                n, (n? sum/n : 0)
> }'
>
> Can this be done without a library?

Do you mean without an API-based extension? Yes.

> I thought that the possibility of embedded newlines meant that we needed a
> library for this rather than a simple FPAT solution. Maybe I'm confused.

A pure gawk library is enough to process CSV data effectively. Using my CSVMODE library, available from http://mcollado.z15.es/xgawk/, your example can be coded almost verbatim:

gawk -i csvmode-1 '
NR==1 {next}

csvfield("age") > 30 {
        sum += csvfield("weight")
        n++
}

END {
        printf "found %d people over 30 with an average weight of %.3f\n",
               n, (n? sum/n : 0)
}'

This code works with quoted fields, unquoted fields, and fields containing embedded newlines. That is why I'm unsure whether an API-based gawk-csv extension is really needed.

How about also hosting pure gawk libraries, such as CSVMODE, on the gawkextlib site? Arnold suggested this some time ago.

Regards.

--
Manuel Collado - http://mcollado.z15.es
