Re: CSV extension status
From: Andrew J. Schorr
Subject: Re: CSV extension status
Date: Tue, 18 May 2021 08:56:52 -0400
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, May 17, 2021 at 11:44:56PM +0200, Manuel Collado wrote:
> A record is parsed when read from an input file, and also after
> assigning $0 = "new value". The API allows a custom input parser to do
> the first, but not the second.
>
> For instance, a standard way of prepending a field to the current
> record would be:
>
> $0 = "new field" OFS $0
>
> For CSV fields and records this construction only works if FPAT and
> OFS have the appropriate values. But the API doesn't allow the
> extension to silently assign values to the predefined variables.
Ah, OK, this is because the API sym_update function refuses to allow
extensions to set predefined variables listed in main.c:varinit.
> And things are even worse if the record syntax can not be parsed
> with the supported FS/FPAT/FIELDWIDTHS modes.
OK.
> A naive approach would be to let the API offer a hook that allows a
> custom input parser to fully override the internal gawk record
> parser. But this possibility requires careful consideration.
>
> Hope this clarifies things. I'm ready to explain my goals further, if
> you like.
I think I understand the conceptual problem, but I feel as if maybe we're
letting the perfect be the enemy of the good. In 99.9% of the cases where I use
CSV files, I simply want to have read-only access to the fields. Actually, if
I'm being honest, it's 100%. In other words, I want to be able to say something
like:
gawk -lcsv '
NR == 1 {
    for (i = 1; i <= NF; i++)
        m[$i] = i
    next
}
$m["age"] > 30 {
    sum += $m["weight"]
    n++
}
END {
    printf "found %d people over 30 with an average weight of %.3f\n",
        n, (n ? sum/n : 0)
}'
Can this be done without a library? I thought that the possibility of embedded
newlines meant that we needed a library for this rather than a simple FPAT
solution. Maybe I'm confused.
Perhaps I simply haven't dug deep enough into the wonders of CSV format, but if
we could somehow have a csv library or include file that enabled CSV parsing to
work transparently in the read-only case, I think that would be a big win. If
we in addition need to have an insanely complicated gawk library on top of that
to enable reparsing and reconstruction and writing of records, that's fine, but
I suspect that just being able to parse correctly on a read-only basis
(including stripping encapsulating quotes from field values) would be a very
useful tool for lots of people in many situations. Is that doable with an FPAT
solution or a parser library?
Regards,
Andy