Re: CSV extension status

From: Manuel Collado
Subject: Re: CSV extension status
Date: Wed, 19 May 2021 13:29:38 +0200
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0

On 18/05/2021 at 23:38, Manuel Collado wrote:
> On 18/05/2021 at 17:33, Andrew J. Schorr wrote:
>> For those who want simple, read-only access to CSV documents, my
>> gut instinct is that an input parser library would be a better and
>> more robust solution. In particular, the splitting and
>> reconstruction of the record with OFS seems a bit slow and fragile
>> to me.
>
> If your code never rewrites the data this reconstruction will never take
> place. And if it does, the reconstruction is certainly done in the gawk
> core, not in the extension. Do you think the gawk core is slow and
> fragile? ;-)

Oh! Sorry. I've just realized that you are probably talking about how csvmode.awk rebuilds the record with clean values delimited by CSVOFS. Of course, you are right. In simple cases, demangling the CSV and recomposing the clean values almost doubles the processing time.

But, surprisingly, even in that case the pure gawk library beats the API-based extension. A simple test based on your previous age/weight example, with a sample of 10000 random values, gives:

-- with csvmode.awk
CSVMODE = 1 (CSV fragments)
real    0m0.151s
user    0m0.109s
sys     0m0.015s

CSVMODE = -1 (clean values)
real    0m0.253s
user    0m0.203s
sys     0m0.030s

-- with gawk-csv (clean values)
real    0m0.980s
user    0m0.312s
sys     0m0.672s

I don't know the reason for this unexpected result.
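
For readers unfamiliar with the difference between "CSV fragments" and "clean values" in the timings above, here is a minimal Python sketch of the idea. It is not the code of csvmode.awk or gawk-csv. The function names are hypothetical, and it assumes that a "fragment" is the raw field text with its quotes still in place, while a "clean value" is the demangled field, with the record then rejoined using a separator that plays the role of CSVOFS.

```python
import csv
import io


def csv_fragments(record):
    """Split one CSV record into raw fragments, keeping quotes as written.

    Assumption: a "fragment" is the untouched field text, quotes included.
    Commas inside a quoted field do not split the record.
    """
    fragments, current, in_quotes = [], [], False
    for ch in record:
        if ch == '"':
            in_quotes = not in_quotes  # doubled quotes toggle twice
            current.append(ch)
        elif ch == "," and not in_quotes:
            fragments.append("".join(current))
            current = []
        else:
            current.append(ch)
    fragments.append("".join(current))
    return fragments


def clean_record(record, sep="\t"):
    """Demangle every field, then rebuild the record with `sep`.

    `sep` stands in for CSVOFS. This second pass (unquote, then rejoin)
    is the extra work that only the clean-values mode has to do.
    """
    values = next(csv.reader(io.StringIO(record)))
    return sep.join(values)


record = '"Smith, John",42,"said ""hi"""'
print(csv_fragments(record))
print(clean_record(record, sep="|"))
```

The fragment pass only locates the field boundaries. The clean pass also removes the quoting and recomposes the record, which is consistent with the clean-values run above taking nearly twice as long as the fragments run.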

Manuel Collado - http://mcollado.z15.es
