From: | Manuel Collado |
Subject: | Re: CSV extension status |
Date: | Wed, 19 May 2021 13:29:38 +0200 |
User-agent: | Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0 |
On 18/05/2021 at 23:38, Manuel Collado wrote:
> On 18/05/2021 at 17:33, Andrew J. Schorr wrote:
>> ..For those who want simple, read-only access to CSV documents, my gut instinct is that an input parser library would be a better and more robust solution. In particular, the splitting and reconstruction of the record with OFS seems a bit slow and fragile to me.
>
> If your code never rewrites the data, this reconstruction will never take place. And if it does, the reconstruction is certainly done in the gawk core, not in the extension. Do you think the gawk core is slow and fragile? ;-)
Oh! Sorry. I've just realized that you are probably talking about how csvmode.awk rebuilds the record with clean values delimited by CSVOFS. Of course, you are right. In simple cases, demangling the CSV and composing the clean values almost doubles the processing time.
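For readers unfamiliar with the "clean values" idea, here is a minimal sketch of the general technique in Python. It is not the csvmode.awk code: the function name `clean_record`, its `ofs` parameter, and the `|` separator are hypothetical choices made only for this illustration. It strips the CSV quoting from each field and re-joins the plain values with a separator that ordinary field splitting can handle.

```python
import csv


def clean_record(line, ofs):
    """Parse one CSV line and re-join its unquoted values with `ofs`.

    Hypothetical helper for illustration only; `ofs` plays the role
    that CSVOFS plays in the message, but the value is arbitrary here.
    """
    # csv.reader accepts any iterable of lines; an empty input yields no row.
    fields = next(csv.reader([line]), [])
    return ofs.join(fields)


# A record with an embedded comma and an escaped double quote.
record = clean_record('"Smith, John",42,"said ""hi"""', ofs="|")
print(record)             # the rebuilt record with clean values
print(record.split("|"))  # plain splitting now recovers the fields
```

The sketch makes the cost visible: every record is parsed once to remove the CSV quoting and then composed again into a new string, which is the extra work discussed above.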
But, surprisingly, even in that case the pure gawk library beats the API-based extension. A simple test based on your previous age/weight example, with a sample of 10000 random values gives:
-- with csvmode.awk

CSVMODE = 1 (CSV fragments)
real 0m0.151s
user 0m0.109s
sys 0m0.015s

CSVMODE = -1 (clean values)
real 0m0.253s
user 0m0.203s
sys 0m0.030s

-- with gawk-csv (clean values)

real 0m0.980s
user 0m0.312s
sys 0m0.672s

I don't know the reason for this unexpected result.

Regards.
--
Manuel Collado - http://mcollado.z15.es