Re: [bug-gawk] Tentative CSV extension - please advise
From: Andrew J. Schorr
Subject: Re: [bug-gawk] Tentative CSV extension - please advise
Date: Mon, 14 Mar 2016 19:56:57 -0400
User-agent: Mutt/1.5.23 (2014-03-12)
On Mon, Mar 14, 2016 at 11:14:20PM +0100, Manuel Collado wrote:
> I'm considering developing a CSV read/write extension. It is currently at a
> very preliminary stage: just reading specs and API docs, collecting CSV
> libraries for a variety of languages, etc.
Cool.
> In order to organize things in an adequate way, please confirm, correct or
> comment on the following points:
>
> 1.- The extension should provide CSV field values as $1-$NF
That sounds right.
> 2.- The current extension API only allows providing a string value for $0,
> but not individual field values. $0 is split into fields by the gawk core
> according to the current FS.
I took a look at the awk_input_buf_t API, and you are right. And it has to be
this way, since the file could also be read using getline into a variable, in
which case the notion of $1-$NF does not apply.
> 3.- A possible solution would be to forge a $0 record by joining the csv
> fields with a customizable ad-hoc field separator, and temporarily set FS to
> that separator value.
Ugh. If there were a string that is guaranteed never to appear in a CSV
file, then we could use that for FS. If such a string does not exist, then
the only clean way to solve this problem seems to be to modify core gawk
to add a CSV field-splitting capability.
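To make the concern concrete, here is a minimal sketch of the "forged $0" idea. It is written in Python purely for illustration and is not the gawk extension API. The choice of the ASCII unit separator (0x1f) and the name forge_record are assumptions of this sketch, not something proposed in the thread. The function parses one CSV line, joins the fields with the separator, and refuses to continue if the separator already occurs inside a field, which is exactly the case that cannot be ruled out for arbitrary CSV data.

```python
import csv
import io

# Hypothetical separator choice: ASCII unit separator (0x1f).
# Nothing guarantees that it never appears in real CSV data.
SEP = "\x1f"


def forge_record(csv_line, sep=SEP):
    """Parse one non-empty CSV line and join its fields with sep.

    The result stands in for a forged $0 that could later be split
    on sep, the way FS-based field splitting would do.
    """
    fields = next(csv.reader(io.StringIO(csv_line)))
    for field in fields:
        if sep in field:
            # Splitting on sep would now yield the wrong field count.
            raise ValueError("separator occurs inside a CSV field")
    return sep.join(fields)


record = forge_record('a,"b,c","say ""hi"""')
fields = record.split(SEP)  # emulates splitting $0 on FS
```

Splitting on the separator recovers the original three fields here, including the one with an embedded comma. The ValueError branch is the collision case: any fixed separator can only be checked for, not guaranteed absent.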
> 6.- Help on how to initially set up a gawkextlib extension development
> directory would be really appreciated.
If you clone the gawkextlib tree, it contains a script named
"make_extension_directory.sh" at the top level. It sets up a skeleton
extension and tells you what to do next. Here's the help message:
    bash-4.2$ ./make_extension_directory.sh --help
    ./make_extension_directory.sh: illegal option -- -
    Usage: make_extension_directory.sh [-g <path to gawk>] [-l <path to gawkextlib>] <new extension directory name> <author name> <author email address>

        Configures a new directory for adding an extension. This installs
        the boilerplate stuff so you can focus on writing code, documentation,
        and test cases.

        Options:
            -g  Specify path to your gawk installation, in case
                it's in a nonstandard place.
            -l  Specify path to your gawkextlib installation, in case
                it's in a nonstandard place.
So that part should be easy. The problem is the CSV field parsing. I hadn't
considered that issue.
Regards,
Andy