bug-gawk

Re: [bug-gawk] Tentative CSV extension - please advise


From: Miriam English
Subject: Re: [bug-gawk] Tentative CSV extension - please advise
Date: Wed, 16 Mar 2016 00:54:48 +1000
User-agent: Mozilla/5.0 (X11; Linux i686; rv:32.0) Gecko/20100101 Firefox/32.0 SeaMonkey/2.29.1


address@hidden wrote:

> Yes. That's getting too messy.  Instead such an extension should simply define
> a csvsplit() function.
>
> The big problem is embedded newlines in the fields.  Sigh. If not for that
> we could deal with this issue much more easily...

If the fields are preprocessed by turning
"the quick
brown fox"
into
"the quick\nbrown fox"
then it becomes much easier. Once the fields have been extracted, the escaped characters can be converted back to their literal values.
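As a rough illustration of that preprocessing idea, here is a minimal Python sketch (not gawk, and not any proposed extension API). The function names escape_embedded_newlines and unescape_field are hypothetical. It assumes fields are quoted with double quotes and that a literal quote inside a field is written as two double quotes. Backslashes are doubled as well, so that the escaping can be reversed without ambiguity.

```python
def escape_embedded_newlines(text):
    """Turn newlines inside double-quoted fields into the two characters \\n.

    Assumption: a quote inside a quoted field is doubled (""), so toggling
    the in_quotes flag on every quote character keeps the state correct.
    """
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\\":
            out.append("\\\\")      # double backslashes so unescaping is unambiguous
        elif ch == "\n" and in_quotes:
            out.append("\\n")       # embedded newline becomes backslash + n
        else:
            out.append(ch)          # record-ending newlines pass through untouched
    return "".join(out)


def unescape_field(field):
    """Reverse the escaping on a single extracted field."""
    out = []
    i = 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(field[i])
            i += 1
    return "".join(out)
```

After escaping, every remaining newline ends a record, so ordinary line-at-a-time processing applies, and unescape_field can be run on each field once it has been split out.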

This would be slightly simpler with something like the -z option recently added to sed, where each input "line" is delimited by a zero byte instead of a newline. I often use that option to process an entire file as a single line, treating newlines as just another character.

That reduces the problem to working out which newlines end a CSV record and therefore shouldn't be escaped.
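One way to make that decision, sketched here in Python under the same assumptions (double-quoted fields, embedded quotes written as two double quotes), is to treat the whole input as a single string and track whether the scan is currently inside a quoted field. A newline seen outside quotes ends a record; one seen inside quotes is data. The name split_records is hypothetical.

```python
def split_records(text):
    """Split a whole CSV file, held as one string, into records.

    A newline ends a record only when an even number of double quotes
    has been seen so far, i.e. when the scan is outside any quoted field.
    """
    records = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
        if ch == "\n" and not in_quotes:
            records.append("".join(current))   # record-ending newline
            current = []
        else:
            current.append(ch)                 # includes embedded newlines
    if current:                                # final record without trailing newline
        records.append("".join(current))
    return records
```

This sketch does not handle carriage returns or malformed input with unbalanced quotes; it only shows the quote-parity test.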



--

As artists, it would be a hell of a lot easier if our audiences were
more tolerant of our penchant for boring them.
  - Cory Doctorow


