From: | Miriam English |
Subject: | Re: [bug-gawk] Tentative CSV extension - please advise |
Date: | Wed, 16 Mar 2016 00:54:48 +1000 |
User-agent: | Mozilla/5.0 (X11; Linux i686; rv:32.0) Gecko/20100101 Firefox/32.0 SeaMonkey/2.29.1 |
address@hidden wrote:
Yes. That's getting too messy. Instead such an extension should simply define a csvsplit() function. The big problem is embedded newlines in the fields. Sigh. If not for that we could deal with this issue much more easily...
If the fields are preprocessed by turning "the quick<newline>brown fox" (with a literal newline inside the quotes) into "the quick\nbrown fox", then it becomes much easier. Then, after the fields are extracted, the escaped characters can be returned to their literals.
It would be slightly simplified using something similar to what sed recently has done in adding the -z option, where the input "line" is delimited by a zero byte instead of a newline. I often use that to process entire files as a single line, treating newlines as just another character.
It reduces the problem to working out which newlines end a csv record and shouldn't be escaped.
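To make the idea concrete, here is a rough sketch in Python rather than awk, just to show the logic. It is not a proposal for the extension itself. It assumes double-quoted fields where a doubled quote ("") stands for a literal quote, and the function names are made up for illustration. The whole input is treated as one string, newlines inside quotes are escaped, and whatever newlines remain are the ones that end records.

```python
def escape_embedded_newlines(text):
    """Escape newlines that fall inside double-quoted fields.

    Backslashes are doubled everywhere so the step can be reversed later.
    A doubled quote ("") toggles the quote state twice, leaving it unchanged.
    """
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif ch == "\\":
            out.append("\\\\")
        elif ch == "\n" and in_quotes:
            out.append("\\n")  # backslash followed by the letter n
        else:
            out.append(ch)
    return "".join(out)


def unescape(text):
    """Reverse escape_embedded_newlines: \\n becomes a newline, \\\\ a backslash."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            nxt = text[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


# Whole input handled as a single string, newlines being just another character.
data = 'id,text\n1,"the quick\nbrown fox"\n2,plain\n'
escaped = escape_embedded_newlines(data)

# Every newline left unescaped now ends a record.
records = [r for r in escaped.split("\n") if r]
restored = [unescape(r) for r in records]
```

With this sample input, records holds three entries, and the second one carries the two characters \n in place of the embedded newline until unescape() puts the real newline back.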
-- As artists, it would be a hell of a lot easier if our audiences were more tolerant of our penchant for boring them. - Cory Doctorow