bug-gawk

Re: [bug-gawk] 4.7 Defining Fields by Content


From: Miriam English
Subject: Re: [bug-gawk] 4.7 Defining Fields by Content
Date: Tue, 15 Mar 2016 08:09:54 +1000
User-agent: Mozilla/5.0 (X11; Linux i686; rv:32.0) Gecko/20100101 Firefox/32.0 SeaMonkey/2.29.1

Is it "normal" for CSV files to have embedded linefeeds? All the CSV files I've seen with special characters inside their fields have them written as escape codes (such as \t, \n, \f, and so on), which are replaced with the actual characters on use. If raw control characters do exist inside fields of CSV files, then wouldn't a preliminary pass that converts them to escape codes solve that problem?

Cheers,

        - Miriam
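
A minimal sketch of the conversion pass suggested above, written in Python rather than gawk and assuming the input follows the usual double-quote conventions. The names `ESCAPES`, `escape_field` and `escape_csv` are made up for illustration; the pass relies on a quote-aware reader (Python's `csv` module here), because raw linefeeds can only be told apart from record separators by tracking the quotes.

```python
import csv
import io

# Map raw control characters to backslash codes. The backslash itself is
# escaped as well, so the conversion can be reversed later.
ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t", "\f": "\\f"}


def escape_field(field):
    """Replace each raw control character in one field with its code."""
    return "".join(ESCAPES.get(ch, ch) for ch in field)


def escape_csv(text):
    """Rewrite CSV text so that every record fits on one physical line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # newline="" keeps embedded line breaks untouched for the csv reader.
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([escape_field(f) for f in row])
    return out.getvalue()


# Hypothetical sample: the second record has a linefeed inside a field.
raw = 'id,comment\n1,"first line\nsecond line"\n2,"says ""hi"""\n'
print(escape_csv(raw), end="")
```

After this pass the sample has one record per physical line, so a line-oriented tool can read it; the consumer then has to turn the codes back into characters on use.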

Andrew J. Schorr wrote:
> On Mon, Mar 14, 2016 at 09:40:14AM +0100, Marco Coletti wrote:
>> This is just short of what is needed to correctly parse RFC 4180
>> formatted data, in that it does not account for double quotes
>> appearing as part of a field.
>
> But even with the enhanced FPAT you propose, unless I'm confused,
> it still won't work with records containing embedded linefeed
> characters. We have discussed in the past developing a CSV
> input parser extension, but nobody has implemented it yet.
> If you'd like to develop it, we would welcome the contribution
> of such an extension, possibly for the gawkextlib project if not
> appropriate for inclusion in mainline gawk.
>
> Regards,
> Andy
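
To illustrate the limitation described above, here is a rough Python sketch, not gawk code. The `FIELD` pattern is a hypothetical stand-in for a content-based field definition that accepts doubled quotes; it is not the FPAT proposed in the thread, and Python's regex matching rules differ from gawk's. Empty fields are ignored in this sketch. `parse_by_line` assumes one record per physical line, which is where an embedded linefeed goes wrong.

```python
import re

# Hypothetical field pattern: either a quoted field, in which "" stands
# for a literal quote, or a run of characters that are not commas.
FIELD = re.compile(r'"(?:[^"]|"")*"|[^,]+')


def split_fields(record):
    """Split one record into fields by content."""
    return FIELD.findall(record)


def parse_by_line(text):
    """Treat every physical line as a record, then split it into fields."""
    return [split_fields(line) for line in text.split("\n") if line]


# Doubled quotes inside a field are handled on a single line.
print(split_fields('2,"says ""hi"""'))

# A linefeed inside a quoted field is cut before the pattern ever runs,
# so one logical record comes out as two broken ones.
print(parse_by_line('1,"first line\nsecond line"\n'))
```

However good the field pattern is, it only sees what the record splitter hands it, so the embedded linefeed has to be dealt with at the record level.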



--

As artists, it would be a hell of a lot easier if our audiences were
more tolerant of our penchant for boring them.
  - Cory Doctorow


