Re: [Bug-apl] Performance problems when constructing large(ish) arrays


From: Blake McBride
Subject: Re: [Bug-apl] Performance problems when constructing large(ish) arrays
Date: Tue, 17 Jan 2017 18:48:55 -0600

Rather than jumping to adding new quad functions, I'm wondering what the timing for reading that CSV file would be once the APL code is optimized along the lines of the few suggestions Juergen made.

Specifically, we all know APL is a dog when it comes to looping and doing one thing at a time.  Reading the whole thing in as a matrix and processing it as a unit is more APL-ish and would probably have beaten the bad version of the Lisp code.  (Of course reading the whole thing in and processing it as a unit could end up taking 1GB of RAM with the intermediary stuff.)

On the other hand, reading CSV and fixed length record files is pretty common and useful.

Thanks.

Blake


On Tue, Jan 17, 2017 at 5:01 PM, Juergen Sauermann <address@hidden> wrote:
Hi Elias,

I believe in principle what we want is something like this:

Z←FOO¨Z←⎕FIO[N] 'filename'

where ⎕FIO[N] reads 'filename' line by line putting each line j into the nested item Z[j]
and FOO is a decoding function that translates a line into whatever Z[j] shall become in the end.

The current performance problem is then solved by the ¨ operator which allocates a big enough Z beforehand
and fills it with the result of FOO for each line.
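
To illustrate the difference, here is a rough sketch in Python rather than APL. It assumes the cost of the Z⍪ approach comes from copying the whole result on every append. The names decode_line, build_by_appending and build_preallocated are made up for this example; decode_line merely stands in for FOO.

```python
def decode_line(line):
    """Hypothetical stand-in for FOO: decode one comma-separated line into ints."""
    return [int(field) for field in line.split(",")]


def build_by_appending(lines):
    """Mimics the Z⍪row pattern: each step builds a new, longer result.

    Every concatenation copies everything collected so far, so the total
    work grows quadratically with the number of lines.
    """
    result = []
    for line in lines:
        result = result + [decode_line(line)]  # allocates a fresh list each time
    return result


def build_preallocated(lines):
    """Mimics what FOO¨ is described as doing: allocate once, then fill."""
    result = [None] * len(lines)  # big enough from the start
    for j, line in enumerate(lines):
        result[j] = decode_line(line)
    return result


lines = ["1,2,3", "4,5,6", "7,8,9"]
# Both strategies produce the same value; only the allocation pattern differs.
assert build_by_appending(lines) == build_preallocated(lines)
```

Both functions return the same nested result; the second one never reallocates, which is the property the ¨ approach relies on.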

I can try to make ⎕FIO an operator so that you can use

Z←FOO ⎕FIO[N] 'filename'

for the above, and I hope that will be syntactically possible. It looks almost like +/[N]B with FOO
instead of + and ⎕FIO instead of /, which I believe should work somehow. It could become a little tricky, though,
because ⎕FIO would then have the same ambiguities as / (function versus operator).

/// Jürgen



On 01/17/2017 09:37 PM, Elias Mårtenson wrote:
On 18 January 2017 at 04:10, Juergen Sauermann <address@hidden> wrote:
 
What I do not like about ⎕CSV (I am only guessing here because I don't know what it really does,
but I assume it is specifically for comma-separated lists) is that it supposedly only works for comma-
separated lists. If we have something more general that solves the performance problem of
Z⍪ without being tied to specific formats like CSV, then I would prefer that.

You make a good point. My envisioned function (whether an external function or a built-in one, called ⎕CSV or otherwise) would accept a left-hand argument: a format definition telling the function how to parse the CSV data.

You are absolutely correct in that there are many ways to express CSV data, and looking at the flags available in R gives some insight into this. My intention is to build something that can at least handle the most important of these variations. What the left-hand format definition will look like, I have not yet decided, except for one thing: I want to be able to specify a function that will be called that can be responsible for parsing a line. This way it'll be possible to handle any format that is not natively supported.
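
As a sketch of the shape this could take (in Python, not APL, and not the actual design, since the format definition is still undecided): a reader that takes a simple built-in format option plus an optional function that takes over parsing each line. The names read_records, parse_line, separator and parse_fixed_width are all hypothetical.

```python
import io


def read_records(stream, parse_line=None, separator=","):
    """Read all lines from `stream` and decode each one.

    `separator` stands in for a natively supported format option.
    `parse_line`, when given, is called for every line instead, so any
    format that is not natively supported can still be handled.
    """
    if parse_line is None:
        def parse_line(line):
            return line.split(separator)
    # Strip only the trailing newline so significant spaces are preserved.
    return [parse_line(line.rstrip("\n")) for line in stream]


def parse_fixed_width(line):
    """Hypothetical user-supplied parser: two fields of four characters each."""
    return [line[0:4].strip(), line[4:8].strip()]


# Natively supported case: separator-delimited fields.
csv_rows = read_records(io.StringIO("a,b\nc,d\n"))

# Unsupported format handled by the caller's own line parser.
fixed_rows = read_records(io.StringIO("ab  cd  \nef  gh  \n"),
                          parse_line=parse_fixed_width)
```

The point of the callback is that the reader still owns the loop and the allocation of the result, while the caller only describes how one line is decoded.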

Regards,
Elias


