Re: [Bug] strread() elaborated format strings
From: Philip Nienhuis
Subject: Re: [Bug] strread() elaborated format strings
Date: Mon, 14 May 2012 21:12:37 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6
Hi Júlio:
Júlio Hoffimann wrote:
Thanks Philip, that's why I didn't find where %[^] was being handled
in the code. I know John wants to rewrite some of these I/O functions
in pure C++, but if I have time for a quick fix in strread.m, should I
add something around line 473? Is that the section where we need a new
branch to handle this format specifier?
I'm not quite sure whether %[] format specifiers can be implemented
efficiently in strread's current form.
Last summer I tried to implement them, but it turned out to be a messy
affair full of gotchas and corner cases; the result was lots of
if-clauses and therefore very slow code. (I actually needed %[] and %[^]
myself, but luckily I found ways to avoid them. Plus, at work we do have
Matlab.)
In addition, IIRC Rik later found that Octave's regexp (based on
PCRE) is relatively slow, and I think each %[] specifier needs one or
two calls to regexp().
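To make that regexp cost concrete: a %[set] or %[^set] specifier is essentially a character class, so each one turns into an anchored regex match on the remaining field text. Below is a minimal Python sketch of that translation, not Octave code; charset_to_regex and scan_charset are hypothetical names, and it assumes C scanf rules (at least one character must match, and a ']' right after '[' or '[^' belongs to the set).

```python
import re


def charset_to_regex(spec: str) -> str:
    """Translate a scanf-style %[set] or %[^set] specifier into an
    equivalent regular expression matching one or more characters."""
    if not (spec.startswith("%[") and spec.endswith("]")):
        raise ValueError(f"not a %[] specifier: {spec!r}")
    body = spec[2:-1]
    negate = body.startswith("^")
    if negate:
        body = body[1:]
    if not body:
        raise ValueError(f"empty character set in {spec!r}")
    # Escape only the characters that are special inside a regex character
    # class; '-' is left alone so ranges such as a-z keep working.
    escaped = "".join("\\" + c if c in "\\]^[" else c for c in body)
    return ("[^" if negate else "[") + escaped + "]+"


def scan_charset(spec: str, text: str):
    """Consume the longest leading run of `text` that matches `spec`.

    Returns (matched, rest); matched is None when not even one character
    matches, which counts as a match failure for %[] in scanf semantics."""
    m = re.match(charset_to_regex(spec), text)
    if m is None:
        return None, text
    return m.group(0), text[m.end():]
```

For example, scan_charset("%[^,]", "foo,bar") splits off "foo" and leaves ",bar". With one such match per specifier per field, the speed of the underlying regex engine directly bounds how fast %[] handling can be.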
Nevertheless, if you really need %[], I'm happy to look into it again.
The code for splitting the data into columns is much more reliable these
days so perhaps %[] can be made to work now.
The very best option would be to implement a binary (compiled) textscan
as the workhorse for strread (instead of vice versa). A while ago John
sent me a rough textscan.c framework, but I lack C++ proficiency (and
I suppose John lacks the time).
So the question remains whether it is worthwhile at all to invest in
strread.m again, given the plans for a binary textscan().
Anyway, if you're in a hurry, be my guest to give it a try.
Note: there are currently some pending strread.m fixes in the bug
tracker; see bugs #36356, #36392 and #36398 (the last one should be
rebased). I can't push those myself because my hgrc/Mercurial setup got
fubarred repeatedly, and I have neither the time nor the appetite to fix
it again.
Some guidelines (don't be put off):
First, you'd have to adapt the format string parsing code (L.284-309 in
my patched strread.m, see the bug numbers above) so that it correctly
parses and isolates %[] specifiers. That shouldn't be too hard.
Next, you'll have to adapt the format string matching code in L.450-530
and the column-splitting code in L.532-618. The column splitting in
particular is where I expect you to spend many an evening. (But who
knows...)
Then, further below, you'd have to add a stanza that processes the %[]
specifiers for every matching column. Probably a breeze once the column
splitting is right.
Finally, a fair number of test cases should be added, covering all
imaginable corner cases. Have Matlab at hand for comparison.
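The first step above (parsing the format string and isolating %[] specifiers) can be sketched roughly as follows. This is Python for illustration, not based on the actual strread.m code; split_format is a hypothetical name, the handling of '*' and width/precision is simplified, and it assumes the C scanf rule that a ']' right after '[' or '[^' is a member of the set.

```python
def split_format(fmt):
    """Split a format string into literal chunks and conversion specifiers,
    keeping %[...] and %[^...] intact as single tokens."""
    tokens = []
    literal = []
    i, n = 0, len(fmt)
    while i < n:
        c = fmt[i]
        if c != "%":
            literal.append(c)
            i += 1
            continue
        if i + 1 < n and fmt[i + 1] == "%":
            literal.append("%")  # "%%" is a literal percent sign
            i += 2
            continue
        if literal:  # flush the pending literal text
            tokens.append("".join(literal))
            literal = []
        # Skip an optional '*' (skip field) and width/precision characters.
        j = i + 1
        while j < n and (fmt[j] in "*." or fmt[j].isdigit()):
            j += 1
        if j < n and fmt[j] == "[":
            k = j + 1
            if k < n and fmt[k] == "^":
                k += 1
            if k < n and fmt[k] == "]":
                k += 1  # a ']' right after '[' or '[^' is a set member
            close = fmt.find("]", k)
            if close == -1:
                raise ValueError(f"unterminated %[ at position {i}")
            tokens.append(fmt[i:close + 1])
            i = close + 1
        elif j < n:
            # Single-character conversion such as %s, %d or %f.
            tokens.append(fmt[i:j + 1])
            i = j + 1
        else:
            raise ValueError("format string ends inside a specifier")
    if literal:
        tokens.append("".join(literal))
    return tokens
```

For instance, "%s %[^,],%d" splits into "%s", " ", "%[^,]", "," and "%d", so each %[...] token can be routed to its own matching code while ordinary specifiers keep their existing path.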
Bug reported: https://savannah.gnu.org/bugs/index.php?36464
Thanks,
I'll first add a format scan that catches all not(-yet)-implemented
Matlab format specifiers and emits an error message.
Only then will I start thinking about %[] (unless you or someone else
beats me to it).
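A minimal sketch of such a scan, in Python for illustration only: it walks every conversion in the format string and fails on the first one outside a supported set. The IMPLEMENTED set, the regex and the function name check_format are assumptions, not taken from strread.m.

```python
import re

# HYPOTHETICAL set of conversions treated as implemented; the real list
# lives in strread.m and may differ.
IMPLEMENTED = {"s", "d", "u", "f", "n"}

# One conversion: "%%", or "%" + optional "*" (skip) + optional width and
# precision + either a %[...] / %[^...] set or a single letter.
SPEC_RE = re.compile(r"%(?:%|\*?\d*(?:\.\d+)?(\[\^?\]?[^\]]*\]|[A-Za-z]))")


def check_format(fmt):
    """Raise ValueError on the first conversion that is not implemented."""
    for m in SPEC_RE.finditer(fmt):
        conv = m.group(1)
        if conv is None:  # "%%" is a literal percent sign
            continue
        if conv.startswith("["):
            raise ValueError(
                f"strread: format specifier {m.group(0)!r} "
                "(%[] character set) is not implemented yet")
        if conv not in IMPLEMENTED:
            raise ValueError(
                f"strread: format specifier {m.group(0)!r} is not implemented yet")
    return None
```

Failing early with an explicit message keeps the unsupported cases out of the matching and column-splitting code entirely.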
Philip