help-octave

Re: xlsread in Octave 3.6.4


From: Markus Bergholz
Subject: Re: xlsread in Octave 3.6.4
Date: Mon, 2 Sep 2013 00:10:46 +0200




On Sun, Sep 1, 2013 at 11:42 PM, PhilipNienhuis <address@hidden> wrote:
Markus Bergholz wrote
> now it's faster than matlab!!
> matlab takes  ~100 seconds
> xlsxread in octave ~80 seconds
> http://p.osuv.de/index.php/ZuBLam/ (autodelete after 5 days)
> i will push my modifications later.
>
>
> On Sun, Jun 2, 2013 at 10:25 PM, Markus Bergholz <markuman@> wrote:
>
>>
>>
>>
>> On Sun, May 12, 2013 at 9:26 PM, Philip Nienhuis <pr.nienhuis@> wrote:
>>
>>> Markus Bergholz wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Wed, May 8, 2013 at 10:06 AM, PhilipNienhuis <pr.nienhuis@> wrote:
>>>>
>>>>     E4
>>>>     Markus Bergholz wrote
>>>>      > I haven't followed this thread and its issue, but I've written an
>>>>      > xlsxread function which doesn't need Java.
>>>>      > But it's very, very rudimentary, works only on Linux, and is a
>>>>      > quick & dirty write-down.
>>>>      > Furthermore, you have to remove the string-analysis part if your
>>>>      > sheet doesn't contain strings.
>>>>      > But maybe it helps someone else, or someone wants to improve it, or
>>>>      > someone rewrites it in C/C++ as an oct-file to make it even faster
>>>>      > than Matlab (for me it's still faster than the Java stuff atm).
>>>>      >
>>>>      >
>>>> http://git.osuv.de/Octave/tree/functions/xlsxread.m
>>>>
>>>>     The Java based options are relatively slow as they offer maximum
>>>>     flexibility as regards data types.
>>>>
>>>>     Before venturing in COM/ActiveX and Java based solutions for the io
>>>>     pkg 4 years ago I've looked at a few other solutions, similar to
>>>>     yours. IIRC the most promising one was posted in an OpenWatcom news
>>>>     group. All of them (i.e. the "free solutions") suffered from the
>>>>     same limitations: lack of flexibility, lack of documentation,
>>>>     dependency on some very specific development framework, and/or bound
>>>>     to specific .xls formats (BIFF5, BIFF8, OOXML, what not).
>>>>
>>>>     If you want I can look if your code can somehow be absorbed in the
>>>>     io pkg as a sort of fall-back option.
>>>>
>>>>
>>>> i don't think that this is a good idea :D as i said, it only works on
>>>> linux (i'm using sed and unzip through the 'system' command). furthermore,
>>>> i create my own tmp-dir in a quick & dirty way (mktemp -d would be
>>>> better). aaaaaand so on :)
>>>>
>>>>     To that end it needs a suitable license
>>>>
>>>>
>>>> i don't care about the licence as long as it's a free licence.
>>>>
>>>>     and someone should support/maintain it (my C/C++ skills are
>>>>     rudimentary).
>>>>
>>>>     Philip
>>>>
>>>>
>>>> my c/c++ skills are rudimentary too :)
>>>> if you like, we could, e.g., work together on an xlsxread function on
>>>> github.
>>>> it is not so difficult, but it is extremely time-consuming to parse the
>>>> shitty ms xml format!! (i haven't read any specs yet, i've just done some
>>>> lousy reverse engineering).
>>>>
>>>
>>> Weighing the amount of work needed to build a good, robust and fool-proof
>>> C++/C-based xlsread backend versus already having available a well-tested
>>> choice of working (albeit relatively slow [1]) solutions, I just fail to
>>> see the benefits of reinventing the wheel.
>>>
>>> Just for the record & to emphasize an important aspect, I myself don't
>>> use xlsread (or xlswrite), I usually invoke the much more flexible
>>> xlsopen-xls2oct-[parsecell-]oct2xls-xlsclose sequences. So we'd be
>>> talking about another interface in xlsopen/xls2oct/xlsclose rather than
>>> xlsread.
>>>
>>> Philip
>>>
>>> [1] OpenOffice / LibreOffice are really fast for large spreadsheets, I
>>> doubt a 2-person amateur team can beat the OOo/LO devs as regards speed
>>> tuning; the only problem is start-up time of OOo/LO.
>>> Oh and there's a currently unsolvable Java-UNO issue outlined when you
>>> use it for the first time.
>>> BTW a while ago I had a try with Starbasic (& ActiveX) invoking
>>> LibreOffice for spreadsheet I/O. I already had some success, but I had to
>>> put it away due to lack of time. Maybe next summer I can look at it again.
>>> Maybe that can be made cross-platform too.
>>>
>>
>>
>> I've rewritten my xlsxread function and pushed it to github:
>> https://github.com/markuman/xlsxread/
>> it is ~10% faster now (still faster than the java version, but still
>> slow!).
>> Theoretically this could work on windows now too, but the unzip command in
>> octave doesn't accept the .xlsx extension:
>> warning: unrecognized file type, .xlsx
>> So i have to use a system command again (see lines 47-51 of
>> https://github.com/markuman/xlsxread/blob/master/xlsxread.m ).
>> strings are not recognized either atm, so it's still limited.
>> if someone has an idea how to improve it, i'd like to see some forks :D
>>
>>
>>
>>
>>
>
>
> --
> icq: 167498924
> XMPP|Jabber: address@hidden
>
> _______________________________________________
> Help-octave mailing list
> Help-octave@
> https://mailman.cae.wisc.edu/listinfo/help-octave

Hi Markus,

Tonight I had a brief glance at your code and tried a few command lines from
your .m files. Nice stuff.
I encountered a few hurdles (e.g., no unzip binaries in the MXE builds for
Windows) but OK that was easily solved.

yes, this is already fixed. see: http://savannah.gnu.org/bugs/index.php?39148
 
A first try, concerning a simple xlsx file from my test suite with one text
string inside a rectangular, otherwise numerical cell range, breaks in the
reshape stage because your regexp line doesn't recognize and thus skips
<f></f> tags that AFAICS seem to be used for booleans (rather than <v></v>
tags).
Note that the enclosing <c...> (cell) tags indicate the cell type, so in
principle text strings can be extracted as well.
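
The point about cell types can be illustrated with a minimal sketch. It is written in Python rather than Octave and is not taken from the xlsxread code discussed here. It assumes the first worksheet lives at `xl/worksheets/sheet1.xml` (common, but not guaranteed) and relies on the `t` attribute of each `<c>` element: `s` for a shared-string index, `b` for a boolean, `inlineStr`, `str` and `e` for text-like values, and numeric when the attribute is absent. In SpreadsheetML a `<f>` child carries a formula and the cached result sits in `<v>`, so the sketch reads `<v>` only. The function name `read_cells` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML worksheet and shared-string parts.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}
T_TAG = "{%s}t" % MAIN
C_TAG = "{%s}c" % MAIN


def read_cells(xlsx_bytes, sheet_path="xl/worksheets/sheet1.xml"):
    """Return {cell_ref: value} for one worksheet of an .xlsx archive.

    sheet_path is an assumption: a robust reader would resolve the sheet
    through xl/workbook.xml instead of hard-coding it.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # Join all <t> runs so multi-run strings stay intact.
                shared.append("".join(t.text or "" for t in si.iter(T_TAG)))
        sheet = ET.fromstring(zf.read(sheet_path))

    cells = {}
    for c in sheet.iter(C_TAG):
        ref = c.get("r")
        ctype = c.get("t", "n")          # no t attribute means numeric
        v = c.find("m:v", NS)
        if ctype == "inlineStr":
            cells[ref] = "".join(t.text or "" for t in c.iter(T_TAG))
        elif v is None or v.text is None:
            cells[ref] = None            # empty cell: keep a placeholder
        elif ctype == "s":
            cells[ref] = shared[int(v.text)]
        elif ctype == "b":
            cells[ref] = v.text == "1"
        elif ctype in ("str", "e"):
            cells[ref] = v.text
        else:
            cells[ref] = float(v.text)
    return cells
```

Because every cell keeps an entry, including empty ones, a later reshape into a rectangular array sees the expected number of elements instead of failing when a cell has no numeric `<v>` value.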

yes :) none of that is supported atm.
 
I'd expect a next hurdle to be "merged" cells. But maybe that is easy.

It is probably not so hard to properly parse the xml worksheet files so that
text strings and booleans + probably formulas are read. But I am sure it
will induce a speed penalty.


yes, it will :)
my very first quick and dirty version used one sed command to parse line by line.
http://git.osuv.de/Octave/tree/functions/xlsxread.m
this is the easiest but slowest (though still faster than java!) way to parse it.
i made the last changes ~3 months ago: https://github.com/markuman/xlsxread/
but i've never pushed my last commit, with a 10% working range-read regexp part (that's another braking part).
so xlsxread is always on my mind, but i did roughly nothing during my semester break ;)
in ~2-3 weeks i'll be more active again.
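
A range-read feature like the one mentioned above has to map A1-style references such as `B2:D10` onto row and column indices. The following is a minimal Python sketch of that mapping, not the unpushed regexp code from the thread. The helper names `ref_to_rowcol` and `range_to_bounds` are hypothetical, and absolute references with `$` signs are deliberately not handled.

```python
import re

# One or more column letters followed by a positive row number.
_REF = re.compile(r"([A-Z]+)([1-9][0-9]*)")


def ref_to_rowcol(ref):
    """Convert an A1-style reference to a 1-based (row, col) pair."""
    m = _REF.fullmatch(ref.upper())
    if m is None:
        raise ValueError("not an A1-style reference: %r" % (ref,))
    letters, digits = m.groups()
    col = 0
    for ch in letters:
        # Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits), col


def range_to_bounds(rng):
    """Return ((top_row, left_col), (bottom_row, right_col)) for 'B2:D10'.

    A single-cell range such as 'C3' yields two identical corners.
    """
    first, _, last = rng.partition(":")
    return ref_to_rowcol(first), ref_to_rowcol(last or first)
```

With the bounds known, a reader can skip every `<c>` element whose `r` attribute falls outside the requested rectangle before converting any values.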


All in all I think the blazing speed you claim (a claim I believe as-is)
comes at the cost of robustness and some flexibility. To be able to be
included in the io package I think some of the speed has to be sacrificed to
get some more robust code that won't provoke too many bug reports.
BTW I saw str2num being used to convert text to doubles. Any reason for
that? I ask because str2double is known to be much faster.

indeed: https://github.com/markuman/xlsxread/search?q=str2num&ref=cmdform
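
Some background on the speed difference: Octave's str2num evaluates its argument as code via eval, while str2double only parses numeric text and returns NaN for anything else. The sketch below is a Python analogy of that trade-off, not Octave code. The function names `parse_with_eval` and `parse_direct` are made up for illustration.

```python
def parse_with_eval(text):
    """Evaluate the text as an expression, similar in spirit to str2num.

    Flexible ("1+2" works), but it runs the interpreter on every string
    and will execute whatever code the string contains.
    """
    return eval(text)


def parse_direct(text):
    """Parse plain numeric text only, similar in spirit to str2double.

    Anything that is not a number yields NaN instead of being evaluated.
    """
    try:
        return float(text)
    except ValueError:
        return float("nan")
```

For spreadsheet cells that are already known to hold plain numbers, the direct parser does strictly less work per value and cannot be tricked into running code embedded in a file.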
 

I don't know when I can have another look. Your code is promising though;
I'd like to amend and include it in the near future in the io package.
But to that end I hope you can make up your mind about the license. Would
you agree with GPL 3? I don't know if the current "do what the f**k you want
to" license is compatible with GPL 3 and thus compatible with the rest of
the io package.


GPL3 is fine for me too.
feel free to fork it on github and commit it with a new licence and the str2double replacement :P

 
Philip




--
View this message in context: http://octave.1599824.n4.nabble.com/xlsread-in-Octave-3-6-4-tp4652046p4656979.html
Sent from the Octave - General mailing list archive at Nabble.com.



--
icq: 167498924
XMPP|Jabber: address@hidden
