[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Test files [WAS: Re: xlsread in Octave 3.6.4]
From: |
Philip Nienhuis |
Subject: |
Re: Test files [WAS: Re: xlsread in Octave 3.6.4] |
Date: |
Tue, 17 Sep 2013 20:56:14 +0200 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6 |
Markus,
Before answering, just a quick question:
What mailer do you use?
- Its indentation seems tabified and (IMO) screws up readibility;
- Could you please strip my email address from your replies? I know it
is easily reconstructed, but to just shout it all over the place....
Thanks
Read on...
Markus Bergholz wrote:
On Sun, Sep 15, 2013 at 9:56 PM, Philip Nienhuis wrote
:
<snip>
:
This weekend I'll sent you a few test Excel files with empty
row/columns, merged cells, dates, times, booleans, formulas,
erroneous
formulas and -consequently- error values in the cached value
attributes,
etc.
Attached. I've combined a few test spreadsheets I had lying around.
I used most of these for exploring ranges of occupied cells (which
in OOXML turns out to be extremely easy, the range is in one of the
first nodes of the worksheets) and cell value types. You'll find a
few spreadsheet formulas which -intentionally- contain errors.
Good luck with it,
<snip>
>
> * Mergedranges.xlsx
> looks nearly the same as with matlab R2010 at work. One difference is
> Matlab start read in column J. My xlsxread start at A as defined in the
> sheet. So i guess matlab ignore the real defined size.
> the values are the same. (as far as i quickly see).
Sorry, Matlab is actually right.
I deliberately merged the ten "topleftmost" rows and columns into one
*empty* cell. So your xlsxread.m fails at that hurdle. Oopsie...
Here is what xlsfinfo.m (in Octave's OF-io package, using LibreOffice
("UNO") interface) says:
=============================================================
>> xlsfinfo ("Mergedranges.xlsx", "uno")
1: Sheet1 (Used range ~ J1:AMJ15)
ans = Microsoft Excel Spreadsheet
=============================================================
Now, in the expanded xml (expanded using 7-zip) I see:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
- <worksheet
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
mc:Ignorable="x14ac"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
<dimension ref="A1:AMJ15" />
Hey.....?? ^^^^^^^^^^^^^^^^^^^ ??
>
> * testOOXML.xlsx
> same for this file.
> the values are the same.
Octave (io package) does:
=============================================================
>> xlsfinfo ("testOOXML.xlsx", "uno")
1: FirstSheet (Used range ~ B2:AD9)
2: ThirdSheet (Used range ~ A4:AMJ39)
3: Sheet#5 (Used range ~ F2:Q8)
4: Sheet5 (Used range ~ B2:F14)
5: Sixth (Used range ~ A1:B3)
6: 7th (Used range ~ B2:K9)
7: Sheet_nr_3 (Used range ~ A2:M12)
ans = Microsoft Excel Spreadsheet
>>
=============================================================
...and that exactly matches the worksheets in question.
In the xml itself (again, expanded using 7-zip) I see:
testOOXML.xlsx, for example 4th worksheet ("Sheet5"):
---------------
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
- <worksheet
xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
mc:Ignorable="x14ac"
xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac">
<dimension ref="B1:F65536" />
==> Wrong again. Should be B2:F14
I think this points to a more general issue/problem/bug with MS-Excel:
it actually hands you the range of formatted cells (incl. empty ones!),
rather than the range of cells with actual data contents.
So detecting the occupied cell range isn't quite as easy as I hoped it
would be; it appears one cannot avoid scanning the entire worksheet :-(
Oh and this is the very reason that Matlab often gives you rows and
columns of NaNs surrounding the actual data matrix. IMO Matlab behaves
extremely dumb here.
In the xls2oct.m, ods2oct.m and parsecell.m functions in the OF io
package I've added (default) options to strip away such empty rows and
columns (and to keep track of which spreadsheet cell ranges the stripped
data originate from).
Another issue I think needs some attention is that reading data values
from raw spreadsheets as we do here, carries the risk of actually
reading cached formula values rather than recalculated ones as MS-Excel
and LibreOffice would do (if invoked through ActiveX or Java). IOW, one
might read outdated values (depending on recalculation settings in the
spreadsheet program used to create the file).
Note that the same applies to JExcelAPI (JXL) and OpenXLS (OXS) for
.xls/.xlsx, and ODF Toolkit (OTK) and jOpenDocument (JOD) for .ods.
Of the "pure" Java interfaces, only Apache POI (POI; .xls/.xlsx) has a
built-in formula evaluator (that I do invoke in the relevant io package
functions).
>
> the question is now if (get rid of|change) line 102-109+122 to get the
> same wrong result as matlab or ignore this and we are done?
As you can guess, I don't think "we" are done yet :-)
Philip
- Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/03
- Re: xlsread in Octave 3.6.4, Markus Bergholz, 2013/09/03
- Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/04
- Re: xlsread in Octave 3.6.4, Markus Bergholz, 2013/09/13
- Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/13
- Re: xlsread in Octave 3.6.4, Markus Bergholz, 2013/09/13
- Message not available
- Re: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus Bergholz, 2013/09/17
- Re: Test files [WAS: Re: xlsread in Octave 3.6.4],
Philip Nienhuis <=
- Re: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus Bergholz, 2013/09/17
- Message not available
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/18
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/18
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/18
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/20
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/21
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/21
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], PhilipNienhuis, 2013/09/22
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22