Re: Test files [WAS: Re: xlsread in Octave 3.6.4]

octave-maintainers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Test files [WAS: Re: xlsread in Octave 3.6.4]

From:	Philip Nienhuis
Subject:	Re: Test files [WAS: Re: xlsread in Octave 3.6.4]
Date:	Tue, 17 Sep 2013 20:56:14 +0200
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Markus,

Before answering, just a quick question:
What mailer do you use?
- Its indentation seems tabified and (IMO) screws up readibility;

- Could you please strip my email address from your replies? I know itis easily reconstructed, but to just shout it all over the place....

Thanks

Read on...

Markus Bergholz wrote:

On Sun, Sep 15, 2013 at 9:56 PM, Philip Nienhuis  wrote
    :
    <snip>
    :
        This weekend I'll sent you a few test Excel files with empty
        row/columns, merged cells, dates, times, booleans, formulas,
        erroneous
        formulas and -consequently- error values in the cached value
        attributes,
        etc.
    Attached. I've combined a few test spreadsheets I had lying around.

    I used most of these for exploring ranges of occupied cells (which
    in OOXML turns out to be extremely easy, the range is in one of the
    first nodes of the worksheets) and cell value types. You'll find a
    few spreadsheet formulas which -intentionally- contain errors.

    Good luck with it,

<snip>
>
> * Mergedranges.xlsx
> looks nearly the same as with matlab R2010 at work. One difference is
> Matlab start read in column J. My xlsxread start at A as defined in the
> sheet. So i guess matlab ignore the real defined size.
> the values are the same. (as far as i quickly see).

Sorry, Matlab is actually right.

I deliberately merged the ten "topleftmost" rows and columns into one*empty* cell. So your xlsxread.m fails at that hurdle. Oopsie...

Here is what xlsfinfo.m (in Octave's OF-io package, using LibreOffice("UNO") interface) says:


=============================================================
>> xlsfinfo ("Mergedranges.xlsx", "uno")
  1: Sheet1                           (Used range ~ J1:AMJ15)
ans = Microsoft Excel Spreadsheet
=============================================================

Now, in the expanded xml (expanded using 7-zip) I see:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

- <worksheetxmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main";xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships";xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006";mc:Ignorable="x14ac"xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac";>

  <dimension ref="A1:AMJ15" />

Hey.....?? ^^^^^^^^^^^^^^^^^^^ ??


>
> * testOOXML.xlsx
> same for this file.
> the values are the same.

Octave (io package) does:
=============================================================
>> xlsfinfo ("testOOXML.xlsx", "uno")
  1: FirstSheet                       (Used range ~ B2:AD9)
  2: ThirdSheet                       (Used range ~ A4:AMJ39)
  3: Sheet#5                          (Used range ~ F2:Q8)
  4: Sheet5                           (Used range ~ B2:F14)
  5: Sixth                            (Used range ~ A1:B3)
  6: 7th                              (Used range ~ B2:K9)
  7: Sheet_nr_3                       (Used range ~ A2:M12)
ans = Microsoft Excel Spreadsheet
>>
=============================================================

...and that exactly matches the worksheets in question.

In the xml itself (again, expanded using 7-zip) I see:

testOOXML.xlsx, for example 4th worksheet ("Sheet5"):
---------------
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>

  <dimension ref="B1:F65536" />

==> Wrong again. Should be B2:F14

I think this points to a more general issue/problem/bug with MS-Excel:it actually hands you the range of formatted cells (incl. empty ones!),rather than the range of cells with actual data contents.So detecting the occupied cell range isn't quite as easy as I hoped itwould be; it appears one cannot avoid scanning the entire worksheet :-(

Oh and this is the very reason that Matlab often gives you rows andcolumns of NaNs surrounding the actual data matrix. IMO Matlab behavesextremely dumb here.In the xls2oct.m, ods2oct.m and parsecell.m functions in the OF iopackage I've added (default) options to strip away such empty rows andcolumns (and to keep track of which spreadsheet cell ranges the strippeddata originate from).

Another issue I think needs some attention is that reading data valuesfrom raw spreadsheets as we do here, carries the risk of actuallyreading cached formula values rather than recalculated ones as MS-Exceland LibreOffice would do (if invoked through ActiveX or Java). IOW, onemight read outdated values (depending on recalculation settings in thespreadsheet program used to create the file).Note that the same applies to JExcelAPI (JXL) and OpenXLS (OXS) for.xls/.xlsx, and ODF Toolkit (OTK) and jOpenDocument (JOD) for .ods.Of the "pure" Java interfaces, only Apache POI (POI; .xls/.xlsx) has abuilt-in formula evaluator (that I do invoke in the relevant io packagefunctions).



>
> the question is now if (get rid of|change) line 102-109+122 to get the
> same wrong result as matlab or ignore this and we are done?

As you can guess, I don't think "we" are done yet :-)

Philip

[Prev in Thread]

Current Thread

[Next in Thread]

Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/03
- Re: xlsread in Octave 3.6.4, Markus Bergholz, 2013/09/03
  - Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/04
    - Re: xlsread in Octave 3.6.4, Markus Bergholz, 2013/09/13
    - Re: xlsread in Octave 3.6.4, Philip Nienhuis, 2013/09/13
    - Re: xlsread in Octave 3.6.4, Markus Bergholz, 2013/09/13
    - Message not available
    - Re: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus Bergholz, 2013/09/17
    - Re: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis <=
    - Re: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus Bergholz, 2013/09/17
    - Message not available
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/18
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/18
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/18
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/20
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/21
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/21
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], PhilipNienhuis, 2013/09/22
    - Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22

Prev by Date: documenting class methods
Next by Date: Re: documenting class methods
Previous by thread: Re: Test files [WAS: Re: xlsread in Octave 3.6.4]
Next by thread: Re: Test files [WAS: Re: xlsread in Octave 3.6.4]
Index(es):
- Date
- Thread