From: Philip Nienhuis
Subject: [Octave-bug-tracker] [bug #33876] textscan: resuming reading does not work
Date: Sun, 31 Jul 2011 11:57:51 +0000
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Follow-up Comment #4, bug #33876 (project octave):

Ben (just a quick but long-winded note),

Rik and I had a fair bit of email exchange behind the scenes lately about
textscan/textread/strread where quite a few details were discussed, so you
probably missed a few things.

What happens in textscan with the current implementation is that rather than
exactly repeating a format, reading a line (file record) is repeated instead.
That is a subtle but vital difference.
I.e., it is assumed that a format string always applies to one "line" in the
file.
This breaks when a format applies to several "words" / "fields" / "items"
(whatever you call these) spread over multiple lines.

E.g., I have several FORTRAN output files that look like:

:
       465123.56            112345.32    83   83.44   76.33
   15     456.22    0.0123   9     320.01   -1.709
:

from which I can read using something like

   textscan (fid, "%f%f%d%f%f%d%f%f%d%f%f", "delimiter", " ")
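
To make the difference concrete, here is a minimal sketch that does not call
textscan at all. It only counts how many words each interpretation would
consume from the two sample lines above. The data is duplicated to simulate
two records, and the names nfmt, by_format and by_line are made up for the
illustration.

```octave
% One record of the sample layout above; it spans two file lines.
rec = ["       465123.56            112345.32    83   83.44   76.33\n", ...
       "   15     456.22    0.0123   9     320.01   -1.709\n"];
text = [rec, rec];             % two records = 4 file lines, 22 words
nfmt = 11;                     % conversions in "%f%f%d%f%f%d%f%f%d%f%f"
N = 1;                         % requested repeat count

% "Repeat the format N times": consume N * nfmt words, ignoring line ends.
words = sscanf (text, "%f");   % all numeric words, in file order
by_format = words(1:N * nfmt).';

% "Repeat reading a line N times": consume N lines, whatever they hold.
first_line = strtok (text, "\n");
by_line = sscanf (first_line, "%f").';

printf ("format repeat: %d words, line repeat: %d words\n", ...
        numel (by_format), numel (by_line));
```

With N = 1 the format-based reading takes a whole record (both lines), while
the line-based reading stops after the first line and leaves six of the eleven
conversions unfilled.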

Furthermore, it is allowed to call textread and strread using:

   [arg1, arg2, arg3, arg4] = textread (file)

(i.e. with a default format of "%f")

-or- 

   [arg1, arg2, arg3, arg4] = textread (file, "%f", N)

-or even-

   [arg1, arg2, arg3, arg4] = textread (file, "", N)

(where the number of output arguments is determined by the user).

So your (and Rik's) patch works only if a format string applies to just one
file "line" (record).
This is not exactly equivalent to "repeating a format" as it is described in the
on-line ML docs.

If your patch works (can't check right now, maybe tonight) we can always add a
note in the texinfo header that for "multi-line formats" N has to be
multiplied by the number of file lines that one format spans.
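
As a small worked example of that note (assuming the patch does repeat
per line; the variable names are just for illustration):

```octave
lines_per_format = 2;    % the sample format above spans two file lines
n_records = 3;           % number of complete records wanted
N = n_records * lines_per_format;   % repeat count to pass: 6

% The call would then look like this (not run here; fid and fmt are
% assumed to be an open file id and the 11-item format string):
%   c = textscan (fid, fmt, N);
```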


BTW format repeat count doesn't work in strread itself (see below).


A bit of background:

Of course, in principle anything can be fixed, but it may become very
complicated with the current strread implementation (that I simply accepted
and used as a given) to do this for format repeat count.

That's because only after the file and format string have been fully parsed
and deciphered by strread (including literals bordering fields w/o delimiters
in between, and including format specifiers with trailing literals) it is
known for sure how many fields are contained in one file "line".
Strread (1) simply counts the number of format fields (incl. literals and %*
specifiers), then (2) splits the file into "columns" of words, where the
number of columns matches the number of format items, and (3) finally
transforms the columns into the requested format. Somewhere in step (2) the
concept of file line is dropped in favour of a number of fields matching the
number of format items.
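
The three steps can be sketched roughly as follows. This is a simplified
illustration and not the actual strread code: it ignores literals, field
widths and %* specifiers, assumes whitespace-delimited words, and all
variable names are made up.

```octave
fmt  = "%f %s %d";
text = "1.5 abc 3\n2.5 def 4\n";

% (1) Count the format items.
items = regexp (fmt, '%\*?[a-zA-Z]', "match");
nitems = numel (items);

% (2) Split the text into words and fold them into nitems columns.
%     From here on the file line boundaries are no longer known.
words = regexp (text, '\S+', "match");
nrows = floor (numel (words) / nitems);
cols = reshape (words(1:nrows * nitems), nitems, nrows).';

% (3) Convert each column according to its format item.
out = cell (1, nitems);
for k = 1:nitems
  switch items{k}(end)
    case {"f", "d"}
      out{k} = str2double (cols(:, k));   % numeric column
    case "s"
      out{k} = cols(:, k);                % cellstr column
  endswitch
endfor
```

Note that step (2) would give the same columns if the six words were spread
over one, two or three file lines, which is why a line count is hard to
honour after that point.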

Only in step (3) above would it be possible for strread to return exactly the
number of words matching a specific format repeat count.
Initially I had that in place but Rik dropped it and implemented the current
setup, with good reason (it avoids reading the entire file, which I
overlooked).

BTW I just noticed that having it moved out of strread has the side effect that
format repeat count doesn't work anymore in strread itself... (so Rik and I
both overlooked that one :-( )

I *do* have an idea of how to fix this situation (using some communication
between strread and textscan/textread) but currently I have other priorities.

BTW, I sent Rik some more patches a few days ago (small bug fixes +
CollectOutput option).


    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?33876>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



