pspp-dev

Re: casefile.c revision


From: John Darrington
Subject: Re: casefile.c revision
Date: Sat, 4 Jun 2005 08:43:49 +0800
User-agent: Mutt/1.5.4i

On Thu, Jun 02, 2005 at 09:59:16PM -0700, Ben Pfaff wrote:
     Ah.  I see.  To my mind, this is a little different from random
     access.  It's more like a "bookmark", in effect "Here, keep my
     place for me while I peek ahead a little bit".
     
     I can see a bunch of ways this might be implemented.  First, we
     could literally implement something like a bookmark.  The
     following two ways are equivalent, I think, but they are
     conceptually a bit different:
     
             A. Add a casereader_clone() that makes a new copy of a
                casereader, so that we can read ahead in one
                casereader and then make another pass across that same
                data in another one.
     
             B. Create a new "class" called a casefile_position (or
                maybe call it a "casemark"?).  Then add a
                casereader_tell() to save a position and
                casereader_rewind() to go back to a position.  We'd
                also want a casefile_position_destroy() (see below).
     
     Second, we could use an intermediate casefile:
     
             C. Copy each case into the intermediate casefile as we
                go.  When we know what the rank of a set of cases will
                be, copy the intermediate casefile into the final
                casefile, changing the ranks as we go.  Typically the
                intermediate casefile would only have 1 case in it (as
                for 1, 2, and 4 in your example above) but it could
                end up with 100,000,000 cases (as for 3 in your
                example above).
     
     I like A and C the best.  I don't think C would even need any
     change to the casefile code, although some optimization might be
     helpful.


From the programmer's perspective (i.e. the person writing rank-like
commands) I think that A is the best.  The only thing is, one has to
bear in mind that casereader_clone()/casereader_destroy() will be
called at least once per case, so some optimisation would be in order
here too; perhaps a memory pool dedicated to each casefile would be
a good idea.  Also, I suppose it wouldn't make sense to clone a
destructive reader?
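
To make option A concrete, here is a rough sketch of how a rank-like
command might use a cloned reader to peek ahead over a run of ties.
Everything in it is an assumption for illustration: the toy struct
casefile and struct casereader are not the real PSPP structures, and
casereader_create(), casereader_clone(), casereader_read() and
rank_cases() are hypothetical signatures.  The input is assumed to be
sorted already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; not the real PSPP structures. */
struct casefile {
    const double *values;   /* one sorted value per case */
    size_t n_cases;
};

struct casereader {
    const struct casefile *cf;
    size_t pos;             /* index of the next case to read */
};

static struct casereader *casereader_create(const struct casefile *cf)
{
    struct casereader *r = malloc(sizeof *r);
    assert(r != NULL);
    r->cf = cf;
    r->pos = 0;
    return r;
}

/* Hypothetical option A: copy the reader, position included. */
static struct casereader *casereader_clone(const struct casereader *src)
{
    struct casereader *r = malloc(sizeof *r);
    assert(r != NULL);
    *r = *src;
    return r;
}

static void casereader_destroy(struct casereader *r)
{
    free(r);
}

/* Read the next value; returns false at end of data. */
static bool casereader_read(struct casereader *r, double *value)
{
    if (r->pos >= r->cf->n_cases)
        return false;
    *value = r->cf->values[r->pos++];
    return true;
}

/* Assign mean ranks to sorted input; tied cases share the average rank. */
static void rank_cases(const struct casefile *cf, double *ranks)
{
    struct casereader *reader = casereader_create(cf);
    size_t first = 0;       /* index of the first case in the tie group */

    for (;;) {
        /* The clone peeks ahead while `reader` keeps our place. */
        struct casereader *ahead = casereader_clone(reader);
        double first_value, v;
        size_t n_tied = 0;

        if (casereader_read(ahead, &first_value)) {
            n_tied = 1;
            while (casereader_read(ahead, &v) && v == first_value)
                n_tied++;
        }
        casereader_destroy(ahead);
        if (n_tied == 0)
            break;

        /* Ranks first+1 .. first+n_tied average to this value. */
        double mean_rank = first + (n_tied + 1) / 2.0;
        for (size_t i = 0; i < n_tied; i++) {
            casereader_read(reader, &v);    /* second pass over the group */
            ranks[first + i] = mean_rank;
        }
        first += n_tied;
    }
    casereader_destroy(reader);
}
```

Note that the sketch does one malloc()/free() pair per tie group, which
with no ties is one per case.  That is the cost a per-casefile pool
would be meant to remove.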

     
     So... why the heck do I think that this is better than just a
     "random seek" operation?  Well, mostly because I like to think of
     casefiles as something that you usually stream from one place to
     another.  That is, I like to think of them as analogous to pipes,
     not to files.
     
     In particular, I want casefiles to be able to support
     "destructive readers", which are readers where once you've read a
     case, it's gone--deleted, destroyed, etc.  A destructive reader
     can be useful because, when the casefile data is in-memory, the
     reader doesn't have to make a copy of data if it wants to modify
     it; instead it can just modify the copy that the casefile had.
     The copy-on-write case implementation in case.[ch] supports this
     out-of-the-box.
     
     Unfortunately, supporting random access means that this useful
     optimization isn't possible, because at any time we could seek
     backward to the first case (and expect to find it in its
     unmodified form).  On the other hand, if we have to indicate how
     many records back we can go (as in A or B) we can still discard
     anything that lies before any marker.  (This is why we'd want a
     casefile_position_destroy() in B: so that we know when markers
     are gone and can thus discard anything before them.)
     
     Am I making sense?
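
The discard rule described in the quoted text (anything before the
earliest live reader or marker can go) might be sketched like this.
The toy_* names, the fixed-size reader table and MAX_READERS are all
made up for illustration; none of this is the actual casefile code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_READERS 8       /* arbitrary limit for this sketch */

/* Toy casefile that only tracks where its live readers are. */
struct toy_casefile {
    size_t n_cases;
    size_t reader_pos[MAX_READERS];
    int reader_live[MAX_READERS];
};

/* Register a reader (or a clone/marker) at `pos`.
   Returns its slot index, or -1 if the table is full. */
static int toy_reader_open(struct toy_casefile *cf, size_t pos)
{
    for (int i = 0; i < MAX_READERS; i++)
        if (!cf->reader_live[i]) {
            cf->reader_live[i] = 1;
            cf->reader_pos[i] = pos;
            return i;
        }
    return -1;
}

/* Move a reader forward by `n` cases, stopping at end of data. */
static void toy_reader_advance(struct toy_casefile *cf, int slot, size_t n)
{
    assert(slot >= 0 && slot < MAX_READERS && cf->reader_live[slot]);
    cf->reader_pos[slot] += n;
    if (cf->reader_pos[slot] > cf->n_cases)
        cf->reader_pos[slot] = cf->n_cases;
}

/* Destroy a reader or marker so it no longer pins any cases. */
static void toy_reader_close(struct toy_casefile *cf, int slot)
{
    assert(slot >= 0 && slot < MAX_READERS && cf->reader_live[slot]);
    cf->reader_live[slot] = 0;
}

/* Number of leading cases that no live reader can still reach.
   With no live readers at all, every case is discardable. */
static size_t toy_discardable(const struct toy_casefile *cf)
{
    size_t min = cf->n_cases;
    for (int i = 0; i < MAX_READERS; i++)
        if (cf->reader_live[i] && cf->reader_pos[i] < min)
            min = cf->reader_pos[i];
    return min;
}
```

For example, with a main reader at case 7 and a marker left at case 4,
only the first 4 cases can be dropped; once the marker is destroyed,
the first 7 can.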

Yes.   I'm beginning to understand the casefile stuff better now.
Perhaps they should have been called "casestream".

J'
     
     

-- 
PGP Public key ID: 1024D/2DE827B3 
fingerprint = 8797 A26D 0854 2EAB 0285  A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.
