Re: naming data sets
Sat, 17 Dec 2005 10:15:46 +0800
On Fri, Dec 16, 2005 at 08:39:02AM -0800, Ben Pfaff wrote:
Jason Stover <address@hidden> writes:
> It's time for me to write a routine that saves the residuals of
> a regression model to the data. I have avoided doing this until
> now because I want the user to be able to have a choice of saving
> residuals to different data sets.
> Right now there is only one data set we can refer to, and that
> is a serious limitation. How difficult would it be to make PSPP
> able to recognize different data sets? If there is only one
I've thought about this for a while and I think I have a
First, I think that my suggestion that all the data sets have the
same dictionary was flawed. The problem is that the dictionary
for whatever data set you're working with can change without the
other dictionaries changing in a similar manner. For example,
COMPUTE can add a variable to the current data set's dictionary,
but if we want to add that to the other data sets' dictionaries,
what should be the values? We'd have to, essentially, perform
all transformations on all the existing data sets, and I don't
think that's something that really makes sense (if it does to
you, please explain).
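To make the shared-dictionary problem concrete, here is a toy sketch in Python. It is not PSPP code: the list-based "dictionary", the `compute` helper, and `is_consistent` are all made up for illustration. It only shows that once a COMPUTE-like step adds a variable to a shared dictionary, the cases of every other data set no longer match that dictionary.

```python
# Toy model, not PSPP code: a "dictionary" is a list of variable names,
# and each case is a list of values in dictionary order.
shared_dictionary = ["x", "y"]
active_cases = [[1.0, 2.0], [3.0, 4.0]]
other_cases = [[5.0, 6.0]]  # a second data set using the same dictionary


def compute(dictionary, cases, name, expression):
    """Mimic COMPUTE: add a variable and fill it in for the given cases only."""
    for case in cases:
        row = dict(zip(dictionary, case))
        case.append(expression(row))
    dictionary.append(name)


def is_consistent(dictionary, cases):
    """True if every case has exactly one value per dictionary variable."""
    return all(len(case) == len(dictionary) for case in cases)


# The transformation runs on the active data set only.
compute(shared_dictionary, active_cases, "z", lambda row: row["x"] + row["y"])

print(is_consistent(shared_dictionary, active_cases))  # True
print(is_consistent(shared_dictionary, other_cases))   # False: no value for z
```

The only ways to repair `other_cases` would be to invent values for `z` or to rerun the transformation on every data set, which is the cost described above.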
Thus, I propose that we support multiple data sets, each of which
has an independent dictionary. We introduce a new type of file
handle for these data sets. Tentatively I'll call these
"temporary" file handles and "temporary" data sets (but better
terminology is welcome). Access to temporary data sets is
through temporary file handles, using the usual commands for
accessing system files (GET, SAVE, XSAVE).
Here's what I'd add to the PSPP language:
* Some extra syntax on FILE HANDLE for declaring
temporary file handles (MODE=TEMPORARY perhaps).
* A new command for destroying temporary file handles
(e.g. CLOSE FILE HANDLE or DELETE FILE HANDLE), so that
the memory or disk space used to store them can be freed.
* GET, SAVE, XSAVE would be extended to read and write to
temporary file handles. I'd introduce some kind of
syntactic sugar so that it wasn't strictly necessary to
declare temporary file handles in advance,
e.g. something like XSAVE OUTFILE=TEMPORARY
<HANDLENAME> would work properly.
Internally, a temporary file handle would be represented by a
dictionary plus a casefile, I think. When PSPP terminates, the
data in temporary file handles would automatically disappear,
just like data in the active file.
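As a rough sketch of that representation, again in Python rather than PSPP's own code: each temporary handle maps to its own dictionary plus casefile, so later changes to the active file do not touch it. The class and method names (`TemporaryDataSet`, `TemporaryHandles`, `save`, `get`, `close`) are hypothetical, and the residual values are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class TemporaryDataSet:
    """A dictionary (variable names) plus a casefile (rows of values)."""
    dictionary: list
    cases: list


class TemporaryHandles:
    """Hypothetical in-memory registry of temporary file handles."""

    def __init__(self):
        self._data_sets = {}

    def save(self, handle, dictionary, cases):
        # Copy, so the stored data set is independent of the active file.
        self._data_sets[handle] = TemporaryDataSet(
            list(dictionary), [list(case) for case in cases])

    def get(self, handle):
        data_set = self._data_sets[handle]
        return list(data_set.dictionary), [list(case) for case in data_set.cases]

    def close(self, handle):
        # Corresponds to destroying the handle so its storage can be reused.
        del self._data_sets[handle]

    def is_open(self, handle):
        return handle in self._data_sets


handles = TemporaryHandles()
active_dictionary = ["y", "x"]
active_cases = [[2.0, 1.0], [4.1, 2.0]]

# Save residuals (illustrative values) to a temporary data set.
handles.save("resid", ["residual"], [[-0.05], [0.05]])

# A later change to the active file leaves the temporary data set alone.
active_dictionary.append("z")
for case in active_cases:
    case.append(0.0)

dictionary, cases = handles.get("resid")
print(dictionary)  # ['residual']

handles.close("resid")
print(handles.is_open("resid"))  # False
```

Nothing here persists past the end of the process, which matches the idea that temporary data sets disappear when PSPP terminates.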
Presumably this is something that would only be available if
--syntax=enhanced is used?
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://pgp.mit.edu or any PGP keyserver for public key.