Re: Optimisation of statistic calculations
From: John Darrington
Subject: Re: Optimisation of statistic calculations
Date: Thu, 4 Nov 2004 08:18:17 +0800
User-agent: Mutt/1.3.28i
On Wed, Nov 03, 2004 at 02:03:10PM +0000, Jason Stover wrote:
> Two ways to attenuate eventual bloat of PSPP are:
>
> 1. As you mentioned, cache the common and, most importantly,
> sufficient statistics. Have every statistical procedure cache its
> sufficient statistics for later use. After being computed once, the
> sufficient statistics can be used by that or other procedures
> later. Sufficient statistics are used frequently, so caching them
> could eliminate a lot of recomputation.
That was the basic idea I had in mind. If the cache is preserved
across PSPP commands, however, then we'd have to ensure that it is
invalidated whenever any transformations are done. But I think the
cache would be beneficial even within a single command, especially if
many subcommands are used.
> 2. Use a generic optimization module. GSL provides one that could be
> hooked in to PSPP. Different statistical estimation procedures use
> the same backend algorithms (e.g., sorting for nonparametric routines
> and Newton-Raphson for generalized linear models). A single optimizer,
> or other shared backend routines, can eliminate a lot of redundancy.
I briefly looked at the gsl manual, but couldn't see any mention of
this. Can you give me a reference to where this is documented?
J'
--
PGP Public key ID: 1024D/2DE827B3
fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
See http://wwwkeys.pgp.net or any PGP keyserver for public key.