pspp-dev

Re: MemopMemmmmo


From: Ben Pfaff
Subject: Re: MemopMemmmmo
Date: Sat, 17 Mar 2012 12:15:17 -0700
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)

John Darrington <address@hidden> writes:

> On Sat, Mar 17, 2012 at 11:13:56AM -0700, Ben Pfaff wrote:
>      
>      The current default of 64 MB is fairly conservative for modern
>      systems.  I'd be happy to adjust that downward on (presumably
>      older) systems that have little memory.  Perhaps we could use the
>      gnulib "physmem" module to find that out.
>      
>      But: Are you sure that the problem is that the default setting is
>      too high?  I would have guessed that the problem is actually one
>      of two things: either the setting is being raised manually to a
>      value that is too high for the system, or the categorical code
>      does not honor the setting regardless of its value.  (Without
>      looking at code, I'd guess that the latter is the case.)
>
> 64MB is quite acceptable if it is being called just a few 
> times, which would be typical of a normal use of a categorical procedure.
> However if a continuous variable is unwittingly specified as a categorical
> variable, then potentially the system will attempt to allocate 64MB * N 
> where N is the number of distinct values of that variable.  Clearly if N is
> very large, that's not going to work.

Ah, yes.  I've been aware of related problems for a long time,
but I haven't come up with a good solution.  One must limit the
total memory allocated, not the memory allocated per-instance, of
course, but the proper way to distribute the available memory
among the competing users is not obvious.  I guess that the
easiest way is first-come-first-served.  That might be just fine
in the common case, so perhaps we should implement it that way as
a first cut.
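
A minimal sketch of what first-come-first-served could look like,
assuming a single shared budget that every memory user reserves
from before allocating.  The names mem_budget, mem_budget_reserve,
and mem_budget_release are hypothetical, not existing PSPP
functions, and a real version would need to decide what the limit
is tied to:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global budget shared by all memory users.
   Illustration only; this is not an existing PSPP interface. */
struct mem_budget
  {
    size_t limit;               /* Total bytes that may be reserved at once. */
    size_t used;                /* Bytes currently reserved. */
  };

void
mem_budget_init (struct mem_budget *b, size_t limit)
{
  b->limit = limit;
  b->used = 0;
}

/* First come, first served: grants a request for N bytes only if it
   fits in whatever earlier callers have left.  Returns false if the
   caller has to fall back to something else (or fail). */
bool
mem_budget_reserve (struct mem_budget *b, size_t n)
{
  /* Compare against the remainder so that the check cannot overflow. */
  if (n > b->limit - b->used)
    return false;
  b->used += n;
  return true;
}

/* Returns N previously reserved bytes to the budget. */
void
mem_budget_release (struct mem_budget *b, size_t n)
{
  assert (n <= b->used);
  b->used -= n;
}
```

With this shape, the limit applies to the sum of all reservations
rather than to each instance, so N categories cannot each claim
the full amount.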

For categoricals, though, what's the fallback if the memory usage
becomes too high?  Can we fall back to some kind of on-disk
storage, or do we just fail?  "Just fail" is probably not a good
way to go, if first-come-first-served is the strategy we use,
because it means that unrelated memory use (e.g. for cases) can
cause even a small number of categories to break.

Here's another idea that comes to mind: is there a maximum number
of categories that makes sense?  Would a "max categories" setting
defaulting to, say, 1000, still allow most users to get real work
done in realistic cases?
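
To make the idea concrete, here is a rough sketch of a cap on
distinct category values.  The cat_set type and its functions are
made up for illustration and do not correspond to the existing
categoricals code; the linear search is only there to keep the
example short, and real code would presumably use a hash table:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical set of distinct category values with an upper bound.
   Illustration only; not the existing categoricals implementation. */
struct cat_set
  {
    double *values;             /* Distinct values seen so far. */
    size_t n;                   /* Number of distinct values. */
    size_t max;                 /* Assumed "max categories" setting. */
  };

/* Initializes S to hold at most MAX distinct values.
   Returns false if memory could not be allocated. */
bool
cat_set_init (struct cat_set *s, size_t max)
{
  s->values = malloc (max * sizeof *s->values);
  s->n = 0;
  s->max = max;
  return s->values != NULL || max == 0;
}

/* Records VALUE as a category.  Returns true if VALUE was already
   present or was added, false if adding it would exceed the limit,
   in which case the caller can report that the variable is probably
   not categorical instead of allocating more memory. */
bool
cat_set_add (struct cat_set *s, double value)
{
  for (size_t i = 0; i < s->n; i++)
    if (s->values[i] == value)
      return true;
  if (s->n >= s->max)
    return false;
  s->values[s->n++] = value;
  return true;
}

void
cat_set_destroy (struct cat_set *s)
{
  free (s->values);
  s->values = NULL;
  s->n = s->max = 0;
}
```

The point of failing at the limit is that a continuous variable
given as categorical gets caught after max values, rather than
after memory runs out.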
-- 
Ben Pfaff 
http://benpfaff.org


