octave-maintainers

Re: 2nd: sparse matrices


From: john popeye
Subject: Re: 2nd: sparse matrices
Date: Tue, 27 Apr 2004 01:38:13 -0500

hi,

> I have been attracted to this thread because I myself sometimes have
> slowness problems with matrices with ~5e7 doubles.
>
>   My ignoramus's question is: doesn't the OS (Linux) use its swap disk
> space to allocate space beyond the memory capacity of the computer it
> is running on? Of course, it will swap data back and forth, not
> necessarily as efficiently as if I moved the data myself, but isn't
> the idea of swap that it should be done by the OS? IIRC, VMS had a
> reputation for good virtual memory management.

I would bet anything that an OS with 1GB of main
memory cannot swap 100GB of data as efficiently as
code designed for the task directly.
There is no way the OS could compete.
The basic reason is: the OS does not know in advance
which data is used when. The person who implements
the operation knows exactly.
Try the C=A+B example.
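The C=A+B case can be sketched in a few lines. Below is a minimal out-of-core illustration in Python (not Octave internals); it assumes the matrices are stored on disk as raw little-endian doubles of equal length, and the file names and chunk size are invented for the example. The point it demonstrates is the one above: the access pattern is known in advance, so the code streams exactly what it needs instead of letting the pager guess.

```python
import struct

CHUNK = 1 << 16  # doubles per read; in practice a few hundred MB


def add_out_of_core(path_a, path_b, path_c, chunk=CHUNK):
    """Compute C = A + B without ever holding a full matrix in memory.

    Each file holds the same number of IEEE-754 doubles.  We stream
    fixed-size chunks of A and B, add them elementwise, and append the
    result to C, so memory use is bounded by the chunk size alone.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb, \
         open(path_c, "wb") as fc:
        while True:
            buf_a = fa.read(8 * chunk)
            if not buf_a:
                break
            buf_b = fb.read(len(buf_a))
            n = len(buf_a) // 8
            a = struct.unpack(f"<{n}d", buf_a)
            b = struct.unpack(f"<{n}d", buf_b)
            fc.write(struct.pack(f"<{n}d", *(x + y for x, y in zip(a, b))))
```

Only three chunks are live at any moment, so the solvable problem size is bounded by disk space, not RAM.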


>   About the sparse package in general, its point is, IIGC, to avoid
> both storing and adding/multiplying/whatever lots of zeros. As the
> name 'sparse' says, it is for matrices with few nonzeros, not
> necessarily 'huge' ones.

Yes and no. In certain cases even sparse matrices
are blown up to full size on a "column" or "row" basis
during an operation and then "compressed" again.
As long as you can easily "hold" the full equivalent
of a sparse matrix in memory, I would say it makes
very limited sense to implement/overload each operator
for sparse matrices. That is quite a lot of work.
Remember, essentially every matrix operation has to be
reimplemented.
Most matrices in physical simulation share common
properties:
-many are symmetric
-regardless of their size (n columns, m rows) they have
a similar number of nonzeros per column, which means
their density goes down the larger the matrix becomes.
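That density behavior is easy to check numerically. A small sketch in plain Python (the tridiagonal case is my example, not one from this thread): a matrix from a 1-D finite-difference stencil has about 3 nonzeros per column regardless of n, so its density falls roughly like 1/n.

```python
def tridiagonal_nnz(n):
    """Nonzeros of an n-by-n tridiagonal matrix: 3 per column,
    minus the two corner entries that fall outside the band."""
    return 3 * n - 2

# Density shrinks as the matrix grows, even though the nonzeros
# per column stay essentially constant.
for n in (100, 1000, 10000):
    density = tridiagonal_nnz(n) / float(n * n)
    print(f"n={n:6d}  nnz={tridiagonal_nnz(n):6d}  density={density:.2e}")
```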

Also I would say that a clean "out of core"
approach would pay off for full matrices as well.
It expands the problems you can solve by, let's say,
a factor of 300-1000.

One big question is what you want to use
sparse matrices for. If you only want a pragmatic way
to store double-indexed data, you can live with simple
approaches; but then I suggest you use a database.
If you want a good and general implementation, you have
to invest more time.

-- jp




> 
>   Etienne
> 
> 
> On Mon, Apr 26, 2004 at 12:42:02PM -0500, john popeye wrote:
> # hi andy,
> # 
> # see below
> # 
> # > > Summary:
> # > > Sparse matrices are a special case of the mathematical object
> # > > "matrix". This special case was introduced due to the simple
> # > > fact that you can be orders of magnitude more efficient when
> # > > you apply certain restrictions on it.
> # > > My recommendation is that implementing such a feature makes
> # > > sense only if you do not restrict yourself by the way you
> # > > implement it. Focus on what you could achieve with it and what
> # > > it was designed for (LARGEness). (you do not buy an expensive
> # > > car when all you do is move it forward and backward in the
> # > > garage a few meters, do you?)
> # > 
> # > Do you actually USE sparse matrices?
> # 
> # Of course I do, but due to the arguments given in my first email I
> # do NOT use matlab for it and also not octave's approach. I tried
> # both and got stuck.
> # 
> # > The Matlab sparse implementation (most of which is offered by
> # > octave-forge) gives almost exactly what you want, as well as
> # > some special features to do things on sparse that you don't
> # > normally do on full matrices.
> # 
> # I do not speak of the operations you could do with the matrices; I
> # talk about the size and number of matrices you can handle. The key
> # thing about sparse matrices is that you use the approach because
> # the objects are too large to handle like full matrices with tons
> # of 0's.
> # Well, correct me if I am wrong, but does matlab/octave handle the
> # stuff "out of core", meaning they do not hold the matrix in main
> # memory? The last time I tried it, honestly at least 3 years ago,
> # this was not the case.
> # 
> # > Summary:
> # >     What's wrong with the Matlab approach and what do you
> # >     propose we do differently?
> # 
> # matlab has some fantastic features, no question about that. That
> # is why others try in many places to mimic it and in some areas
> # even have a better solution ;-)
> # 
> # In terms of functionality matlab is a very rich package:
> # -numerics
> # -graphics
> # -gui
> # -maybe some symbolic etc.
> # -other stuff
> # That's the reason why many 3rd party "toolboxes" exist which use
> # these features and build other stuff on them.
> # I think I do not need to explain these benefits.
> # 
> # Let's come to the bottlenecks of matlab.
> # In its very basic setup matlab limits itself to the available main
> # memory. Once all the data you have fills up that memory, it is
> # "GAME OVER". It is extremely ugly to insert into all your
> # scripts/m-functions or whatever routines which start
> # saving/loading. In my eyes, or let's say in my area of
> # application, this is a killer.
> # Even if main memory is not as expensive any more as it once was,
> # it is and always will be much more expensive than disk space.
> # I would guess that 300GB of disk space costs about as much as 1GB
> # of main memory, if that.
> # 
> # That means that the code limits itself to handling problems of a
> # size 300 times smaller than it could with a proper architecture
> # (we do not even talk about 32bit problems and int4 addressing
> # limits).
> # 
> # Example with full matrices for simplicity:
> # I have 1.2GB of main memory and want to calculate
> # C=A+B;   # do not forget the ";" ;-)
> # This means each matrix could be 400MB, which means, let's say,
> # sqrt(400e6/8)=7071.
> # You might say this is large, but large is always relative.
> # Assume now you have an "out of core" approach:
> # For the user it does not matter where the data is. What matters
> # for him is: "what can I do with it, and how fast is it".
> # In the "out of core" approach I would store the following
> # information in memory for each matrix:
> #
> # name, n-columns, n-rows, complexity, symmetry, some-other-stuff
> #
> # This can easily be done with 100 bytes. So we have another 1.2GB
> # of main memory for doing the math. Now we fread 400MB chunks of
> # matrices "A" and "B", add them up, and fwrite 400MB chunks of C.
> # This leads to as many freads as we need to "eat up" the matrices.
> # You agree that I will be able to add matrices as large and as
> # many as fit on my disk.
> # 
> # For other operations, and especially for sparse matrices, the
> # chunk size is in general a multiple of the row size, if the
> # matrix is stored in Harwell-Boeing compressed column format.
> # As many operations use square matrices, we can use as a rule of
> # thumb that we can solve problems where one column fits in memory.
> # Going back to our 1.2GB this means: 1.2e9/8=1.5e8 ROWS for real
> # and 7.5e7 ROWS for complex matrices (some factors for additional
> # input and output granted). Regardless of how many additional
> # matrices I have, due to the fact that they all sleep on disk.
> # You must admit that even in the worst case the "out of core"
> # approach can solve significantly larger problems.
> # The key disadvantages are:
> # -you need some more overhead in designing such a system
> # -basically it is slower, which can be compensated with:
> # + a more expensive disk system
> # + automatic pre/post-caching of the stuff you need
> # (as long as you handle data which you can afford to have in
> # memory (as all problems now ;-))
> 
=== message truncated === 
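The column-budget arithmetic quoted above is easy to reproduce. A sketch in Python; the 1.2GB figure comes from the quoted message, and the element sizes assumed are 8 bytes for a real double and 16 for a complex one.

```python
MEMORY_BYTES = 1.2e9  # main-memory budget from the quoted example


def max_rows_per_column(element_bytes, memory=MEMORY_BYTES):
    """How many rows fit if one full column must be held in memory,
    as with Harwell-Boeing compressed-column storage."""
    return int(memory // element_bytes)


print(max_rows_per_column(8))   # real doubles  -> 150000000 rows (1.5e8)
print(max_rows_per_column(16))  # complex       -> 75000000 rows (7.5e7)
```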