pan-devel

Re: [Pan-devel] Re: Want to fix memory consumption issues


From: Calin A. Culianu
Subject: Re: [Pan-devel] Re: Want to fix memory consumption issues
Date: Mon, 31 May 2004 16:53:54 -0500 (EST)


On Mon, 31 May 2004, Duncan wrote:

> Calin A. Culianu posted
> <address@hidden>, excerpted
> below,  on Sat, 29 May 2004 18:43:48 -0500:
> 
> > 
> > I really really want to fix the memory consumption issues in Pan. Namely,
> > the problem that loading articles from groups with large numbers of
> > headers (like > 500,000)  tends to bring most boxes to a crawl, because
> > Pan ends up eating tons of memory.  It seems like roughly 1 KB per article
> > header!
> > 
> > So if I wanted to experiment with changing the memory needs of Pan (a
> > change that might be very invasive actually..).. What are the good places
> > to look?  One obvious place to optimize for memory use is changing struct
> > Article or the way that articles get loaded.  Another place might be the
> > gtk widget used to display the header pane...
> > 
> > How _does_ the header pane get loaded (I'm too blurry-eyed to read any more of
> > the sources now..)?  Ideally one would load text (from the struct Article
> > pointers) into it piecemeal as the user scrolls through the groups.  Is
> > that what happens.. or is the gtk widget for the header pane populated all
> > at once with possibly millions of articles?
> 
> I'm not a source groker myself, but the consensus has always been that
> it's the GTK widgets that are the performance and memory hogs.

Well, struct Article eats ~120 bytes in just metadata, on top of the
200-300 bytes for the header/text data, so that's about 320-420 bytes of
data per header.  1 million headers is already roughly 400 megabytes.. so I would 
say struct Article isn't helping matters any..
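The arithmetic above, written out as a small sketch. The 120-byte and 200-300 byte figures are the rough estimates from this message, not measurements of Pan, and the helper name is made up for illustration.

```python
# Rough memory estimate for loaded headers, using the figures quoted
# above: about 120 bytes of struct Article metadata plus 200-300 bytes
# of header text per article. These are estimates, not measurements.
ARTICLE_METADATA_BYTES = 120


def estimate_total_bytes(num_headers, text_bytes, metadata_bytes=ARTICLE_METADATA_BYTES):
    """Bytes needed to hold num_headers articles (metadata + header text)."""
    return num_headers * (metadata_bytes + text_bytes)


if __name__ == "__main__":
    for text_bytes in (200, 300):
        total = estimate_total_bytes(1_000_000, text_bytes)
        # Decimal megabytes (10**6 bytes), matching the message's arithmetic.
        print(f"{text_bytes} bytes of text/header -> {total / 10**6:.0f} MB for 1M headers")
    # prints 320 MB for the 200-byte case, then 420 MB for the 300-byte case
```

Either way a million headers land in the 320-420 MB range, before counting anything the header pane widget allocates on its own.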

> 
> The trouble as I understand it is that PAN uses them in ways they were never
> intended to be used, forcing them to scale to entry-counts they were never
> intended to handle.  The problem is exacerbated by PAN using what is
> primarily a GUI widget that happens to have minor data-widget
> capabilities, as a data-widget that happens to be a GUI-widget as
> well.  This problem should to a large extent go away when PAN switches to
> the sqlite back-end, as has been planned for some time.  A couple months
> ago, there was some discussion here from a volunteer with some database
> programming experience that wished to help there, but I believe the
> discussion was taken to private e-mail and I haven't heard if he lost
> interest or if it's coming along nicely and is just waiting on Charles to
> get some time to fit it into mainline CVS or what.

Hmm.. so all headers are loaded into the header pane widget, on TOP of 
being simultaneously stored in a big linked-list of struct Article?!?!  
Eeek!!

I too have been playing lately with sqlite and trying to design some
reasonable tables to handle the needs of pan.  I have a good amount of
database programming experience too :).

> 
> Anyway, I believe that's the big place to look, right now, as the easy
> performance enhancements have already been done with the existing setup,
> and additional work would likely be dead-ended when the switch IS made. 
> That database back-end work seems to be more and more the thing holding up
> additional large-scale changes, so it needs to be tackled by someone, and
> now is as good a time as any, since as Charles mentioned in his replies to
> the other guy before it went off-list, he doesn't have a lot of time for
> other PAN work right now, and PAN is fairly stable where it is anyway.

Ok..

> 
> I've more than once gotten the feeling, however, that Charles is a bit
> hesitant to make the switch himself, because he has no experience in the
> area, and it's going to be a big enough change on its own, even if someone
> with db experience helps.  If you have it, that's IMO DEFINITELY help
> that's needed.  If you don't, perhaps now's the time to get it, because
> that db work is IMO the single biggest thing holding back PAN, at this
> point.

I have a lot of professional DB experience.  I am working out now how to 
best design the schema, so that it is quick to extract information such as 
parent/child relationships between articles for threading, and so that 
sorting on any field is quick too.  
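A minimal sketch of one possible schema along those lines, using Python's built-in sqlite3 module. Everything here is an assumption for illustration: the table and column names are hypothetical and not taken from Pan's sources. Threading uses a self-referencing parent_id column with its own index, and each sortable column gets an index so that ORDER BY can walk the index instead of sorting every row.

```python
import sqlite3

# Hypothetical schema: one row per article header (names are illustrative).
SCHEMA = """
CREATE TABLE articles (
    id         INTEGER PRIMARY KEY,             -- integer key the client keeps in memory
    message_id TEXT NOT NULL UNIQUE,            -- the Message-ID header
    parent_id  INTEGER REFERENCES articles(id), -- NULL for thread roots
    subject    TEXT NOT NULL,
    author     TEXT NOT NULL,
    posted_at  INTEGER NOT NULL,                -- Unix time parsed from the Date header
    line_count INTEGER NOT NULL
);
-- Index the parent link so "replies to X" is an index lookup.
CREATE INDEX idx_articles_parent  ON articles(parent_id);
-- One index per column the header pane can sort on.
CREATE INDEX idx_articles_subject ON articles(subject);
CREATE INDEX idx_articles_author  ON articles(author);
CREATE INDEX idx_articles_posted  ON articles(posted_at);
"""


def open_db(path=":memory:"):
    """Open (or create) a database and install the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def children_of(conn, article_id):
    """IDs of the direct replies to one article, oldest first."""
    rows = conn.execute(
        "SELECT id FROM articles WHERE parent_id = ? ORDER BY posted_at",
        (article_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_db()
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "<a@example>", None, "Memory use", "alice", 100, 20),
            (2, "<b@example>", 1, "Re: Memory use", "bob", 200, 40),
            (3, "<c@example>", 1, "Re: Memory use", "carol", 150, 10),
        ],
    )
    print(children_of(conn, 1))  # [3, 2]
```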

What I am doing is building a dummy database/schema and then trying to
write small perl scripts to load data into it, query it, sort it, and also
do threading (as in, threading articles or grouping them based on either
binary attachments or x-ref headers).  This way I can re-tune the data 
layout to fit the practical needs of pan, as indicated by reading the 
sources..
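For the threading part, one approach such a script could try is a recursive common table expression that follows parent_id links down from a root article. This sketch assumes a hypothetical articles table with id and parent_id columns and no cycles in the parent links; grouping by binary attachments or Xref headers would need other keys and is not shown.

```python
import sqlite3


def thread_of(conn, root_id):
    """Return (id, depth) for root_id and all its descendants, depth-first.

    Assumes a table articles(id, parent_id) whose parent links form a tree
    (no cycles). Siblings come out in ID order: each row carries a path of
    zero-padded IDs, so sorting the path strings sorts numerically.
    """
    sql = """
    WITH RECURSIVE thread(id, depth, path) AS (
        SELECT id, 0, printf('%010d', id) FROM articles WHERE id = ?
        UNION ALL
        SELECT a.id, t.depth + 1, t.path || '/' || printf('%010d', a.id)
        FROM articles AS a JOIN thread AS t ON a.parent_id = t.id
    )
    SELECT id, depth FROM thread ORDER BY path
    """
    return conn.execute(sql, (root_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, parent_id INTEGER, subject TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, None, "root"), (2, 1, "re"), (3, 2, "re: re"), (4, 1, "re 2"), (5, None, "other")],
    )
    print(thread_of(conn, 1))  # [(1, 0), (2, 1), (3, 2), (4, 1)]
```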

I envision the end result being that pan stores just a list of article 
IDs in memory (IDs being integers, as in primary keys from a db table) 
which indicates the order of a sorted list of articles.
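A sketch of that in-memory side, assuming a hypothetical articles table: the client keeps only a packed array of 64-bit integer keys in display order, roughly 8 MB for a million articles instead of hundreds of megabytes of header data. Column names cannot be bound as SQL parameters, so the sort column is checked against a whitelist before it is placed in the query.

```python
import sqlite3
from array import array

# Columns the header pane may sort on (hypothetical names). Column names
# cannot be passed as SQL parameters, so they are validated against this set.
SORTABLE = {"subject", "author", "posted_at"}


def sorted_ids(conn, column, descending=False):
    """Return article IDs in display order as a compact array of int64."""
    if column not in SORTABLE:
        raise ValueError(f"cannot sort on {column!r}")
    direction = "DESC" if descending else "ASC"
    # Ties are broken by id so the order is stable between queries.
    cur = conn.execute(f"SELECT id FROM articles ORDER BY {column} {direction}, id")
    ids = array("q")  # 8 bytes per ID, no per-object overhead
    ids.extend(row[0] for row in cur)
    return ids


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, subject TEXT, author TEXT, posted_at INTEGER)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?, ?)",
        [(1, "b", "x", 30), (2, "a", "y", 10), (3, "c", "x", 20)],
    )
    print(list(sorted_ids(conn, "posted_at")))               # [2, 3, 1]
    print(list(sorted_ids(conn, "subject", descending=True)))  # [3, 1, 2]
```

Re-sorting the pane then means re-running one indexed query and swapping the array; no header text has to be touched.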

The actual data for article headers will be loaded in piecemeal.. so that 
as the user scrolls up and down through a list of articles in the header 
pane, the data gets loaded in from the db as needed, in order to populate 
the header pane widget and display the subject line, author, etc...
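One way to do the on-demand loading, sketched under assumptions: a hypothetical HeaderWindow helper that a header pane's row callback could ask for the rows currently on screen. It fetches only the rows it has not already cached, in a single query, and evicts the least recently used rows beyond a fixed capacity. The GTK side is deliberately left out, and the table and column names are hypothetical.

```python
import sqlite3
from collections import OrderedDict


class HeaderWindow:
    """Loads header rows on demand for the visible part of the pane.

    ordered_ids: article IDs in display order (e.g. from a sorted query);
                 every ID is assumed to exist in the articles table.
    capacity:    maximum number of rows kept in memory at once.
    """

    def __init__(self, conn, ordered_ids, capacity=500):
        self.conn = conn
        self.ordered_ids = ordered_ids
        self.capacity = capacity
        self.cache = OrderedDict()  # id -> (subject, author), oldest use first
        self.queries = 0            # database round trips, useful when tuning

    def rows(self, first, count):
        """Return (subject, author) for display positions [first, first + count)."""
        wanted = list(self.ordered_ids[first:first + count])
        missing = [i for i in wanted if i not in self.cache]
        if missing:
            # One query per scroll step; a screenful of IDs stays well under
            # SQLite's limit on bound parameters.
            self.queries += 1
            marks = ",".join("?" * len(missing))
            for article_id, subject, author in self.conn.execute(
                f"SELECT id, subject, author FROM articles WHERE id IN ({marks})",
                missing,
            ):
                self.cache[article_id] = (subject, author)
        result = []
        for i in wanted:
            self.cache.move_to_end(i)  # mark as recently used
            result.append(self.cache[i])
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, subject TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(i, f"subject {i}", f"author {i}") for i in range(1, 101)],
    )
    pane = HeaderWindow(conn, list(range(100, 0, -1)), capacity=20)
    print(pane.rows(0, 3))  # [('subject 100', 'author 100'), ('subject 99', 'author 99'), ('subject 98', 'author 98')]
    pane.rows(1, 3)         # scrolling one line fetches only article 97
```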

This can minimize memory consumption, but the question is: how much will 
it cost in terms of program responsiveness?  To answer that I need to 
experiment some...
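A rough way to start on that experiment: time how long one screenful of rows takes to fetch by ID from a synthetic table. The table layout, page size, and trial count are arbitrary assumptions. Random IDs stand in for a sorted view whose display order has nothing to do with primary-key order. An in-memory database will flatter the numbers, so passing a file path to build_synthetic_db gives a more realistic picture.

```python
import random
import sqlite3
import statistics
import time


def build_synthetic_db(n_rows, path=":memory:"):
    """Create a throwaway articles table filled with fake headers."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, subject TEXT, author TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        ((i, f"subject line {i}", f"poster{i % 997}@example.com") for i in range(1, n_rows + 1)),
    )
    conn.commit()
    return conn


def median_page_ms(conn, n_rows, page_size=40, trials=200, seed=0):
    """Median wall-clock time (ms) to fetch one screenful of rows by ID."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    marks = ",".join("?" * page_size)
    sql = f"SELECT id, subject, author FROM articles WHERE id IN ({marks})"
    timings = []
    for _ in range(trials):
        ids = rng.sample(range(1, n_rows + 1), page_size)
        start = time.perf_counter()
        conn.execute(sql, ids).fetchall()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


if __name__ == "__main__":
    n = 100_000
    conn = build_synthetic_db(n)
    print(f"median page fetch: {median_page_ms(conn, n):.3f} ms")
```

If the median stays far below a typical frame time, loading rows on scroll should not be what the user notices; if it does not, a larger prefetch window is the obvious knob to try.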

> 
> Look in the archives for the previous discussion, and go from there, would
> be my suggestion.
> 

Okay.. it would be worthwhile to see what other people's thoughts were on 
the db design -- perhaps someone has already worked out a pretty good way 
to organize the data, so I don't reinvent the wheel.

I will look into it some more and see what I come up with.  

Thanks for the info!

-Calin
  




