[Pan-devel] SQLite versioning and timing issues. Was: [Pan-users] 0.14


From:	Duncan
Subject:	[Pan-devel] SQLite versioning and timing issues. Was: [Pan-users] 0.14.0 Slow on large group
Date:	Mon, 30 Jun 2003 15:55:36 -0700
User-agent:	KMail/1.5.2
[I've taken the liberty of posting this to the devel group/list as well.
Please direct replies to one OR the other, as appropriate, deleting the other
group/list.]

On Sun 29 Jun 2003 16:58, address@hidden posted as excerpted below:
>       I've suggested this several times. If you ever take a look at a Windows
> newsreader called "Xnews", it does just that, and it's a big help for
> large groups.

Upside-down posting..  Grumble, grumble...  Yes, I know you were just
following the person you replied to, but this applies to all the replies so
far..  Replies should go below what they quote, so one can tell what the
reply is about.  That also encourages proper trimming of the quote, so a
two-line reply doesn't follow 200 lines of quote that have little to do with
the reply.

> > I don't know anything about the internal handling of headers for a large
> > group like this, but I've seen the same degradation with groups with
> > anything over 200,000 headers.

Yes..  This is a known problem with PAN.  To some extent it has to do with the
GTK widgets used, which have been a problem for PAN for some time.  Those
widgets were apparently designed for a few hundred to a few thousand sorted
items at most, certainly nothing over 100K, and it definitely shows.

> > In thinking about this problem, what about splitting the headers into
> > chunks of 50,000 to sort them, and then take the sorted headers and
> > merge them.  This should allow you to thread the individual sorts as
> > they were read, and merging pre-sorted headers should be fairly linear.

This is an interesting idea.  I'm not an active PAN developer (yet, anyway),
so I don't know whether chunking as described here has been tried, but if
not, it would certainly be worth a try.
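The chunked approach quoted above is the same idea as the run-then-merge
phase of a merge sort: sort fixed-size runs independently, then do a k-way
merge of the sorted runs.  Here is a minimal Python sketch, assuming headers
are simple tuples whose first field is the sort key (a hypothetical layout,
not PAN's actual header structure).  PAN itself is written in C; in a real
implementation each chunk could be sorted on a worker thread as it arrives,
whereas this sketch simply sorts the chunks one after another.

```python
import heapq
from itertools import islice

CHUNK_SIZE = 50_000  # chunk size taken from the suggestion quoted above


def sort_in_chunks(headers, key, chunk_size=CHUNK_SIZE):
    """Sort headers by `key` by sorting fixed-size chunks, then merging them.

    `headers` may be any iterable, e.g. headers as they are read from the
    server; each chunk is sorted as soon as it has been collected.
    """
    it = iter(headers)
    runs = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        chunk.sort(key=key)  # each sorted chunk becomes one "run"
        runs.append(chunk)
    # heapq.merge lazily merges already-sorted runs using a small heap,
    # costing O(n log k) comparisons for k runs and n headers in total.
    return list(heapq.merge(*runs, key=key))


if __name__ == "__main__":
    # Hypothetical headers: (date, subject)
    sample = [(30, "c"), (10, "a"), (20, "b")]
    print(sort_in_chunks(sample, key=lambda h: h[0], chunk_size=2))
```

Because the merge only ever compares the heads of the k runs, a few dozen
50,000-header chunks keep the merge step close to linear, as the quoted
post suggests.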

FWIW, one thing that may help somewhat is a huge amount of memory.  According
to the posts here, one poster with a gig (or was it 3/4 of a gig?) of memory
was complaining only when he tried to load several groups of several hundred
K overviews each at the same time.  That seems far better than what I see: a
single group of a couple hundred K overviews gives me problems with 1/2 a gig
(512M).  Those with 256M of memory report PAN choking at 100K overviews.
Thus, memory DOES seem to play a fairly large part in PAN's performance with
huge groups.

I guess one conclusion that can be drawn from the above is that folks with 
less than half a gig of memory should choose a tool other than PAN if they 
are going to be working with groups of several hundred K overviews, at least 
at this time, unfortunately..

Looking toward the future, one of the BIG projects awaiting the developers is
the transfer to a different database back end.  Right now, PAN handles all
the sorting and storage basically on its own.  As the above demonstrates,
that limits both the features that can be added and the size of the groups
that can be handled, unless the code is allowed to grow hugely unmanageable by
duplicating functions that COULD be handed off to a database library
specializing in the management of large numbers of data points.  Honestly,
the current code is unlikely to undergo any huge changes in that area, since
we are planning to offload much of that processing to a library in the
not-too-distant future.  (Though I'm not a PAN developer yet, I include myself
in "we" as a PAN regular, both here and on the devel list.)

The library that's been mentioned is SQLite, which, as the name suggests, is a
lightweight SQL database library designed to be embedded in apps such as PAN.
This will bring several benefits.  Hopefully, it will make processing of
large numbers of overviews much more efficient, since it's designed to handle
that sort of larger quantity of data points, which makes it topical for this
thread.  In addition, and building on that, it should enable virtual servers
and virtual groups similar to the way BNR2 handles things, among other
oft-requested features.
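To illustrate why a database back end might help with the memory problem
described earlier: a database can keep overviews on disk with a sorted
index, so the newsreader only asks for one screenful at a time instead of
sorting every header in RAM.  This is a minimal sketch using Python's
built-in sqlite3 module; the table layout, column names, and helper
functions are all hypothetical and are not PAN's planned schema.

```python
import sqlite3


def open_overview_db(path=":memory:"):
    """Open (or create) a hypothetical overview store."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS overview (
            grp            TEXT    NOT NULL,  -- newsgroup name
            article_number INTEGER NOT NULL,
            subject        TEXT,
            author         TEXT,
            date           INTEGER,           -- e.g. seconds since the epoch
            message_id     TEXT,
            PRIMARY KEY (grp, article_number)
        )""")
    # The index keeps each group's overviews ordered by date on disk,
    # so "sort by date" becomes an index walk rather than an in-memory sort.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS overview_by_date ON overview (grp, date)")
    return conn


def add_overviews(conn, grp, rows):
    """rows: iterable of (article_number, subject, author, date, message_id)."""
    with conn:  # commit once for the whole batch
        conn.executemany(
            "INSERT OR REPLACE INTO overview VALUES (?, ?, ?, ?, ?, ?)",
            ((grp, *row) for row in rows))


def page_by_date(conn, grp, limit, offset=0):
    """Fetch one screenful of (article_number, subject), oldest first."""
    return conn.execute(
        "SELECT article_number, subject FROM overview "
        "WHERE grp = ? ORDER BY date LIMIT ? OFFSET ?",
        (grp, limit, offset)).fetchall()
```

Large OFFSET values still make SQLite step over the skipped rows; for deep
scrolling, remembering the last date shown and asking for rows after it
(keyset pagination) avoids that cost.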

(BTW, BNR2, which has both Linux and MSWormOS binaries and is based on Borland
Delphi/Kylix, just had a new Linux release, bringing it up to date with the
MSWormOS release.  It had been several releases behind.  Unfortunately, BNR2
isn't fully open source, nor could it be, based as it is on the proprietary
Kylix.  Still, it does some things that no one else does as well, managing
multiple servers and multiple groups and allowing one to combine several into
a single virtual server or virtual group view.  If you are pragmatic in your
approach to open source, BNR2 may well be the way to go for now, for such
huge newsgroups and other high-end binary group features.)

Anyway, back to SQLite and PAN.  In addition to the above "virtual" features,
and the hopefully far more efficient handling of large groups on which those
virtual features depend, SQLite speaks SQL, largely compatible with MySQL and
perhaps other databases, so data stored in it should be reasonably easy to
share with them.  Thus, once PAN's transfer to it is complete, those wishing
for even MORE power will be able to integrate PAN into a larger
database-based framework.
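The "virtual" group idea is easy to picture in SQL terms: a virtual group is
just a query over several (server, group) pairs, grouped on Message-ID so an
article carried by more than one server shows up only once.  This is a
hypothetical sketch using Python's sqlite3 module; the table layout and the
`virtual_group` helper are invented for illustration and say nothing about
how PAN or BNR2 actually implement the feature.

```python
import sqlite3


def make_db():
    """Create a hypothetical overview table keyed by server and group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE overview (
        server TEXT, grp TEXT, message_id TEXT, subject TEXT, date INTEGER)""")
    return conn


def virtual_group(conn, members):
    """Combine several (server, group) pairs into one date-ordered view.

    Articles are grouped by Message-ID, so the same post fetched from
    several servers or groups appears once, dated by its earliest copy.
    """
    if not members:
        return []
    # Only the query's structure is built from fixed strings; all values
    # are passed as parameters.
    where = " OR ".join("(server = ? AND grp = ?)" for _ in members)
    params = [value for pair in members for value in pair]
    sql = ("SELECT message_id, MIN(subject), MIN(date) AS first_seen "
           "FROM overview WHERE " + where + " "
           "GROUP BY message_id ORDER BY first_seen, message_id")
    return conn.execute(sql, params).fetchall()
```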

The catch in all this is that the back end rewrite to merge SQLite into PAN 
will likely be the biggest and deepest core modification project PAN has ever 
undergone.  If you were around for the transfer to GTK2, and for the 
introduction of scoring, you may realize what this means, but on a larger 
scale.  PAN will likely be rather unstable and possibly lose some current 
functionality temporarily, during the transfer, and may not be fully stable 
and with full functionality for several "stable" point releases afterward.  
However, once it's done, entire new realms of possibilities will be opened, 
and PAN will likely enter a whole new feature and performance domain.

I don't know, however, when this job is likely to be undertaken.  There has
been some discussion both here and on the devel list, but AFAIK it remains
"in the future" discussion rather than immediately pending.  It's possible
PAN will be released in a 1.0 version before this major rewrite, and the
rewrite would then be for 2.0.  However, my personal feeling, given that PAN
stands for the longer name and ultimate goal of "Pimp-Ass Newsreader", is
that this will be done before PAN 1.0.

Still, if it were me, I'd likely advance the version to 0.50 to indicate a
definite milestone before the rewrite, and start the rewrite at 0.60 or
perhaps 0.70 (depending on how actively the "stable" code was intended to be
maintained), allowing plenty of maintenance releases of the current code
before the new code is considered mature enough for full-featured stable
deployment.  In hindsight (and IMO, personally), perhaps the same should have
been done with the GTK2 port, starting it at 0.20, since the GTK/Gnome 1
series was by then at 0.11.  For newbies not yet deep into Linux and the
complexities of versioning, it would perhaps have been an easier concept to
grasp that 0.20 was designed around the newer GTK2 while 0.1x was built on
the older GTK/Gnome 1, than it was to explain an immediate progression from
0.11 to 0.12.  IMO, advancing the version number to 0.50 to indicate a
decently stable milestone series, and starting the new SQLite version at 0.60
or 0.70, would be equally indicative of intentions.

WDYT, Charles?  Or am I blowing the job all out of proportion, and it won't be
that big a deal, with little new code and mostly deleted old backend code?
Even so, wouldn't at least an advance to 0.20 (or 0.30, if we are already at
0.2x by then) be appropriate for an SQLite port?

BTW, is that an immediately approaching job or still some time out?  My 
feeling from the list is it should be done /fairly/ soon, as there are 
several new features waiting behind it for implementation.

-- 
Duncan - List replies preferred.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."
Benjamin Franklin