[Pan-users] Re: Re: PAN project's status


From: Duncan
Subject: [Pan-users] Re: Re: PAN project's status
Date: Thu, 19 Aug 2004 06:47:54 -0700
User-agent: Pan/0.14.2.91 (As She Crawled Across the Table)

Carl Wilhelm Soderstrom posted <address@hidden>,
excerpted below,  on Wed, 18 Aug 2004 13:53:52 -0500:

> On 08/18 07:57 , Tov Are Jacobsen wrote:
>> Why did you decide to integrate an SQL interpreter with Pan?
> 
> because it's a nifty and useful way to store information about messages
> and lists. Consider that you could have other applications retrieving
> Usenet information, that Pan put into the database.
> 
> I can't think of any specific examples of why you'd want to do that; but
> I'm sure someone can. Information portability is *always* a marvelous tool
> for making really powerful combinations of applications. just look at the
> basic Unix tools as an example. :)

There are quite a number of reasons to do it.  Perhaps the most direct, and
the soonest to appear once integrated, is scalability and speed.  Currently,
PAN stuffs each post's data into an entry in the gtk widget used to display
the overviews (incorrectly aka "headers", incorrectly because it's only a
/subset/ of the REAL headers).  Get in a binary group with a million
headers, and that simply doesn't scale.  Those widgets were designed
primarily as graphical widgets, with a bit of data management as a
convenient bonus.  They were NOT designed to handle several hundred
thousand entries at a time, and PAN's memory footprint and speed when
it forces the widget to do so demonstrate that.  (Giving credit where
credit is due, this is FAR better than it was.  PAN USED to conk out at
100-200K overviews, but now, depending on your machine specs, can handle
1-2M overviews, a factor of 10 improvement, but it's hard to advance from
there.)

Give PAN a decent database backend, and memory conservation techniques
become practical that would save VAST amounts of memory.  One is using a
32-bit hash (thus taking only four bytes) for the author in the tracking
memory, storing the full author data only once per author, and doing the
same with the identical portions of subject lines, so the full names only
get expanded when they're displayed in the overview pane.  Another is
loading the overview pane widget with only, say, three screens' worth at a
time (the currently displayed one plus one before and one after, to make
scrolling faster), perhaps a couple hundred entries.  Going further, using
the database's efficient memory vs. file access techniques, which make
loading the data from disk much faster, PAN wouldn't have to keep the data
for an entire group in memory at once.  It could load only, say, 31 pages
(the displayed one, plus ten pages back and 20 forward, perhaps 2000
overviews) at a time, and fetch additional data from disk as needed.
Techniques like this could allow PAN to scale up to 10M and more overviews
per group while actually reducing memory footprint below what it uses for
a 100K overview group now.  That would make PAN FAR faster to load when
it's managing such scales as well, and actually make it useful far beyond
the point where it's simply impractical now.
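
To make that concrete, here's a minimal sketch using Python's bundled
sqlite3 module.  It is NOT PAN's actual design: the table names, column
names and helper functions are all made up for illustration, and it uses a
small integer row id per author where the text above talks about a 32-bit
hash (same idea, the full string is stored once and referenced by a short
key).  It stores each author once and pulls only a window of overviews
from disk for display.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create a hypothetical overview store (schema is illustrative only)."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS authors (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL      -- full author string, stored once
        );
        CREATE TABLE IF NOT EXISTS overviews (
            id        INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            subject   TEXT NOT NULL
        );
    """)
    return db

def add_overview(db, author, subject):
    """Store one overview, reusing the author's row if it already exists."""
    db.execute("INSERT OR IGNORE INTO authors(name) VALUES (?)", (author,))
    (author_id,) = db.execute(
        "SELECT id FROM authors WHERE name = ?", (author,)).fetchone()
    db.execute("INSERT INTO overviews(author_id, subject) VALUES (?, ?)",
               (author_id, subject))

def load_window(db, first_row, count):
    """Fetch only the rows the overview pane needs right now."""
    return db.execute(
        """SELECT o.id, a.name, o.subject
           FROM overviews AS o JOIN authors AS a ON a.id = o.author_id
           WHERE o.id >= ? ORDER BY o.id LIMIT ?""",
        (first_row, count)).fetchall()

db = open_store()
for n in range(1, 10001):
    # Three made-up posters share all 10000 overviews.
    add_overview(db, f"poster{n % 3}@example.invalid", f"file.rar ({n}/10000)")

author_count = db.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
window = load_window(db, 5000, 200)   # a couple hundred rows, not the group
print(author_count, len(window))
```

The widget only ever sees the couple hundred rows in the window, however
big the group on disk gets.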

However, that's just the tip of the iceberg.  Some of the fancy features
that PAN has always targeted, but that really haven't been feasible without
a database backend, include fully automated multi-server management, in
the form of virtual groups and virtual servers.  Many folks serious about
binary downloading have several providers at their disposal: their ISP,
plus a "bulk" NSP (news service provider) to cover the majority of the
posts that many crappy ISP services miss, plus a "backup" NSP to cover
the posts not found on the other servers.  Each of these may have one or
more servers available, and there is a priority need as well: the ISP's
servers would likely be free access, so get what one can there, then go to
the low cost but not extremely reliable bulk NSP, and only when those fail
fall back to the backup NSP, where one pays thru the nose for every single
download.

The competition in this area is BNR2, with BNR3 soon to be released.  It's
proprietaryware based on the Borland Delphi (Windows side) and
Kylix (Linux) developer suites, available for Windows and Linux.  It allows
one to manage multiple servers at multiple priorities.  One can set it up
so a group existing on multiple servers is viewed only once, showing the
posts available on all available servers (where PAN by contrast displays
only a single server at a time).  From this display, one need only select
what one wants to download, and BNR2 will know which servers each part of
each post is available on and fetch them automatically, based on the
priority you've set, so that it only grabs fills from that expensive
server when it absolutely has to, or keeps even a multi-megabit internet
pipe full by downloading from as many servers at once as necessary.
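
The priority logic described above is simple enough to sketch.  This is
only my illustration of the idea, not how BNR2 (or PAN) actually does it,
and the server names, the availability data and the plan_downloads()
function are all made up.  For each part, it picks the highest-priority
server that carries it, so the expensive backup server only gets the
fills.

```python
from collections import defaultdict

def plan_downloads(parts, availability, priority):
    """Assign each part to the preferred server that carries it.

    parts:        iterable of part numbers wanted
    availability: dict of server name -> set of part numbers it has
    priority:     dict of server name -> int, lower is tried first
    Returns (plan, missing): plan maps server -> list of parts to fetch
    there, missing lists parts no server has.
    """
    ordered = sorted(priority, key=priority.get)
    plan = defaultdict(list)
    missing = []
    for part in parts:
        for server in ordered:
            if part in availability.get(server, set()):
                plan[server].append(part)
                break
        else:
            # No configured server carries this part.
            missing.append(part)
    return dict(plan), missing

# Hypothetical setup: free ISP first, cheap bulk NSP next, pricey backup last.
priority = {"isp": 0, "bulk-nsp": 1, "backup-nsp": 2}
availability = {
    "isp": {1, 2},
    "bulk-nsp": {1, 2, 3, 4},
    "backup-nsp": {1, 2, 3, 4, 5},
}
plan, missing = plan_downloads(range(1, 7), availability, priority)
print(plan, missing)
```

With a database behind it, the availability table is just a query rather
than something the app has to hold in memory for every server and group.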

Another feature is virtual groups.  I've not used BNR myself (I don't
want to use proprietaryware and PAN is good enough to keep me from having
to), but from what I've read from other users, this feature essentially
allows one to combine multiple groups (on those multiple servers) into a
single view.  Thus, the multiple groups dedicated to individual music
genres, or to movie interests, or porn kinks, can be combined into one
view, again allowing the user to select what he wants for download and let
the application worry about not only what server it's coming from, but
what GROUP on the server it's coming from, as well.
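
As a rough sketch of what a virtual group amounts to (again, my own
illustration with invented group names and an invented
build_virtual_group() helper, not anything taken from BNR or PAN): merge
the overviews from every (server, group) pair, keyed on Message-ID, and
remember every place each post can be fetched from.

```python
def build_virtual_group(sources):
    """Merge several (server, group) overview lists into one view.

    sources: dict of (server, group) -> list of (message_id, subject)
    Returns a dict of message_id -> {"subject": ..., "locations": [...]},
    so each post shows up once no matter how many places carry it.
    """
    merged = {}
    for (server, group), overviews in sources.items():
        for message_id, subject in overviews:
            entry = merged.setdefault(
                message_id, {"subject": subject, "locations": []})
            entry["locations"].append((server, group))
    return merged

# Hypothetical data: one post is crossposted to both groups.
sources = {
    ("isp", "alt.binaries.example.jazz"): [
        ("<a@example.invalid>", "song one"),
        ("<b@example.invalid>", "song two"),
    ],
    ("bulk-nsp", "alt.binaries.example.blues"): [
        ("<b@example.invalid>", "song two"),
        ("<c@example.invalid>", "song three"),
    ],
}
view = build_virtual_group(sources)
print(len(view), view["<b@example.invalid>"]["locations"])
```

The user picks from the merged view, and the locations list tells the
application which server and group to go to.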

This sort of thing simply cannot be handled efficiently without a good
database backend of SOME sort, and in the open source world, it simply
makes sense to use a library developed for the purpose by folks who do
database development for a living, rather than trying to write everything
from scratch.

However, that's STILL not the whole picture.  Besides the directly PAN
related functionality a database library would enable, once that library
is in place, there are all sorts of OTHER possibilities one could imagine.
Consider, for instance, a media library manager with a plugin that
integrates with the PAN database, so all those videos and all that music
you've downloaded not only show up in your media library manager, but
are automatically catalogued, and all the ID3 tags on the MP3s (for
example) are automatically filled in with the data from the original post:
artist, year, album, etc.  Not only that, but click a couple buttons, and
up pops all the information posted with the mp3, including the poster, the
group it was posted to and the date, any text file readme that accompanied
the posting, etc.
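
Purely as an illustration of the plugin side, here's what a second
application reading such a database might look like.  No such database or
schema exists yet, so the "posts" table, its columns and both functions
are assumptions of mine.  The only real parts are Python's sqlite3 module
and SQLite's read-only URI mode, which lets the other app look without
being able to write.

```python
import os
import pathlib
import sqlite3
import tempfile

def create_demo_db(path):
    """Stand-in for the newsreader side: write one made-up post record."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE posts (
                      filename TEXT, poster TEXT, newsgroup TEXT, posted TEXT)""")
    db.execute("INSERT INTO posts VALUES (?, ?, ?, ?)",
               ("track01.mp3", "poster@example.invalid",
                "alt.binaries.example", "2004-08-18"))
    db.commit()
    db.close()

def lookup_post(path, filename):
    """Media-manager side: open the database read-only and look up a file."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    db = sqlite3.connect(uri, uri=True)
    try:
        row = db.execute(
            "SELECT poster, newsgroup, posted FROM posts WHERE filename = ?",
            (filename,)).fetchone()
    finally:
        db.close()
    if row is None:
        return None
    return dict(zip(("poster", "newsgroup", "posted"), row))

demo_path = os.path.join(tempfile.mkdtemp(), "pan-demo.db")
create_demo_db(demo_path)
info = lookup_post(demo_path, "track01.mp3")
print(info)
```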

Obviously, such integration doesn't yet exist, because the news app with
the database backend to plug into doesn't yet exist.  However, shortly b4
I left MSWormOS, I played around with an app there that did some of this,
only combining it all in a single app so it didn't do any single function
as well as dedicated apps taking advantage of a common database format and
API would have.  However, the possibilities are there, if one wishes to
take advantage of them.

The first step, tho, is getting that database functionality into PAN.

As for why SQLite was chosen, it just happens to be a good open
source library implementation of the needed database functionality.  In
fact, it's possible something else could be used instead, only I've seen
no proposals for it, and everyone working with it so far has been using
SQLite, so that's what seems to be the PAN destiny.  One significant
advantage, however, particularly where it comes to the possible third
party application functionality mentioned above, is that SQLite, while
being a fairly light but functional library implementation on its own,
/also/ speaks SQL, so the data could be moved to a full database server
such as MySQL (with some porting, since the SQL dialects differ) if one
wants to use the database for other things.  Thus, the solution scales up
as far as one is likely to need, and if PAN ever implements it to the
extent targeted, it could fast become /the/ binary newsreader solution on
*ix (and even MSWormOS), and could be seen as inviting the sorts of third
party extensions and compatibility plugins I mentioned.  If it ever gets
that far, then PAN will INDEED be living up to its name as a "Pimp-Ass
Newsreader", and could easily become the standard in its area that Mozilla
and OOo are in their own.

Of course, that's obviously some serious way down the road.  At this
point, with mainline development stalled, one has to wonder if it'll ever
happen.  However, as I mentioned previously, this /could/ end up being the
best thing that ever happened to PAN, if it means integration of the
database when development resumes, and that integration leads to the sorts
of things mentioned above.

So.. take your pick.  Pessimistic viewpoint, PAN is dead, the only
question is how long it will take folks to realize it and stop using it,
moving on to other things.  Optimistic viewpoint, this is only a pause in
development, one that had to happen at some point, if PAN is to fulfill
its ultimate destiny, given the name.  Realistic view, we really don't
know whether development will resume or not, but the pause HAS opened up
the database functionality possibility, and with the following PAN has,
it's quite possible that even if Charles decides he can no longer maintain
PAN, another maintainer may take the lead, taking PAN to places only
imagined, before.  After all, taking a look at the credits, Charles wasn't
the original author, so he too took it over from a previous maintainer and
author.  Thus, it wouldn't be the FIRST time it has happened in PAN's
history, and just as PAN grew under Charles, so it may continue to grow
under a new maintainer.  That said, last that Charles said, he still has
every intention of getting back to PAN, but just had to take some time off
to attend to real life.  Time will tell, I guess.

-- 
Duncan - List replies preferred.   No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin





