[Pan-users] Re: suggestions for large groups


From: Duncan
Subject: [Pan-users] Re: suggestions for large groups
Date: Sun, 04 Jan 2004 11:33:22 -0700
User-agent: Pan/0.14.2.90 (A Bouquet of Corpses)

Mike Erickson posted <address@hidden>, excerpted
below,  on Sat, 03 Jan 2004 22:16:30 -0800:

> Do you remember what this tweak was?

It was a multitude of little tweaks, IIRC.  I don't believe the specifics
were ever laid out in the group (list, but I see it as a group, since I
use the gmane list2news gateway), but check the announcements/changelog
for a comment to the effect that PAN should work better in large groups,
now.  If you are interested in the details beyond that, "use the source,
Luke" <g>, by comparing the source for the versions on either side of the
announcement.

I'm not heavy enough into programming to grok all the details, but
performance has been an issue since at least the switch to GTK2 (from
Gnome 1) widgets.  

From the discussion at the time, the GTK2 widget that would normally have
been used for the overview (aka header) pane simply wouldn't scale to the
necessary number of entries and maintain ANY performance at all.  It was
designed for file systems with perhaps a few thousand up to a few tens of
thousands of individual entries, and at the time didn't scale AT ALL well
to the hundreds of thousands or even millions of entries needed to track
the headers of a typical large newsgroup on a server with good retention.

Thus, Charles and Chris were forced into using a workaround, with a widget
not quite designed for the interface use they put it to, because the one
that WAS designed for it simply couldn't scale.  Among other things,
that's why keyboard multi-header select doesn't work as well as it might
-- With the widget they used, they could either get mouse multi-header
select working right, or keyboard, but not both, because the patch to make
the mouse work killed the keyboard workability for that function.

As GTK2 has matured, so have the widgets, and (again from discussion on
the list) some of PAN's interface has been converted to use the natural
widget for the task.  This includes the group pane, I believe the filter
edit view that displays the filter conditions, and maybe a couple of other
places (the status log?).  However, that widget still doesn't scale well
enough to be used in the overview pane, so the overview pane still uses
the "wrong" one, because that one does.

I'm going to go out on a limb a bit here and make an educated guess that
at least some of the tweaks that allowed PAN to scale from a couple
hundred thousand overviews to a couple million had a lot to do with
reworking PAN's interface code to this widget, which stores each overview
in a data structure corresponding to a line in the pane.  At the original
switch to GTK2, I'm guessing they thought they'd eventually switch away
from the "work-around" widget, and so wrote some "work-around" code to make
it work that they expected would be temporary.  It has become quite
obvious by now that the widget they'd LIKE to use isn't going to scale to
the required level any time soon, so when the issue of performance came up
in the groups again, Charles decided it was time to go back and recode
PAN's interface to that widget properly, making it more efficient in the
process.  As I said, tho, I believe it was a number of tweaks, and there
may well have been places outside that code that Charles and Chris were
able to improve as well.  (I believe some of the changes were actually
submitted patches, as well, but I'm not sure.)

That brings us to...

> * if this isn't the case, what are the benefits of moving to an SQL
> backend

The benefits include probably better scaling on large groups, yes.
However, there's more to it than that.

First, picking up where I left off above: PAN currently uses the data
interface of the widget in question to a large degree, I believe, not only
for display but for tracking and sorting as well.  The problem with this
is that widgets are normally designed primarily for display, and while
many of them have additional data management functionality, that is more a
convenience than a primary purpose.  Thus, while they work fine for that
in small scale applications, and even up to medium and large scale,
pushing them into "super-scale" territory, as with millions of entries,
just isn't as efficient as sorting in a database DESIGNED to manage data,
and then shoving that pre-sorted data into a display widget for display
only.  Therefore, yes, the switch to the sqlite libs should in theory
improve large group sorting and management somewhat, tho perhaps not as
much as expected, at least at first, given that PAN has taken the current
widget solution far beyond what it would originally have been expected to
handle, and done so very well IMO, I might add.  Thus, the first few
versions of the sqlite library interface may not actually improve things
much, if at all, here, as it'll likely take a while to reoptimize the code
to the level the widget interface code is at now.
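A minimal sketch of that "sort in the database, hand the widget only what it displays" split, using Python's built-in sqlite3 module. The `overview` table, its columns, and the `rows_for_display` helper are hypothetical, not PAN's actual schema or code; the point is only that ordering and paging happen in SQL, backed by an index, so the display widget would just draw the rows it is given.

```python
import sqlite3


def open_overview_db():
    """Create a hypothetical overview (header) table; not PAN's real format."""
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE overview (
                      message_id TEXT PRIMARY KEY,
                      newsgroup  TEXT NOT NULL,
                      subject    TEXT NOT NULL,
                      author     TEXT NOT NULL,
                      posted_at  INTEGER NOT NULL)""")
    # The index lets SQLite walk rows in (group, date) order instead of
    # sorting the whole table every time the pane is redrawn.
    db.execute("CREATE INDEX overview_by_group_date "
               "ON overview(newsgroup, posted_at)")
    return db


def rows_for_display(db, newsgroup, limit, offset=0):
    """Return one screenful of already-sorted rows; the widget only draws them."""
    return db.execute(
        "SELECT subject, author, posted_at FROM overview "
        "WHERE newsgroup = ? ORDER BY posted_at DESC LIMIT ? OFFSET ?",
        (newsgroup, limit, offset)).fetchall()


db = open_overview_db()
db.executemany("INSERT INTO overview VALUES (?, ?, ?, ?, ?)", [
    ("<a@x>", "alt.test", "first", "alice", 100),
    ("<b@x>", "alt.test", "second", "bob", 200),
    ("<c@x>", "alt.test", "third", "carol", 300),
])
print(rows_for_display(db, "alt.test", limit=2))
# → [('third', 'carol', 300), ('second', 'bob', 200)]
```

Paging with `OFFSET` keeps the sketch short, but SQLite still steps over the skipped rows, so a real implementation with millions of headers would more likely page from the last `posted_at` value seen.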

However, as I mentioned, that's just the tip of the iceberg.  There's
significantly more to it than that.  Once we have a true data management
interface to work with, there are all sorts of additional fancy things
that can be done.  Among them...

Virtual servers and virtual groups.  Virtual servers would give PAN the
ability, similar to BNR2, to allow the user to set up multiple physical
servers on a priority basis, then queue up downloads once, not caring which
physical server they are actually on, as PAN would fetch them from
whatever server it happened to find them on, based on the server priority
ranking.  Depending on implementation, they might then simply show up as
d/led on all the servers, or PAN may implement a "virtual server" view,
where again, the user wouldn't care what server they actually came from.
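The priority rule above is easy to sketch. This assumes each hypothetical `Server` object simply carries a set of the message-ids it has; a real client would have to ask each server instead (NNTP's `STAT` command, given a message-id, is one way to check). Lower `priority` numbers are tried first. It is an illustration of the idea, not BNR2's or PAN's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A physical news server in a hypothetical 'virtual server' setup."""
    name: str
    priority: int                               # lower number = tried first
    articles: set = field(default_factory=set)  # stand-in for asking the server

    def has(self, message_id: str) -> bool:
        return message_id in self.articles


def pick_server(servers, message_id):
    """Return the highest-priority server carrying message_id, or None."""
    for server in sorted(servers, key=lambda s: s.priority):
        if server.has(message_id):
            return server
    return None


def plan_downloads(servers, queue):
    """Map each queued message-id to the name of the server it would come from."""
    plan = {}
    for message_id in queue:
        server = pick_server(servers, message_id)
        plan[message_id] = server.name if server else None
    return plan


servers = [
    Server("news.isp.example", priority=1, articles={"<a@x>"}),
    Server("block.example", priority=2, articles={"<a@x>", "<b@x>"}),
]
print(plan_downloads(servers, ["<a@x>", "<b@x>", "<c@x>"]))
# → {'<a@x>': 'news.isp.example', '<b@x>': 'block.example', '<c@x>': None}
```

The user queues message-ids once; which physical server supplies each one falls out of the priority order.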

Virtual groups would do the same for groups.  A virtual group could
combine the listings of several related physical groups (say, all the
spelling variations, or the binaries.sound.mp3 vs. binaries.mp3 groups, or
whatever they are; that's from memory), such that the content of all of
them appeared in one big list.  Combine that with virtual servers, and
once the groups were subscribed, the user wouldn't have to care what group
OR what server the content was on, it would just all appear together in a
single "group" that would actually be a single listing of a category of
groups.
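In database terms a virtual group can be little more than a view over the physical groups. A minimal sketch with Python's sqlite3 module, assuming a hypothetical `virtual_group_member` table that maps each virtual group to its physical groups; the group names are just examples. A crossposted article appears in several physical groups but only once in the virtual one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One row per (article, group): a crossposted article appears once per group.
db.execute("""CREATE TABLE overview (
                  message_id TEXT, newsgroup TEXT, subject TEXT,
                  PRIMARY KEY (message_id, newsgroup))""")
# Hypothetical mapping: which physical groups make up each virtual group.
db.execute("CREATE TABLE virtual_group_member (virtual_group TEXT, newsgroup TEXT)")
# The virtual group is just a view; DISTINCT folds crossposts into one entry.
db.execute("""CREATE VIEW virtual_overview AS
              SELECT DISTINCT m.virtual_group, o.message_id, o.subject
              FROM overview AS o
              JOIN virtual_group_member AS m ON o.newsgroup = m.newsgroup""")

db.executemany("INSERT INTO overview VALUES (?, ?, ?)", [
    ("<1@x>", "alt.binaries.sounds.mp3", "song A"),
    ("<1@x>", "alt.binaries.mp3", "song A"),  # same post, crossposted
    ("<2@x>", "alt.binaries.mp3", "song B"),
    ("<3@x>", "alt.test", "unrelated"),
])
db.executemany("INSERT INTO virtual_group_member VALUES (?, ?)", [
    ("music", "alt.binaries.sounds.mp3"),
    ("music", "alt.binaries.mp3"),
])


def virtual_group_listing(db, virtual_group):
    """All articles from every physical group mapped to virtual_group."""
    return db.execute(
        "SELECT message_id, subject FROM virtual_overview "
        "WHERE virtual_group = ? ORDER BY message_id",
        (virtual_group,)).fetchall()


print(virtual_group_listing(db, "music"))
# → [('<1@x>', 'song A'), ('<2@x>', 'song B')]
```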

Taking this one step further, there could be a single list of groups to
subscribe to, regardless of server, tho it might list how many
configured servers the group appears on.
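The merged subscription list is a single `GROUP BY` query. Again only a sketch: the `group_on_server` table and the server names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table: which configured server carries which group.
db.execute("CREATE TABLE group_on_server (newsgroup TEXT, server TEXT, "
           "PRIMARY KEY (newsgroup, server))")
db.executemany("INSERT INTO group_on_server VALUES (?, ?)", [
    ("alt.test", "news.isp.example"),
    ("alt.test", "block.example"),
    ("alt.binaries.mp3", "block.example"),
])


def merged_group_list(db):
    """One entry per group name, with the number of servers that carry it."""
    return db.execute(
        "SELECT newsgroup, COUNT(*) FROM group_on_server "
        "GROUP BY newsgroup ORDER BY newsgroup").fetchall()


print(merged_group_list(db))
# → [('alt.binaries.mp3', 1), ('alt.test', 2)]
```

The primary key keeps each (group, server) pair unique, so `COUNT(*)` is the number of distinct servers.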

For text users, doing searches would become much easier, as again,
transcending group or server boundaries wouldn't be an issue.  Of course,
text users would probably still want to keep most of their groups
straight, avoiding confusion when posting replies, etc.
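Likewise, a search that ignores group and server boundaries is one query over all the headers. In this sketch the hypothetical `overview` table lists an article once per server it was seen on; `DISTINCT` folds those duplicates together, while the group column is kept so the user still knows where a reply would go.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table: the same article may be listed once per server.
db.execute("CREATE TABLE overview "
           "(message_id TEXT, server TEXT, newsgroup TEXT, subject TEXT)")
db.executemany("INSERT INTO overview VALUES (?, ?, ?, ?)", [
    ("<1@x>", "news.isp.example", "comp.os.linux.misc", "Linux kernel question"),
    ("<1@x>", "block.example", "comp.os.linux.misc", "Linux kernel question"),
    ("<2@x>", "news.isp.example", "alt.test", "linux test post"),
    ("<3@x>", "block.example", "rec.pets", "cats"),
])


def search_everywhere(db, text):
    """Subject search across all groups and servers.

    SQLite's LIKE is case-insensitive for ASCII by default. `text` is not
    escaped, so any % or _ in it acts as a wildcard.
    """
    return db.execute(
        "SELECT DISTINCT message_id, newsgroup, subject FROM overview "
        "WHERE subject LIKE ? ORDER BY message_id, newsgroup",
        ("%" + text + "%",)).fetchall()


print(search_everywhere(db, "linux"))
# → [('<1@x>', 'comp.os.linux.misc', 'Linux kernel question'), ('<2@x>', 'alt.test', 'linux test post')]
```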

A true data management interface would lend itself to all sorts of
statistical tracking functions.  Want to know how many posts user X has in
the various groups you follow, what percentage of them occurred in the
last week, and how many of those were to a specific group?  No problem! 
Keeping statistics on spammers and measuring their BI (Breidbart Index)
wouldn't be an
issue, either.
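The example questions above map onto a couple of SQL aggregates. This sketch assumes a hypothetical `overview` table with Unix-timestamp `posted_at` values and treats "the last week" as the last seven days. The Breidbart Index helper follows the usual definition: the sum, over every copy of a post, of the square root of the number of groups that copy went to. How those copies would be identified in the database is left out.

```python
import math
import sqlite3

WEEK = 7 * 24 * 60 * 60  # seconds

db = sqlite3.connect(":memory:")
# Hypothetical table; posted_at is a Unix timestamp.
db.execute("CREATE TABLE overview (message_id TEXT, newsgroup TEXT, "
           "author TEXT, posted_at INTEGER)")
now = 1_000_000
db.executemany("INSERT INTO overview VALUES (?, ?, ?, ?)", [
    ("<1@x>", "alt.test", "alice", now - 100),
    ("<2@x>", "comp.lang.c", "alice", now - 200),
    ("<3@x>", "alt.test", "alice", now - 2 * WEEK),
    ("<4@x>", "alt.test", "bob", now - 10),
])


def author_stats(db, author, now):
    """Total posts by author, percent from the last week, recent posts per group."""
    since = now - WEEK
    # Placeholders bind in text order: `since` (SELECT list), then `author`.
    total, recent = db.execute(
        "SELECT COUNT(*), COALESCE(SUM(posted_at >= ?), 0) "
        "FROM overview WHERE author = ?", (since, author)).fetchone()
    per_group = dict(db.execute(
        "SELECT newsgroup, COUNT(*) FROM overview "
        "WHERE author = ? AND posted_at >= ? "
        "GROUP BY newsgroup ORDER BY newsgroup",
        (author, since)).fetchall())
    percent_recent = 100.0 * recent / total if total else 0.0
    return total, percent_recent, per_group


def breidbart_index(groups_per_copy):
    """Sum of sqrt(number of groups) over each copy of a post."""
    return sum(math.sqrt(n) for n in groups_per_copy)


print(author_stats(db, "alice", now))
# → (3, 66.66666666666667, {'alt.test': 1, 'comp.lang.c': 1})
print(breidbart_index([4, 4, 4]))  # three copies, each crossposted to 4 groups
# → 6.0
```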

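Some of these functions may not be directly built into PAN, but because
sqlite speaks standard SQL and keeps the whole database in one ordinary,
portable file, those that want more database functionality will be able to
work with the same database using other SQLite-aware tools, or export the
data to something like MySQL.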

Along the same lines, it'd be possible to actually save binaries, while at
the same time tracking the info from the post they came with.  Ever saved
a binary and then wished you still had some of the info from the post
title or text content?  No problem.  Tracking that would be doable with a
properly set up database.
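Linking a saved binary back to its post only needs a table keyed by the file's path. A sketch with a hypothetical `saved_file` table; the files themselves stay ordinary files on disk, and the path and post details below are made up for illustration.

```python
import sqlite3


def open_library(path=":memory:"):
    """Open (or create) a hypothetical table linking saved files to their posts."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS saved_file (
                      file_path  TEXT PRIMARY KEY,  -- ordinary file on disk
                      message_id TEXT NOT NULL,
                      newsgroup  TEXT NOT NULL,
                      subject    TEXT NOT NULL,
                      body       TEXT               -- text part of the post, if any
                  )""")
    return db


def record_saved_file(db, file_path, message_id, newsgroup, subject, body=None):
    """Remember which post a file on disk came from."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT OR REPLACE INTO saved_file VALUES (?, ?, ?, ?, ?)",
                   (file_path, message_id, newsgroup, subject, body))


def post_info_for(db, file_path):
    """Return (subject, body) of the post a saved file came from, or None."""
    return db.execute(
        "SELECT subject, body FROM saved_file WHERE file_path = ?",
        (file_path,)).fetchone()


db = open_library()
record_saved_file(db, "/home/user/news/lake.jpg", "<9@x>",
                  "alt.binaries.pictures", "Lake at sunset [1/1]",
                  "Taken last summer")
print(post_info_for(db, "/home/user/news/lake.jpg"))
# → ('Lake at sunset [1/1]', 'Taken last summer')
```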

I say "along the same lines" because b4 I switched off of MSWormOS, I had
a newsreader called Tifny.  It stored all this stuff in an MDB format
database.  I'm not sure PAN will ever include all this functionality, but
with Tifny, one could view movies or pix, or listen to music, as
downloaded, directly from the application itself.  There was even an
interface that allowed one to add media from other sources to the Tifny
management db, and one could create play lists and all sorts of stuff, all
with the same app that d/led the stuff from the groups!

However, that wasn't the biggest part of it.  Because it actually saved
the binaries in the file system as ordinary binaries, one could access the
SAME files using any OTHER player, WITHOUT FURTHER CONVERSION.  Because it
used the (unfortunately somewhat MS proprietary) MDB format for its
database, any other database app could make use of the same info.  I
didn't have Office, but one could theoretically have set up the Access
Database in such a way that using ActiveX/OLE/etc, it could have
functioned not only as a database, but again, as a media player as well,
for the same data, in the same format, without conversion.

Because an sqlite database is likewise an ordinary file that any
SQLite-aware application can open, the same sort of possibilities open up,
altho I understand there is as yet no really good GUI front-end for any of
the free/open source databases, so it'd be more work to integrate all the
media player functionality and all that.  Still, the options are there,
virtually limitless, once you have the database library back-end in place,
provided you've chosen a solid, workable, and expandable data element and
ordering format, and not limited yourself by assuming it will NOT expand.
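One common way to avoid painting yourself into a corner with the data format is to version the schema and upgrade it in place. This sketch uses SQLite's `user_version` pragma as the version counter; the `MIGRATIONS` list and its columns are hypothetical, and nothing here says PAN does it this way.

```python
import sqlite3

# Each entry upgrades the (hypothetical) schema by exactly one version.
MIGRATIONS = [
    "CREATE TABLE overview (message_id TEXT PRIMARY KEY, subject TEXT NOT NULL)",
    "ALTER TABLE overview ADD COLUMN author TEXT",
    "ALTER TABLE overview ADD COLUMN saved_path TEXT",
]


def migrate(db):
    """Apply any missing steps; return (old_version, new_version)."""
    version = db.execute("PRAGMA user_version").fetchone()[0]
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        db.execute(statement)
        # PRAGMA takes no bound parameters, so the integer is formatted in.
        db.execute(f"PRAGMA user_version = {number}")
    db.commit()
    return version, len(MIGRATIONS)


# A database created by an older version: only the first step applied.
old = sqlite3.connect(":memory:")
old.execute(MIGRATIONS[0])
old.execute("PRAGMA user_version = 1")
old.execute("INSERT INTO overview VALUES ('<1@x>', 'hello')")
old.commit()

print(migrate(old))  # → (1, 3)
print(old.execute("SELECT * FROM overview").fetchall())
# → [('<1@x>', 'hello', None, None)]
print(migrate(old))  # → (3, 3): already current, nothing to do
```

Existing rows survive each `ALTER TABLE ... ADD COLUMN`; the new columns simply read as NULL until something fills them in.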

...
(Yes, in case anyone was wondering, I DO routinely compose such "book"-length
posts in the groups I'm a regular in. =:^P)

-- 
Duncan - List replies preferred.   No HTML msgs.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin





