
Re: [Pan-devel] ancient DB schema


From: K. Haley
Subject: Re: [Pan-devel] ancient DB schema
Date: Wed, 09 Jun 2004 01:21:12 -0600
User-agent: Mozilla Thunderbird 0.6 (Windows/20040502)

Calin A. Culianu wrote:

> For the binaries newsgroups, I have been able to greatly increase performance and minimize disk space usage by doing the following:
>
> If it's a multipart binary (as determined by the heuristics already in Pan, namely that the subject ends in [xx/yy] or (xx/yy) and the article is over 400 lines), then we can assume that all the subjects are the same and differ only in the xx/yy part. So why not truncate that part, put all the subjects in a separate table, and store only the 'subject id', part, and parts in the Articles table?
>
> In fact, a typical group with 1 million+ headers usually has only about 1000-2000 unique subjects, so you save a LOT of disk space this way. It also makes queries and sorting on the Articles table much faster, since less data needs to be scanned per query.
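
A minimal sketch of the scheme described above, for illustration only. The thread does not name a database engine, so SQLite (via Python's standard `sqlite3` module) is an assumption, as are the table layout and the hypothetical helpers `split_multipart`, `subject_id` and `add_article`. The part-counter regex follows the heuristic as quoted (subject ending in `[xx/yy]` or `(xx/yy)`, more than 400 lines); it is not Pan's actual code.

```python
import re
import sqlite3

# Trailing "[xx/yy]" or "(xx/yy)" part counter (assumed pattern, per the quoted heuristic).
PART_RE = re.compile(r"\s*(?:\[(\d+)/(\d+)\]|\((\d+)/(\d+)\))\s*$")
MIN_MULTIPART_LINES = 400  # "over 400 lines" threshold from the message

# Hypothetical schema: unique subjects live in their own table.
SCHEMA = """
CREATE TABLE subjects (
    id      INTEGER PRIMARY KEY,
    subject TEXT NOT NULL UNIQUE
);
CREATE TABLE articles (
    message_id TEXT PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES subjects(id),
    part       INTEGER,  -- NULL for single-part articles
    parts      INTEGER
);
"""

def split_multipart(subject, line_count):
    """Return (base_subject, part, parts), or None if not a multipart binary."""
    m = PART_RE.search(subject)
    if m is None or line_count <= MIN_MULTIPART_LINES:
        return None
    part = int(m.group(1) or m.group(3))   # "[..]" fills groups 1-2, "(..)" fills 3-4
    parts = int(m.group(2) or m.group(4))
    return subject[:m.start()], part, parts

def subject_id(conn, subject):
    """Return the id of subject, inserting it into the lookup table if it is new."""
    conn.execute("INSERT OR IGNORE INTO subjects(subject) VALUES (?)", (subject,))
    return conn.execute("SELECT id FROM subjects WHERE subject = ?",
                        (subject,)).fetchone()[0]

def add_article(conn, message_id, subject, line_count):
    """Store an article, keeping only the subject id and part counters."""
    split = split_multipart(subject, line_count)
    base, part, parts = split if split else (subject, None, None)
    conn.execute("INSERT INTO articles VALUES (?, ?, ?, ?)",
                 (message_id, subject_id(conn, base), part, parts))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for i in range(1, 4):
    add_article(conn, f"<a{i}@example.invalid>", f"holiday.zip ({i}/3)", 2000)
add_article(conn, "<b1@example.invalid>", "question about yEnc", 25)
print(conn.execute("SELECT COUNT(*) FROM subjects").fetchone()[0])  # 2
```

All three parts of the multipart post collapse onto one `subjects` row, while the short single-part post keeps its full subject. Because `subjects.subject` is declared `UNIQUE`, SQLite maintains an index on it, so the lookup in `subject_id` does not need to scan the table.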

I can't think of any problems with the idea right now. Since we'll have the part and parts info, we can rebuild the full subject anyway. Have you thought about doing something similar for the Author? Granted, the space savings won't be as large, but it might be worth it.
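
The two points above, rebuilding the subject from the stored part counters and interning authors the same way, can be sketched on a similar hypothetical SQLite schema. The `authors` table, the `intern_value` helper and the `REBUILD` query are illustrative assumptions, not anything spelled out in the thread. Note that the rebuilt subject always uses the `(xx/yy)` form without zero padding; if the exact original string matters, the bracket style and padding width would have to be stored as well.

```python
import sqlite3

# Hypothetical schema: subjects and authors are both interned into lookup
# tables; articles keep only integer ids plus the part counters.
SCHEMA = """
CREATE TABLE subjects (id INTEGER PRIMARY KEY, subject TEXT NOT NULL UNIQUE);
CREATE TABLE authors  (id INTEGER PRIMARY KEY, author  TEXT NOT NULL UNIQUE);
CREATE TABLE articles (
    message_id TEXT PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES subjects(id),
    author_id  INTEGER NOT NULL REFERENCES authors(id),
    part       INTEGER,  -- NULL for single-part articles
    parts      INTEGER
);
"""

# Rebuild full subjects: re-append "(part/parts)" only for multipart rows.
REBUILD = """
SELECT a.message_id,
       CASE WHEN a.part IS NULL THEN s.subject
            ELSE s.subject || ' (' || a.part || '/' || a.parts || ')'
       END,
       au.author
FROM articles AS a
JOIN subjects AS s  ON s.id  = a.subject_id
JOIN authors  AS au ON au.id = a.author_id
ORDER BY a.message_id
"""

def intern_value(conn, table, column, value):
    """Return the id of value in table, inserting it first if it is new."""
    # table and column are fixed names from this sketch, never user input,
    # so formatting them into the SQL string is safe here.
    conn.execute(f"INSERT OR IGNORE INTO {table}({column}) VALUES (?)", (value,))
    return conn.execute(f"SELECT id FROM {table} WHERE {column} = ?",
                        (value,)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
poster = intern_value(conn, "authors", "author", "poster@example.invalid")
base = intern_value(conn, "subjects", "subject", "holiday.zip")
for i in (1, 2):
    conn.execute("INSERT INTO articles VALUES (?, ?, ?, ?, ?)",
                 (f"<a{i}@example.invalid>", base, poster, i, 2))
for row in conn.execute(REBUILD):
    print(row)
```

As the reply expects, the per-article saving for authors is smaller, since a typical binaries group has far more distinct posters per subject than it has distinct subjects per part, but the same interning helper covers both tables.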

> Anyway, the DB changes I am working on now aren't as comprehensive as what you propose here. As a first pass, I am _only_ changing the parts of Pan that deal with article headers and putting only that data in the DB, since that's where we have really big memory-consumption problems and where we benefit most from using a DB. This is the lazy man's approach; I don't want to change Pan too much, I only want to tweak it to scale better.
>
> I leave it to you guys to decide how to fully metamorphose Pan into using a full-fledged DB backend and creating 'virtual groups' or whatever it was you were discussing.

Well, since we're talking about moving one part to a DB, we might as well figure out how to do all of it. I'll fold your suggestions into my schema, along with some other additions based on the code. Of course, it's quite possible that none of this will be put into Pan. For all we know, the devs might already be well on their way to implementing this. Still, it is a useful learning experience.
