
Re: [Pan-devel] the Article cache [was: Speed of Pan]


From: Jeffrey Stedfast
Subject: Re: [Pan-devel] the Article cache [was: Speed of Pan]
Date: 04 Nov 2002 14:10:44 -0500

On Mon, 2002-11-04 at 11:49, Charles Kerr wrote:
> On Sat, Nov 02, 2002 at 12:45:04AM -0500, Jeffrey Stedfast wrote:
> > Pardon me for my ignorance, but why does Pan have to check the cache at
> > all on startup?
> > 
> > Keep in mind, I'm not all that familiar with Pan cache internals nor
> > NNTP/News in general, but to me a cache should be implemented in such a
> > way as:
> > 
> > 1. reader makes a request for (an) article(s)
> > 2. if article exists in cache
> >   then read from the cache
> >   else download article and attempt to cache it (if fail, unlink() so
> > that the next request for the same article doesn't get an empty or
> > truncated file)
> > 
> > If things worked this way, there shouldn't be a need to scan the cache
> > on startup.
> 
> The header pane shows a cache icon next to cached articles, so during
> header pane refreshes there is a burst of article_is_cached() calls.
> IMO this is unavoidable because of the cache icon's importance to
> offline reading.

*nod*
I kinda figured it was something like that.

> 
> The directory walking at startup was chosen to Keep It Simple:
> after walking, Pan can keep a small map of cached article message-ids.
> is_article_cached() is just a lookup in that map, so even the
> header pane's burst is cheap.
> 
> If we remove the startup directory walking, Pan will have to either
> 
> (a) hit the disk each time an article is tested,
>     which makes every header update slow, or
> 
> (b) hit the disk the first time an article is tested,
>     which would make the first header update in each group slow,
>     and force us to remember misses as well as hits in the cache's map,
>     and introduce the question of whether or not to garbage collect
>     those miss entries at some point, and if so, when.
> 
> I'm leaning towards keeping a flat file with a list of the message-ids
> in the cache.  This would replace directory walking with a simple file read.
> The only downside I see is that people tinkering by hand could get the
> flat file and the directory contents out of sync.
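
To make option (b) in the quoted text concrete, here is a minimal sketch of a lazy lookup that only touches the disk the first time a message-id is tested. It is illustrative Python, not Pan's actual code: the class name, the one-file-per-article naming scheme, and the `forget_misses()` helper are all assumptions made up for the example.

```python
import os
import tempfile


class LazyCacheIndex:
    """Option (b): hit the disk only the first time a message-id is tested."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        # message-id -> True (hit) or False (remembered miss)
        self.known = {}
        # Counts real filesystem checks, to show what the map saves us.
        self.disk_checks = 0

    def _path(self, message_id):
        # Hypothetical naming scheme: one file per article, named after
        # the message-id without its angle brackets.
        return os.path.join(self.cache_dir, message_id.strip("<>"))

    def is_cached(self, message_id):
        if message_id not in self.known:
            self.disk_checks += 1
            self.known[message_id] = os.path.isfile(self._path(message_id))
        return self.known[message_id]

    def forget_misses(self):
        # The "garbage collect the miss entries" question: drop every
        # remembered miss so the map only holds articles that exist.
        self.known = {mid: hit for mid, hit in self.known.items() if hit}
```

Misses have to be stored too, otherwise every uncached article costs a disk check on each header refresh. That is exactly what makes the map grow and raises the question of when to call something like `forget_misses()`.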

Yea... I thought of that too. Let me give you some thoughts:

Keep your cache file (I suppose it would contain message-ids) but also
record the mtime of the cache directory along with it. On startup, stat()
the directory: if its mtime != cache->mtime, then re-scan the cache the
way you do now. Otherwise you can safely assume that nothing was added
to or removed from the cache by the user.

This makes it so that if the user doesn't touch his cache, it should be
extremely fast - otherwise, well... no slower than it is currently :-)

This is similar to the technique we use in Evolution, which is why
loading extremely large mail folders is so fast.
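
The mtime check described above could look roughly like the following. This is a sketch in Python rather than anything taken from Pan or Evolution: the function names, the JSON index format, and keeping the index file outside the cache directory are all assumptions chosen for illustration.

```python
import json
import os
import tempfile


def scan_cache(cache_dir):
    """Slow path: walk the directory, as is done at startup today."""
    return set(os.listdir(cache_dir))


def save_index(cache_dir, index_path, ids):
    """Write the message-id list together with the directory's mtime."""
    record = {
        "mtime_ns": os.stat(cache_dir).st_mtime_ns,
        "ids": sorted(ids),
    }
    with open(index_path, "w") as f:
        json.dump(record, f)


def load_index(cache_dir, index_path):
    """Return (ids, rescanned).

    Fast path: the stored mtime matches the directory, so trust the file.
    Slow path: the index is missing, unreadable, or stale, so re-scan.
    """
    try:
        with open(index_path) as f:
            record = json.load(f)
        if record["mtime_ns"] == os.stat(cache_dir).st_mtime_ns:
            return set(record["ids"]), False
    except (OSError, ValueError, KeyError, TypeError):
        pass  # fall through to the re-scan
    ids = scan_cache(cache_dir)
    save_index(cache_dir, index_path, ids)
    return ids, True
```

A few caveats apply to this sketch. A directory's mtime changes when entries are added, removed, or renamed, not when an existing file is edited in place, so it only detects the first kind of tinkering. The index lives outside the cache directory here so that writing it does not itself bump the directory's mtime. The program would also need to call something like `save_index()` again after its own cache writes, or every startup would take the slow path.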

Jeff

-- 
Jeffrey Stedfast <address@hidden>




