Re: [Pan-users] disappearing articles?
From: Duncan
Subject: Re: [Pan-users] disappearing articles?
Date: Sun, 25 May 2003 04:03:19 -0700
User-agent: KMail/1.5.1
On Fri 23 May 2003 18:24, Chris Petersen posted as excerpted below:
> I finally bit the bullet and subscribed to a pay-for usenet server,
> and thus have the bandwidth to play in some of the binary groups.
>
> In doing so, I've noticed that the article count gets weird sometimes -
> I'll download a bunch of headers (say, 50k), but when all is done, only
> 41k will show up. I've tried my best to turn off all filters/rules, so
> nothing should be getting excluded - but I fear that it is. Is this
> just Pan correcting for some server stuff, or is it filtering stuff when
> I tell it not to (since I've had plenty of times where I tell it to
> filter stuff out and it doesn't)?
>
> oh, using .14 in linux.
It's possible you've hit on a bug, but I wouldn't jump to that conclusion just
yet, as there are times when that would be the expected behavior.
When getting the initial article count estimates, most readers, including Pan,
simply use the article sequence numbers for that group on that server to
come up with the estimate. If the previous highest number you'd processed
was 18277397, and the new high number is 18278399, then it will report 1002
new articles in the group. Some servers, for whatever reason, create message
numbers out of order. Or, more precisely, the articles end up on the reading
server out of order, with gaps in the numbering, so fewer than 1002 may
actually be there.
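To make the arithmetic concrete, here is a minimal sketch in Python. It is not Pan's actual code; the function names and the made-up server contents (every fifth number missing) are assumptions for illustration only.

```python
def estimated_new(previous_high, current_high):
    """Initial estimate: difference between the two high-water marks."""
    return current_high - previous_high


def actually_present(numbers_on_server, previous_high, current_high):
    """Count article numbers that really exist in (previous_high, current_high]."""
    return sum(1 for n in numbers_on_server if previous_high < n <= current_high)


previous_high = 18277397
current_high = 18278399

# Hypothetical server contents: every fifth number in the range never arrived.
on_server = [n for n in range(previous_high + 1, current_high + 1) if n % 5 != 0]

estimate = estimated_new(previous_high, current_high)
present = actually_present(on_server, previous_high, current_high)
print(estimate, present)  # the estimate is larger than what is really there
```

The estimate only looks at the two end points, so any hole in between makes it overshoot.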
One reason this might occur is that articles might be centrally numbered, then
distributed to several servers. The reason for this would be to keep all
those servers in numerical sync, so one could switch between them without
messing up the read-article tracking and so on. Those sequence numbers are
what most readers use to track downloaded and read articles, rather than the
more reliable message-ID, which is supposed to be globally unique, while the
sequence numbers are normally unique only within the group and on that server.
My ISP, Cox, does this, having a central feed processing location that does
the numbering, and three servers, east, central, and west, that are
numerically synced. Guess what happens when one gets behind? Right, it gets
some of the articles but not others, then when the problem is corrected, the
missing ones show up. That creates gaps in the sequence numbers, making those
estimates inaccurate.
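If you want to see where such holes are, something like the following Python sketch would list them. It assumes you already have the list of article numbers the server actually returned; the function name and the sample numbers are made up for illustration.

```python
def missing_ranges(present, low, high):
    """Return inclusive (start, end) ranges in [low, high] absent from `present`."""
    have = set(present)
    gaps = []
    start = None  # first number of the gap currently being scanned, if any
    for n in range(low, high + 1):
        if n not in have:
            if start is None:
                start = n
        elif start is not None:
            gaps.append((start, n - 1))
            start = None
    if start is not None:
        # The range ended while still inside a gap.
        gaps.append((start, high))
    return gaps


# Hypothetical numbers seen on a server that fell behind the central feed.
seen = [1, 2, 5, 6, 9]
print(missing_ranges(seen, 1, 10))
```

Numbers in those ranges may show up later once the lagging server catches up, or never, if the articles were filtered or cancelled.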
Another reason the estimate may be wrong is filtering and cancels, the
latter only if the server processes cancels, of course. The numbering could
be done before or after such filtering, but since cancels typically arrive
somewhat later than the articles they cancel, they would always leave holes
in the numbering.
There are probably other reasons the numbering may be off as well, but this
should be enough to demonstrate why such initial numbers are only educated
guesstimates, and shouldn't be taken as more than that.
--
Duncan - List replies preferred.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety." --
Benjamin Franklin