From: Chris Petersen
Subject: Re: [Pan-users] RFC: Detecting multiparts (was: .94 weirdness with detecting attachments)
Date: 08 Aug 2003 12:04:33 -0700
> * likely_binary_group is true if the newsgroup name contains
> any of: "binaries", "fan", "mag", "sex", false otherwise
Don't forget the plethora of misspelled ones, too: binaires, etc.
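A rough Python sketch of the group-name check with some tolerance for misspellings. The function name follows the quoted proposal; the fuzzy matching via difflib and the 0.8 cutoff are assumptions for illustration, not anything Pan does.

```python
from difflib import SequenceMatcher

# Substrings taken from the quoted proposal.
BINARY_HINTS = ("binaries", "fan", "mag", "sex")

def likely_binary_group(group, cutoff=0.8):
    """Return True if the newsgroup name looks like a binaries group."""
    name = group.lower()
    if any(hint in name for hint in BINARY_HINTS):
        return True
    # Assumption: only "binaries" is fuzzy-matched; the other hints are
    # too short to compare approximately without many false positives.
    return any(
        SequenceMatcher(None, part, "binaries").ratio() >= cutoff
        for part in name.split(".")
    )

print(likely_binary_group("alt.binaires.pictures"))
print(likely_binary_group("comp.lang.python"))
```

A misspelling such as "binaires" falls through the exact substring test and is only caught by the per-component similarity check.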
> * likely_binary_subject is true if the Subject: header contains
> any of: "jpeg" "jpg" "gif" "tiff" "png", false otherwise
Also avi, ogm, mpe?g, mp[23], etc. How about a Perl regex (perlre) instead: \w\.\w{2,4}
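As a sketch only, here are both variants of the subject check in Python: an explicit extension list (the quoted list plus the additions above) and the generic filename-like pattern. The names KNOWN_EXT and ANY_EXT and the `generic` flag are made up for this example, and requiring a leading dot in the explicit list is an assumption; the quoted proposal matches bare substrings.

```python
import re

# Explicit extensions: the quoted list plus the ones suggested in the reply.
# Assumption: a leading dot is required, unlike the bare-substring proposal.
KNOWN_EXT = re.compile(
    r"\.(?:jpe?g|gif|tiff|png|avi|ogm|mpe?g|mp[23])\b", re.IGNORECASE
)

# Generic pattern from the reply: a word character, a dot,
# then two to four word characters.
ANY_EXT = re.compile(r"\w\.\w{2,4}")

def likely_binary_subject(subject, generic=True):
    """Return True if the Subject: text appears to contain a filename."""
    pattern = ANY_EXT if generic else KNOWN_EXT
    return pattern.search(subject) is not None

print(likely_binary_subject("beach01.jpg (1/3)"))
print(likely_binary_subject("Pan 0.14 released"))
```

The generic pattern needs no list maintenance, but it also fires on version numbers such as "0.14", so it probably works best combined with the other signals rather than on its own.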
> * part = 0, or if either "(x/y)" or "[x/y]" is in Subject:, then x.
> (Work backwards from the end of the string, in case someone's
> posting a set of multiparts and (x/y) appears in the Subject:
> twice)
Also: "x of y".
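A minimal Python sketch of the part-number rule, covering "(x/y)", "[x/y]" and "x of y". Taking the last match stands in for working backwards from the end of the string. PART_RE and parse_part are hypothetical names for this example.

```python
import re

PART_RE = re.compile(
    r"\((\d+)/(\d+)\)"           # (x/y)
    r"|\[(\d+)/(\d+)\]"          # [x/y]
    r"|\b(\d+)\s+of\s+(\d+)\b",  # x of y
    re.IGNORECASE,
)

def parse_part(subject):
    """Return x from the last part marker in the subject, or 0 if none."""
    matches = list(PART_RE.finditer(subject))
    if not matches:
        return 0
    last = matches[-1]
    # Only one alternative matched; its first captured group is x.
    x = next(g for g in last.groups() if g is not None)
    return int(x)

print(parse_part("vacation set (2/5) - beach.jpg (3/10)"))
print(parse_part("beach.jpg 2 of 7"))
```

The "x of y" form is looser than the bracketed ones and could match ordinary prose in a subject line, so treat that branch as the least reliable of the three.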
> 4. if is_binary is true,
> and is_reply is true,
> and the part is 0 or 1,
> then it's probably a follow-up to a multipart (I've never seen a
> followup to a part > 1).
> set is_binary to false.
> UNLESS: once in a blue moon people will post binaries as follow-ups, so
> hedge our bets:
> leave is_binary as true if lines > 500.
This is problematic, since people post binaries as replies to REQ
messages (which would probably end up counting as part=0) all the time.
The number of lines is also problematic: I often run across binary articles
that show as having 0 or 2 or 10 or some other small number of lines,
but have 100k+ of data in them.
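To make the objection concrete, here is a small Python sketch of the quoted rule 4, followed by a variant that also looks at article size. The function names, the size_bytes parameter and the 100,000-byte threshold are assumptions for illustration; the thread does not specify where a byte count would come from.

```python
def apply_followup_rule(is_binary, is_reply, part, lines):
    """Rule 4 as quoted: demote likely follow-ups unless lines > 500."""
    if is_binary and is_reply and part in (0, 1) and lines <= 500:
        return False
    return is_binary

def apply_followup_rule_sized(is_binary, is_reply, part, lines,
                              size_bytes, min_binary_bytes=100_000):
    """Hypothetical variant: a large byte count also keeps is_binary."""
    if is_binary and is_reply and part in (0, 1):
        if lines <= 500 and size_bytes < min_binary_bytes:
            return False
    return is_binary

# A binary posted in reply to a REQ, with a bogus line count of 2:
print(apply_followup_rule(True, True, 0, 2))
print(apply_followup_rule_sized(True, True, 0, 2, 150_000))
```

With the quoted rule, the REQ reply is demoted because its reported line count is tiny; the sized variant keeps it, assuming a trustworthy byte count is available.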
Other than that (not knowing how Pan does it now), it all looks GREAT.