Re: [Pan-devel] Scoring articles by ration of bytes/lines
From: Konrad Karl
Subject: Re: [Pan-devel] Scoring articles by ration of bytes/lines
Date: Thu, 1 Feb 2007 11:15:08 +0100
User-agent: Mutt/1.4.2.2i
Hi Charles,
please see below.
On Wed, Jan 31, 2007 at 07:17:33PM -0600, Charles Kerr wrote:
> Konrad Karl wrote:
> >Hi,
> >
> >I want to be able to score/filter/delete articles where
> >the ratio of article_bytes / article_lines is below a certain
> >value.
> >
> >Many sporged postings could be identified easily this way. With
> >the old Pan 1.x I used a simple Perl filter program
> >to delete articles whose ratio was too low, and this simple
> >approach worked surprisingly well. The algorithm might require some
> >tweaking, e.g. if the number of lines is < 10, then don't apply the
> >ratio rule, etc.
> >
> >Now I have started looking into the latest sources, but I am
> >afraid it will take considerable time until I understand
> >what's going on.
> >
> >What do you think?
> >
> >Greetings,
> >Konrad
>
> Hi Konrad,
>
> This can be done in 0.120 by adding a scoring rule to ignore
> all articles with a line count less than 10.
> See Article > Add a Scoring Rule
Yes, I know. But I want a more complex rule, expressed here in
pseudocode:
if (article_lines > some_threshold) {
    ratio = article_bytes / article_lines;
    if (ratio < ratio_threshold)
        apply_some_rule;
}
"apply_some_rule" could perhaps mean: display only these articles. Then they
could be deleted manually.
There are sporged articles with a line count in the
range of several hundred to several thousand and a ridiculously low byte
count, and I have had much success with a ratio_threshold of about
10. (Every line has to have at least 10 bytes on average; otherwise
it is very likely sporge.)
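To make the rule concrete, here is a minimal sketch in Python. It is not Pan code: the function name looks_like_sporge and the default of 10 for the line threshold are assumptions for illustration only; the ratio threshold of 10 is the value mentioned above.

```python
def looks_like_sporge(article_bytes, article_lines,
                      min_lines=10, ratio_threshold=10.0):
    """Return True if the bytes-per-line ratio suggests a sporged article.

    min_lines is a hypothetical default: articles with this many lines or
    fewer are never flagged, because the ratio is unreliable for very
    short postings.
    """
    if article_lines <= min_lines:
        return False
    ratio = article_bytes / article_lines  # average bytes per line
    return ratio < ratio_threshold


# A 500-line article with only 2000 bytes averages 4 bytes per line.
print(looks_like_sporge(2000, 500))
# The same line count with 40000 bytes averages 80 bytes per line.
print(looks_like_sporge(40000, 500))
```

Articles flagged this way could then be displayed on their own and deleted manually, as described above.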
Greetings,
Konrad
>
> cheers,
> Charles