Hi,
I want to be able to score/filter/delete articles where
the ratio of article_bytes / article_lines is below a certain
value.
Many sporged postings could be easily identified this way. With
the old Pan 1.x I used a simple Perl filter program
to delete articles with too low a ratio, and this simple
approach worked surprisingly well. The algorithm might require some
tweaking, e.g. if the number of lines is < 10, then don't apply the
ratio rule, etc.
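
To make the idea concrete, here is a rough sketch of the rule in Python.
It is neither the original Perl filter nor Pan code; the function name
looks_sporged and the default threshold of 20 bytes per line are
made up for illustration, and only the "fewer than 10 lines" exemption
comes from the description above.

```python
def looks_sporged(article_bytes, article_lines, min_ratio=20.0, min_lines=10):
    """Return True if the bytes-per-line ratio is suspiciously low.

    min_ratio is a hypothetical threshold and would need tuning.
    min_lines implements the tweak of skipping short articles.
    """
    # Short (or empty) articles are exempt from the ratio rule.
    if article_lines <= 0 or article_lines < min_lines:
        return False
    return article_bytes / article_lines < min_ratio


# Example: 50 lines but only 100 bytes gives 2 bytes per line.
print(looks_sporged(100, 50))
# A normal article: 4000 bytes over 50 lines gives 80 bytes per line.
print(looks_sporged(4000, 50))
```

In a real filter the two inputs would be taken from the article's
byte count and line count as reported in the overview data.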
Now I have started looking into the latest sources, but I am
afraid it will take considerable time until I understand
what's going on.
What do you think?
Greetings,
Konrad