[Pornfind-user] [ANN] PornFind 0.3


From: Cedric Foll
Subject: [Pornfind-user] [ANN] PornFind 0.3
Date: 01 Aug 2003 12:14:51 +0200

Hi,

I'm glad to inform you that the new version of PornFind is out!

See http://www.nongnu.org/pornfind/.

This version includes a lot of improvements.
Version 0.2 first did a keyword search and then ran a Bayesian test on
the pages that matched.
Mark Kool pointed out that this technique misses a lot of non-French,
non-English pages. For example, East European and Asian pages were
skipped by this algorithm.
So I tried removing this first step and running the Bayesian search on
all pages.
The problem I ran into was an incredible number of false positives. So I
improved the pre-processing step (mybogo.rb) so that tokens are no longer
only single words, but single words and pairs of adjacent words.
For a page like "A B C", the preprocessing now returns "A B C A_B B_C".
This is very important when, for example, a page deals with "breast
cancer".
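
To illustrate the idea, here is a minimal sketch of that token expansion
in Ruby. It is not the actual code from mybogo.rb; the method name
with_word_pairs is hypothetical, and it assumes tokens are simply
separated by whitespace.

```ruby
# Illustrative sketch only: the real pre-processing lives in mybogo.rb
# and may differ. The method name with_word_pairs is hypothetical.
def with_word_pairs(text)
  # Assumption: tokens are separated by whitespace.
  words = text.split
  # Join every pair of adjacent words with an underscore.
  pairs = words.each_cons(2).map { |first, second| "#{first}_#{second}" }
  # Single words first, then the pairs, as in the "A B C" example.
  (words + pairs).join(" ")
end

puts with_word_pairs("A B C")  # prints "A B C A_B B_C"
```

With this kind of expansion, a page about "breast cancer" also yields the
token "breast_cancer", which the Bayesian filter can score separately from
the single word "breast".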

That drastically improves the results.
I've also modified some parameters in bogofilter.cf.

You may get more false positives/false negatives than with v0.2;
please report every one of them to me (in a text file, with one URL of
the form "http://..." on each line).

The file worldlist.db is no longer in the archive; you will have to
build your own .db with bogoutil from the file worldlist.txt.
I did this for people who are using TDB instead of Berkeley DB.
This operation can take a long time (several minutes) on slow computers
and causes a huge amount of disk access with Berkeley DB (less so with
TDB).

I'm on holiday from this evening until the 12th of August (without
internet access :-().

Regards




