nmh-workers

[Nmh-workers] Re: enhancement to mhfinddup


From: Valdis.Kletnieks
Subject: [Nmh-workers] Re: enhancement to mhfinddup
Date: Tue, 09 Sep 2008 12:59:50 -0400

On Tue, 09 Sep 2008 11:41:43 EDT, address@hidden said:

> Below you can find the diff (suitable for feeding to patch(1)) against 
> mhfinddup 1.2.

Possible enhancement for 1.3. I'd code and test it myself, but I'm swimming
in other work today...

>                 $msgs{$msgid} =~ m|^\+(.*)/(\d+)$|;
>                 my($f, $m) = ($1, $2);
>                 if ($folder eq $f || $no_same_folder) {
...
> !                       my $sum1=md5_hex(@msgbody);

At this point, you could consider doing something like:

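# Declare the cache once, outside the duplicate-scanning loop,
# so it survives from one comparison to the next.
my %cached;

                my $key = "$folderpath/$m";
                my $sum1;
                if (exists $cached{$key}) {
                        $sum1 = $cached{$key};
                } else {
                        $sum1 = md5_hex(@msgbody);
                        $cached{$key} = $sum1;
                }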

And do the same for $sum2. All the open/read/close of the message file
should probably move into the else branch too, so a message that is already
cached is never reread from disk.
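The caching idea can be tried out as a standalone Perl script. This is a
minimal sketch, not code from mhfinddup: cached_md5, %cached and $reads are
hypothetical names, and it hashes the whole message file, whereas mhfinddup
may hash only part of the message. The file is opened, read and hashed only
on a cache miss, as suggested above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempdir);

# Hypothetical cache: message path => MD5 hex digest of the file contents.
# Declared once at file scope so it persists across all comparisons.
my %cached;
my $reads = 0;    # number of times a file was actually opened and hashed

# Hypothetical helper: return the MD5 of the file at $path, doing the
# open/read/close and the hashing only the first time the path is seen.
sub cached_md5 {
    my ($path) = @_;
    return $cached{$path} if exists $cached{$path};

    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my @msgbody = <$fh>;
    close($fh);
    $reads++;

    return $cached{$path} = md5_hex(@msgbody);
}

# Build five identical test messages, numbered 100 .. 104.
my $dir = tempdir(CLEANUP => 1);
my @paths;
for my $n (100 .. 104) {
    my $path = "$dir/$n";
    open(my $out, '>', $path) or die "cannot write $path: $!";
    print $out "Subject: test\n\nsame body\n";
    close($out);
    push @paths, $path;
}

# Compare every pair, as the uncached code does; the cache means each
# file is still hashed only once.
my $dups = 0;
for my $i (0 .. $#paths - 1) {
    for my $j ($i + 1 .. $#paths) {
        $dups++ if cached_md5($paths[$i]) eq cached_md5($paths[$j]);
    }
}
print "duplicate pairs: $dups, files hashed: $reads\n";
```

With five identical messages this should report 10 duplicate pairs while
hashing only 5 files. The core module Memoize could do the same job, but an
explicit hash keeps the cache key (the folder path plus message number) visible.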

Otherwise, if messages 100, 101, 102, 103, and 104
are in fact duplicates, you compute the md5sums for

100, 101, 100, 102, 100, 103, 100, 104, 101, 102, 101, 103, and so on,

rehashing every message once for each pair it belongs to. With the cache you
only do N md5sums, not N*(N-1) (two per pair, over N*(N-1)/2 pairs), which is
a lot different for N=4,000. ;)
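As a quick check of the arithmetic, here is a small Perl sketch that counts
md5 computations. The function names are hypothetical, and it models only the
number of hashes, assuming every pair of the N messages is compared and both
members of each pair are hashed afresh when there is no cache.

```perl
use strict;
use warnings;

# Closed form without a cache: N*(N-1)/2 pairs, two md5sums per pair.
sub uncached_hashes { my ($n) = @_; return $n * ($n - 1); }

# With a per-message cache, each message is hashed exactly once.
sub cached_hashes { my ($n) = @_; return $n; }

# Brute-force count of the uncached pairwise loop, to confirm the formula.
sub simulated_uncached {
    my ($n) = @_;
    my $count = 0;
    for my $i (0 .. $n - 2) {
        for my $j ($i + 1 .. $n - 1) {
            $count += 2;    # md5 of message $i and md5 of message $j
        }
    }
    return $count;
}

printf "N=5: simulated %d, formula %d\n", simulated_uncached(5), uncached_hashes(5);
for my $n (5, 4000) {
    printf "N=%d: %d md5sums uncached, %d cached\n",
        $n, uncached_hashes($n), cached_hashes($n);
}
```

For N=4,000 that is 15,996,000 hashes without the cache versus 4,000 with it.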
