Re: Sort and delete duplicate messages
From: Conrad Hughes
Subject: Re: Sort and delete duplicate messages
Date: Mon, 04 May 2020 15:14:24 +0100
> I know that 'sortm -textfield Subject' will sort messages according to
> the subject field. Having run that command, is there a way to then
> delete the first duplicate of each message in the list such that if 1
> and 2 are duplicates and 6 and 7 are duplicates you would delete messages
> 2 and 7 leaving 1 and 6?
The attached might be a useful starting point: run it with a directory
name (e.g. ~/Mail) and it'll find everything that looks like an MH mail
file (i.e. its name is a number) and delete any messages with an
already-seen Message-ID — i.e. the second and subsequent copies of any
emails.
I'd run it on a copy of your Mail folder first and diff the result
against the original. The biggest issue is that I'm not sure what order
it'll process files in, so you may find your preferred copy isn't the
one that gets kept. Still, its output should be a good starting point
for any better version.
Conrad
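#!/usr/bin/perl
# Delete MH messages whose Message-ID has already been seen.
use strict;
use warnings;
use Email::Simple;
use File::Find;

die "Syntax: $0 <dir> [..]\n" unless @ARGV >= 1;

my %ids;    # Message-ID => path of the first file seen with that ID

find(sub {
    my $file = $_;
    # MH message files are regular files whose name is a number.
    return unless $file =~ /^\d+$/ && -f $file;

    open my $fh, '<', $file
        or die "couldn't read \"$File::Find::name\": $!";
    my $email = do { local $/; <$fh> };    # slurp the whole message
    close $fh;

    my $msg = Email::Simple->new($email);
    if (my $id = $msg->header("Message-ID")) {
        if ($ids{$id}) {
            # Second or later copy: delete it and report the copy kept.
            unlink $file
                or warn "couldn't delete \"$File::Find::name\": $!";
            print "Seen $File::Find::name ($ids{$id})\n";
        } else {
            $ids{$id} = $File::Find::name;
        }
    } else {
        warn "No ID in \"$File::Find::name\"!";
    }
}, @ARGV);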
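
One way to address the ordering concern is sketched below. It is a variant, not the script as posted, and it rests on a few assumptions: it uses File::Find's preprocess hook to sort each directory's entries so that lower-numbered messages are visited first within a folder, it only reports duplicates instead of deleting them, and it reads the Message-ID header by hand (core modules only) rather than through Email::Simple. The helper names message_id, by_message_number and find_duplicates are made up for this illustration. Ordering across different folders still follows File::Find's directory traversal.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the Message-ID of a message file, or undef if it has none.
# Hand-rolled header parsing: an assumption made to avoid non-core modules.
sub message_id {
    my ($path) = @_;
    open my $fh, '<', $path or die "couldn't read \"$path\": $!";
    local $/ = '';                  # paragraph mode: first read is the header block
    my $header = <$fh>;
    close $fh;
    return unless defined $header;
    $header =~ s/\r?\n[ \t]+/ /g;   # unfold continuation lines
    return $header =~ /^Message-ID:[ \t]*(\S.*?)\s*$/mi ? $1 : undef;
}

# Sort numeric names first, in numeric order; everything else after, by name.
sub by_message_number {
    my ($a_is_num, $b_is_num) = map { /^\d+$/ ? 1 : 0 } $a, $b;
    return $a <=> $b if $a_is_num && $b_is_num;
    return $b_is_num <=> $a_is_num || $a cmp $b;
}

# Return a list of [duplicate_path, first_seen_path] pairs. Deletes nothing.
sub find_duplicates {
    my @dirs = @_;
    return unless @dirs;
    my (%first_seen, @duplicates);
    find({
        preprocess => sub { sort by_message_number @_ },
        wanted     => sub {
            return unless /^\d+$/ && -f $_;
            my $id = message_id($_);
            return unless defined $id;
            if (exists $first_seen{$id}) {
                push @duplicates, [ $File::Find::name, $first_seen{$id} ];
            } else {
                $first_seen{$id} = $File::Find::name;
            }
        },
    }, @dirs);
    return @duplicates;
}

# Dry run: list what would be removed.
printf "%s duplicates %s\n", @$_ for find_duplicates(@ARGV);
```

Run it with the same arguments as the script above (for example a copy of ~/Mail). Once the report looks right, the first path of each pair is the one to pass to unlink.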