Re: Sort and delete duplicate messages

From: Conrad Hughes
Subject: Re: Sort and delete duplicate messages
Date: Mon, 04 May 2020 15:14:24 +0100

> I know that 'sortm -textfield Subject' will sort messages according to
> the subject field. Having run that command, is there a way to then
> delete the first duplicate of each message in the list such that if 1
> and 2 are duplicates and 6 and 7 are duplicates you would delete messages
> 2 and 7 leaving 1 and 6?

The attached might be a useful starting point: run it with a directory
name (e.g. ~/Mail) and it'll find everything that looks like an MH mail
file (i.e. its name is a number) and delete any messages with an
already-seen Message-ID, i.e. the second and subsequent copies of any
message.

I'd run it on a duplicate of your Mail folder and see what the diffs
look like: biggest issue is that I'm not sure what order it'll do things
in so you may find your preferred copy doesn't get kept — but its output
will be a good starting point for any better version.


use strict;
use warnings;

use Email::Simple;
use File::Find;

die "Syntax: $0 <dir> [..]" unless @ARGV >= 1;

# Message-ID => path of the first file seen carrying that ID
my %ids;
find(sub {
  my $file = $_;
  # MH message files are named with a bare number
  return unless $file =~ /^\d+$/ && -f $file;
  open my $fh, '<', $file
    or die "couldn't read \"$File::Find::name\": $!";
  my $email = do { local $/; <$fh> };   # slurp the whole message
  close $fh;
  my $msg = Email::Simple->new($email);
  if(my $id = $msg->header("Message-id")) {
    if($ids{$id}) {
      # Already seen: delete this copy and report which file was kept
      unlink $file;
      print "Seen $File::Find::name ($ids{$id})\n";
    } else {
      $ids{$id} = $File::Find::name;
    }
  } else {
    warn "No ID in \"$File::Find::name\"!";
  }
}, @ARGV);
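
On the ordering worry: File::Find accepts a "preprocess" callback that can reorder each directory's entries before they are visited. The sketch below is only an illustration, assuming the lowest-numbered copy within a folder is the one you want to keep. The helper name mh_order is made up for this example and is not part of the attached script. It deletes nothing; it just prints message files in the order they would be visited. The order across different folders is still whatever File::Find chooses.

```perl
use strict;
use warnings;

use File::Find;

# Hypothetical helper (not in the attached script): numeric message
# names first, in ascending numeric order, then everything else
# (subfolders, .mh_sequences, ...) in plain string order.
sub mh_order {
  my @msgs  = sort { $a <=> $b } grep { /^\d+$/ } @_;
  my @other = sort grep { !/^\d+$/ } @_;
  return (@msgs, @other);
}

# Dry run: list message files in visiting order, deleting nothing.
if (@ARGV) {
  find({
    preprocess => \&mh_order,
    wanted     => sub {
      print "$File::Find::name\n" if /^\d+$/ && -f $_;
    },
  }, @ARGV);
}
```

If the listing looks right, the same preprocess option could be passed to the find call in the attached script by giving it a hash with "wanted" and "preprocess" keys instead of a bare sub.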