Re: file archive

From: Helmut Messerer
Subject: Re: file archive
Date: Thu, 11 Jan 2007 08:47:25 -0800

thanks, that looks great :-)
much appreciated


On 1/11/07, James Youngman <address@hidden> wrote:
On 12/5/06, Helmut Messerer <address@hidden> wrote:
> I would need a file-archive tool, like a modified "locate" version,
> which would store for each file an MD5 checksum, which then could be
> searched in the database as well... this would enable us to find
> identical files easily.
> is that possible with findutils?


$ cat example.sh
#! /bin/sh

# make an example file tree
set -e
WORKDIR="$HOME/tmp"    # the tree that find scans below
mkdir -p "$WORKDIR"
cd "$WORKDIR"
cp -ar /usr/share/doc/gcc* .
set +e

find "$WORKDIR" -type f -exec md5sum {} \+ | /usr/lib/locate/frcode >

$ time sh  example.sh

real    0m0.815s
user    0m0.032s
sys     0m0.080s
$ locate -d ./md5sum.db a71b89a32c72accd00daf10cb5e41d56
a71b89a32c72accd00daf10cb5e41d56  /home/youngman/tmp/gcc-3.3-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56  /home/youngman/tmp/gcc-3.4-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56  /home/youngman/tmp/gcc-4.0-base/README.Bugs
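
To look up the archived copies of one particular file rather than pasting
a checksum by hand, the lookup above can be wrapped in a small function.
This is only a sketch: find_copies is a made-up name, not something that
ships with findutils, and it assumes the database was built from md5sum
output as in example.sh.

```shell
#!/bin/sh
# Hypothetical helper (not part of findutils): list every database entry
# whose MD5 checksum equals that of FILE.
# Usage: find_copies FILE DATABASE
find_copies() {
    # Reading from stdin makes md5sum print "<checksum>  -", so the first
    # space-separated field is the bare checksum.
    sum=$(md5sum < "$1" | cut -d' ' -f1) || return 1
    # locate matches the checksum as a substring of each stored line.
    locate -d "$2" "$sum"
}
```

For example: find_copies /usr/share/doc/gcc-3.3-base/README.Bugs ./md5sum.db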

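$ locate -d ./md5sum.db . | awk '
{
 # $1 is the checksum, $2 the file name (names containing spaces are cut short)
 count[$1]++;
 instances[$1] = instances[$1] " " $2;
}
END {
 for (i in count) {
   if (count[i] > 1)
     printf("md5sum %20s is shared by %d files\n", i, count[i]);
 }
}'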
md5sum 63b818f22d81e2a0a0c7f3875a431128 is shared by 2 files
md5sum cf2eccc0a1d4cf7596a23cde61b9b0e2 is shared by 2 files
md5sum 1f3c7181ad7c9def4d79824256e3765d is shared by 2 files
md5sum a71b89a32c72accd00daf10cb5e41d56 is shared by 3 files
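
If a one-off report is all that is needed, the same duplicate check can be
run straight from find without building a locate database first. The sketch
below is an illustration only: list_duplicates is a made-up name, and it
assumes GNU md5sum output (32 hex digits, two separator characters, then
the path). File names containing a backslash or newline are escaped by
md5sum and are not handled here.

```shell
#!/bin/sh
# Hypothetical helper (not part of findutils): print each MD5 checksum
# shared by more than one regular file under directory $1, followed by
# the paths that share it.
list_duplicates() {
    find "$1" -type f -exec md5sum {} + |
    sort |
    awk '
    {
        sum = $1
        path = substr($0, 35)   # skip 32 hex digits + 2 separator chars
        count[sum]++
        files[sum] = files[sum] "  " path "\n"
    }
    END {
        for (s in count)
            if (count[s] > 1)
                printf("md5sum %s is shared by %d files\n%s", s, count[s], files[s])
    }'
}
```

For example: list_duplicates "$HOME/tmp"
Taking the path with substr() instead of $2 keeps file names that contain
spaces intact.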

