On 12/5/06, Helmut Messerer <address@hidden> wrote:
> I would need a file-archive tool, like a modified "locate" version,
> which would store for each file an MD5 checksum, which then could be
> searched in the database as well... this would enable us to find
> identical files easily.
>
> is that possible with findutils?
Sure:-
$ cat example.sh
#! /bin/sh
# make an example file tree
set -e
cd "$HOME"
mkdir -p tmp
cd tmp
WORKDIR=$(pwd)
cp -ar /usr/share/doc/gcc* .
set +e
find "$WORKDIR" -type f -exec md5sum {} \+ |
    /usr/lib/locate/frcode > "$WORKDIR/md5sum.db"
$ time sh example.sh
real 0m0.815s
user 0m0.032s
sys 0m0.080s
$ locate -d ./md5sum.db a71b89a32c72accd00daf10cb5e41d56
a71b89a32c72accd00daf10cb5e41d56 /home/youngman/tmp/gcc-3.3-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56 /home/youngman/tmp/gcc-3.4-base/README.Bugs
a71b89a32c72accd00daf10cb5e41d56 /home/youngman/tmp/gcc-4.0-base/README.Bugs
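If you start from a file rather than a checksum, the two steps can be wrapped up. This is only a rough sketch: the function name find_copies and the default database path are made up for illustration, and it assumes the database was built as in example.sh above.

```shell
#! /bin/sh
# find_copies FILE [DATABASE]: print the database entries whose checksum
# matches FILE. The name find_copies and the default database path are
# illustrative assumptions, not part of findutils.
find_copies() {
    # Read FILE on stdin so md5sum prints just the digest followed by "-".
    sum=$(md5sum < "$1") || return 1
    sum=${sum%% *}            # keep only the hex digest
    locate -d "${2:-./md5sum.db}" "$sum"
}
```

For example, "find_copies /usr/share/doc/gcc-4.0-base/README.Bugs" would list every archived file with the same contents, if there are any.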
$ locate -d ./md5sum.db . | awk '
{
    # remember the names too, separated by spaces (not printed below)
    instances[$1] = instances[$1] " " $2;
    ++count[$1];
}
END {
    for (i in count) {
        if (count[i] > 1)
            printf("md5sum %20s is shared by %d files\n", i, count[i]);
    }
}'
md5sum 63b818f22d81e2a0a0c7f3875a431128 is shared by 2 files
md5sum cf2eccc0a1d4cf7596a23cde61b9b0e2 is shared by 2 files
md5sum 1f3c7181ad7c9def4d79824256e3765d is shared by 2 files
md5sum a71b89a32c72accd00daf10cb5e41d56 is shared by 3 files
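The summary above reports only how many files share each checksum. A possible variation that also lists the file names is sketched here. The function name list_dups is invented for the example, and it assumes the usual md5sum line layout (checksum, separator, file name); names containing newlines are not handled.

```shell
#! /bin/sh
# list_dups: read md5sum-style lines on stdin and, for every checksum
# that occurs more than once, print the checksum and the files sharing it.
# The name list_dups is an illustrative assumption.
list_dups() {
    awk '
    {
        sum = $1
        # Strip the checksum and its separator instead of using $2,
        # so that file names containing spaces are kept whole.
        name = $0
        sub(/^[^ ]+ [ *]?/, "", name)
        files[sum] = files[sum] "  " name "\n"
        ++count[sum]
    }
    END {
        for (s in count)
            if (count[s] > 1)
                printf("%s (%d files)\n%s", s, count[s], files[s])
    }'
}
```

It is meant to sit at the end of the same pipeline, for example "locate -d ./md5sum.db . | list_dups".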