Duplicate file searching
Can anyone recommend software for searching for true duplicate files
across the network? We're running everything on MS W2K, and the users
are copying sets of files all over the place, many of them duplicates,
but sorting that out by hand would be an impossible task.
lenneis@wu-wien.ac.at, 2005-10-02, 5:47 pm
m0rk :
> Can anyone recommend software for searching for true duplicate files
> over the network ... were running everything with ms w2k and the users
> are copying sets of files all over the place, much of them duplicates
> but doing it by hand would be an impossible task.
Generate a list of the files to be checked and run a checksum such as
SHA-1 or MD5 over each of them. Sort on the checksum and any duplicates
will be listed in adjacent positions. A short Perl script would do the
job as well.
--
Joerg Lenneis
email: lenneis@wu-wien.ac.at
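The sort-then-compare-neighbours idea above can be sketched in Perl roughly as follows. This is a minimal illustration, not code from the thread: it assumes the input is md5sum/sha1sum output (one `<hex digest><space><' ' or '*'><file name>` line per file), and the script name `sortdups.pl` and the function `adjacent_dup_groups` are hypothetical.

```perl
#!/usr/bin/perl
# sortdups.pl (hypothetical name): the "sort on the checksum, duplicates end
# up adjacent" approach. Assumes md5sum/sha1sum-style input lines:
#   <hex digest><space><' ' or '*'><file name>
# e.g.  find DIR -type f -print0 | xargs -0 md5sum | perl sortdups.pl
use strict;
use warnings;

# Takes checksum lines and returns a list of array refs, one per digest that
# occurs more than once, each holding the file names that share it.
sub adjacent_dup_groups {
    my @recs;
    for my $line (@_) {
        (my $l = $line) =~ s/[\r\n]+\z//;    # strip LF or CRLF
        # Lines not in the expected format (e.g. GNU md5sum's backslash-
        # escaped names) are silently skipped.
        push @recs, [ lc($1), $2 ] if $l =~ /^([0-9A-Fa-f]+) [ *](.+)$/;
    }

    # Sorting on the digest puts identical files on neighbouring entries.
    @recs = sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] } @recs;

    my (@groups, @run);
    for my $i (0 .. $#recs) {
        push @run, $recs[$i][1];
        # Close the current run when the next digest differs, or at the end.
        if ($i == $#recs || $recs[$i + 1][0] ne $recs[$i][0]) {
            push @groups, [@run] if @run > 1;
            @run = ();
        }
    }
    return @groups;
}

# Only run as a script, and only when there is input (file arguments or a
# pipe), so it does not sit waiting on a terminal.
if (!caller() && (@ARGV || !-t STDIN)) {
    for my $group (adjacent_dup_groups(<>)) {
        print "$_\n" for @$group;
        print "\n";    # blank line between groups
    }
}
```

Sorting costs O(n log n) on the number of files, which is negligible next to the time spent reading the files to checksum them.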
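#!/usr/bin/perl
# finddups.pl - list files whose MD5 sums are identical.
# Usage: find DIR -type f -print0 | xargs -0 md5sum | perl finddups.pl
use strict;
use warnings;

my %h;    # MD5 digest => list of file names with that digest
while (my $line = <>) {
    $line =~ s/[\r\n]+\z//;
    # md5sum prints "<digest><space><' ' or '*'><file name>"; splitting on a
    # single space would leave a leading ' ' or '*' on the name.
    next unless $line =~ /^([0-9a-fA-F]+) [ *](.+)$/;
    push @{ $h{$1} }, $2;
}
# Print each group of duplicates, preceded by a blank line.
foreach my $md5 (sort keys %h) {
    my @files = @{ $h{$md5} };
    print join("\n", '', @files, '') if @files > 1;
}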
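The script above depends on Unix-style `find`, `xargs` and `md5sum`, none of which ship with a stock Windows 2000 machine. A pure-Perl alternative could look like the sketch below. It is an illustration, not a tested tool: it uses only the core modules File::Find and Digest::MD5, and it adds one refinement not mentioned in the thread, grouping files by size first so that only files with a same-size twin are read and hashed. The names `dupfind.pl`, `md5_of` and `find_dups` are hypothetical, and whether UNC paths such as `\\server\share` work as directory arguments depends on the Perl build, so treat that as an assumption to check.

```perl
#!/usr/bin/perl
# dupfind.pl (hypothetical name): find duplicate files using only core
# modules, so no external find/xargs/md5sum is needed.
# Refinement not from the thread: group by file size first and only hash
# files that have a same-size twin.
use strict;
use warnings;
use File::Find qw(find);
use Digest::MD5;

# MD5 hex digest of one file, or undef if it cannot be opened or read.
sub md5_of {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return undef;
    my $hex = eval { Digest::MD5->new->addfile($fh)->hexdigest };  # addfile croaks on read errors
    close $fh;
    return $hex;
}

# Walks the given directories and returns array refs of paths whose
# contents are identical (each group sorted, groups sorted by first path).
sub find_dups {
    my %by_size;
    find({
        no_chdir => 1,    # $_ holds the full path
        wanted   => sub {
            return unless -f $_;
            my $size = -s _;       # reuses the stat from -f
            return unless $size;   # empty files skipped: they all "match"
            push @{ $by_size{$size} }, $_;
        },
    }, @_);

    my @groups;
    for my $paths (values %by_size) {
        next if @$paths < 2;       # a unique size cannot have a duplicate
        my %by_md5;
        for my $p (@$paths) {
            my $hex = md5_of($p);
            push @{ $by_md5{$hex} }, $p if defined $hex;
        }
        push @groups, map { [ sort @$_ ] } grep { @$_ > 1 } values %by_md5;
    }
    return sort { $a->[0] cmp $b->[0] } @groups;
}

if (!caller()) {
    if (@ARGV) {
        for my $group (find_dups(@ARGV)) {
            print "$_\n" for @$group;
            print "\n";    # blank line between groups
        }
    } else {
        warn "usage: perl $0 DIR [DIR ...]\n";
    }
}
```

Usage would be along the lines of `perl dupfind.pl DIR1 DIR2`. The size pass only needs a stat per file, so on a network share most of the expensive reading is limited to files that could actually be duplicates.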