Data Storage - Duplicate file searching


Author Duplicate file searching
m0rk

2005-09-17, 5:56 pm

Can anyone recommend software for finding true duplicate files
over the network? We're running everything on MS Windows 2000, and the
users are copying sets of files all over the place. Many of them are
duplicates, but finding them by hand would be an impossible task.

lenneis@wu-wien.ac.at

2005-10-02, 5:47 pm


m0rk wrote:

> Can anyone recommend software for finding true duplicate files
> over the network? We're running everything on MS Windows 2000, and the
> users are copying sets of files all over the place. Many of them are
> duplicates, but finding them by hand would be an impossible task.


Generate a list of the files to be checked and run a checksum such as
SHA-1 or MD5 over each of them. Sort the output on the checksum, and
duplicates will be listed in adjacent positions. A shortish Perl script
could do the same job.
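
The checksum-and-sort idea could be sketched roughly as follows. This assumes GNU findutils and coreutils are available (on Windows 2000 that would mean something like Cygwin, with the shares reachable as directories), and list_dups is only a made-up name for illustration:

```shell
#!/bin/sh
# Sketch only: list groups of files with identical MD5 sums under a directory.
# Assumes GNU find, xargs, md5sum, sort and uniq (for -w and --all-repeated).
# list_dups is a hypothetical helper name, not an existing tool.
list_dups() {
    dir=${1:-.}
    find "$dir" -type f -print0 |
        xargs -0 -r md5sum |                 # one "checksum  name" line per file
        sort |                               # identical checksums become adjacent
        uniq -w32 --all-repeated=separate    # keep repeated checksums, blank line between groups
}
```

Called as list_dups /some/directory, it prints only the files whose first 32 characters (the MD5 sum) occur more than once. A matching checksum is strong evidence rather than proof, so a byte-for-byte comparison with cmp before deleting anything seems prudent.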

--

Joerg Lenneis

email: lenneis@wu-wien.ac.at

RPR

2005-10-08, 5:49 pm

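#!/usr/bin/perl -w
# finddups.pl
# Lists groups of files that share a checksum (e.g. MD5 sums).
# Use with: find something -type f -print0 | xargs -0 md5sum | perl finddups.pl
use strict;

$| = 1;                  # unbuffered output
my %files_by_sum;        # checksum => list of file names

while (my $line = <>) {
    chomp $line;
    # print STDERR substr($line, 0, 70), qq( \r);    # uncomment for progress output
    # md5sum prints "<checksum>  <name>", or "<checksum> *<name>" in binary mode;
    # lines that do not look like that are skipped.
    my ($sum, $name) = $line =~ /^([0-9a-fA-F]+) [ *](.*)$/ or next;
    push @{ $files_by_sum{$sum} }, $name;
}

# Print each group of duplicates, with blank lines between groups.
foreach my $sum (sort keys %files_by_sum) {
    my @names = @{ $files_by_sum{$sum} };
    print join(qq(\n), '', @names, '') if @names > 1;
}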


Copyright 2003 - 2008 webservertalk.com