Create MD5 checksums of files in directories and subdirectories

nicogroen

Can somebody help me out with the following problem? I tried to use
the following script by Ron Savage to create MD5 checksums of files in
a directory and all its subdirectories, posted here:

http://groups.google.nl/[email protected]&rnum=2

On OpenBSD:
It takes a long time to create MD5 checksums of large files (about 4
seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
10.5 MB file).

On Windows:
Files having the same file size (all 14.5 MB) produce the same MD5
checksum. This process goes very fast (perhaps too fast).

On Redhat and FreeBSD:
The script returns the following error message:

can't open (#path#
): No such file or directory at md5.pl line 39.

The script should work on all operating systems.

Thanks in advance,
Nico
 
James Willmore

Can somebody help me out with the following problem? I tried to use the
following script by Ron Savage to create MD5 checksums of files in a
directory and all its subdirectories, posted here:

http://groups.google.nl/[email protected]&rnum=2

On OpenBSD:
It takes a long time to create MD5 checksums of large files (about 4
seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
10.5 MB file).

On Windows:
Files having the same file size (all 14.5 MB) produce the same MD5
checksum. This process goes very fast (perhaps too fast).

On Redhat and FreeBSD:
The script returns the following error message:

can't open (#path#
): No such file or directory at md5.pl line 39.

The script should work on all operating systems.

Yes, it should (and appears it has) work(ed) on almost all platforms -
because you noticed a difference in the execution times :)

Posting your code would be helpful :)

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
I'll defend to the death your right to say that, but I never
said I'd listen to it! -- Tom Galloway with apologies to
Voltaire
 
James Willmore

[ ... ]
Posting your code would be helpful :)

My bad, you did post code :)

I ran it on ye olde Linux box and it worked up until it ran into a
directory that I had no permission to access ... bummer :-(

Your execution time will depend greatly on the OS and the filesystem
being accessed. That's not the script's fault (in most cases).

IMHO, you might be able to speed up the script by using File::Find instead
of using Cwd.
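
Off the top of my head, a rough, untested sketch of what that could look
like (my own layout and names, not Ron's script):

#!/usr/bin/perl -w
# rough sketch: walk the directories with File::Find and digest each file
use strict;
use File::Find;
use Digest::MD5;

find(sub {
    return unless -f;           # plain files only
    open my $fh, '<', $_
        or warn "can't open $File::Find::name: $!\n" and return;
    binmode $fh;                # keep CRLF translation out of the digest
    print Digest::MD5->new->addfile($fh)->hexdigest, "  $File::Find::name\n";
    close $fh;
}, @ARGV ? @ARGV : '.');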

Another option is to use this script as a filter and use a command native
to the OS to feed files to the script. Meaning, use `find` (on *nix) and pipe the
output to the script you're working with. Now your only concern is checking
the MD5 digest of each file the script is being fed :) An added plus
to this idea is ... you can check one -or- many files with your script
without the script having to figure out *how* to find the files (using Cwd
or File::Find).

HTH

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
It is very difficult to prophesy, especially when it pertains to
the future.
 
James Willmore

Can somebody help me out with the following problem? I tried to use
the following script by Ron Savage to create MD5 checksums of files in
a directory and all its subdirectories, posted here:

http://groups.google.nl/[email protected]&rnum=2

On OpenBSD:
It takes a long time to create MD5 checksums of large files (about 4
seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
10.5 MB file).

On Windows:
Files having the same file size (all 14.5 MB) produce the same MD5
checksum. This process goes very fast (perhaps too fast).

On Redhat and FreeBSD:
The script returns the following error message:

can't open (#path#
): No such file or directory at md5.pl line 39.

The script should work on all operating systems.

I figured I'd post an example of what I meant by 'filter'. You'll
notice that I used 'warn' instead of 'die' if the script can't digest
a file. This will prevent the script from bombing out if it's run from
cron or some other unattended method, or by an ordinary user. The output
format of the script is, again, your call. And whether to sort (or not)
is your call too.

I tested it on a Linux box with the following command:
find /home/jim | perl news.pl | sort

However, I'd refine the find command to avoid picking up files from other
filesystems, NFS mounts, symlinks, etc. And if you use it on a
Windows box, you'll have to find the right switches for `dir`. I
don't really do Windows :)
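
For example, something like this (switches from memory, so double-check
them on your system):

find /home/jim -xdev -type f | perl news.pl | sort

-xdev keeps find on one filesystem, and -type f already skips
directories and symlinks. On Windows, I *think* the rough equivalent is

dir /s /b /a-d C:\somedir | perl news.pl | sort

(/s recurses, /b gives bare full paths, /a-d leaves out directories) -
but verify that on a real Windows box.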

I didn't benchmark it. There will be variations on this, because each
OS and filesystem is different.

Enjoy :)

==start (what I called news.pl)==
#!/usr/gnu/bin/perl -w
#
# Name:
#   MD5.pl.
#
# Purpose:
#   Calculate the MD5 digest of all files in a directory and its
#   subdirectories.
#
# Parameter:
#   File(s) provided to the script on STDIN, one name per line.
#
# Output:
#   Digest of each file read from STDIN.
#
# Output format:
#   <Dirname/File name>: <MD5>\n
#   <Dirname/File name>: <MD5>\n
#   ...

use strict;

use Digest::MD5;

# -------------------------------------------------------------------
my $md5 = Digest::MD5->new();

while (<>) {
    chomp;
    newprocess($md5, $_);
}

sub newprocess {
    my ($md5, $file) = @_;
    # warn (not die) so one unreadable file doesn't kill the whole run
    open(my $fh, '<', $file)
        or warn "FAILED TO DIGEST $file: $!\n" and return;
    binmode $fh;    # important on Windows: no CRLF translation in the digest
    # addfile() reads the handle in chunks, so big files aren't slurped into
    # memory; hexdigest() also resets the object for the next file
    print "$file: ", $md5->addfile($fh)->hexdigest(), "\n";
    close $fh;
}

==end==
 
Michele Dondi

Can somebody help me out with the following problem? I tried to use
the following script by Ron Savage to create MD5 checksums of files in
a directory and all its subdirectories, posted here:

http://groups.google.nl/[email protected]&rnum=2

I didn't see that: well, on *nix (linux), I'd just do

find <dir> -type f | xargs md5sums

but if you want it in Perl, running on virtually any system perl
runs on, then see if something like this works for you or can be
adapted to your needs:

#!/usr/bin/perl -l

use strict;
use warnings;
use File::Find;
use Digest::MD5;

@ARGV = grep { -d or !warn "`$_': not a directory!\n" } @ARGV;
die "Usage: $0 <dir> [<dirs>]" unless @ARGV;

find { no_chdir => 1,
       wanted   => sub {
           return unless -f;
           open my $fh, '<:raw', $_ or
               warn "Can't open `$_': $!\n" and return;
           print Digest::MD5->new->addfile($fh)->hexdigest,
               ' ', $_;
       } }, @ARGV;

__END__
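
(To be explicit: assuming you save that as, say, md5dirs.pl - my name for
it, pick your own - you'd run it as

perl md5dirs.pl /some/dir /another/dir

and it prints one "<digest> <path>" line per file.)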


[Tested to work correctly in Linux (2.6.5) and W98...]
On Windows:
Files having the same file size (all 14.5 MB) produce the same MD5
checksum. This process goes very fast (perhaps too fast).

Huh?!? Are you *really* sure about that? Is there any chance that those
files don't just have the same file size but are actually identical?


HTH,
Michele
 
Michele Dondi

I didn't see that: well, on *nix (linux), I'd just do

find <dir> -type f | xargs md5sums

find <dir> -type f | xargs md5sum

actually! (sorry: a typo!)


Michele
 
Joe Smith

nicogroen said:
On OpenBSD:
It takes a long time to create MD5 checksums of large files (about 4
seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
10.5 MB file).

That is expected if you are stuck with the pure-perl implementation of
MD5 as opposed to the compiled XS module.

On Windows:
Files having the same file size (all 14.5 MB) produce the same MD5
checksum. This process goes very fast (perhaps too fast).

On Redhat and FreeBSD:
The script returns the following error message:

can't open (#path#
): No such file or directory at md5.pl line 39.

The script should work on all operating systems.

It works on all systems where Digest::MD5 is properly installed.

Looks like you're running into the slow method that is invoked
whenever the MD5.so loadable object cannot be found.

eval {
    Digest::MD5->bootstrap($VERSION);   # Load the fast MD5.so object
};
if ($@) {
    eval {
        # Try to load the pure perl version if bootstrap fails
        require Digest::Perl::MD5;
        Digest::Perl::MD5->import(qw(md5 md5_hex md5_base64));
        push(@ISA, "Digest::Perl::MD5"); # make OO interface work
    };
}
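
If you want to check which implementation you actually ended up with, a
quick test (my own guess at one, based on the fallback above) is to see
whether the pure-perl module got pulled in:

perl -MDigest::MD5 -le 'print exists $INC{"Digest/Perl/MD5.pm"} ? "pure-perl fallback" : "XS version"'

(adjust the quoting for cmd.exe on Windows). If that prints "pure-perl
fallback", it would explain the slow digests.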

-Joe
 
Joe Smith

Michele said:
I didn't see that: well, on *nix (linux), I'd just do

find <dir> -type f | xargs md5sum

Not recommended for Samba shares or anywhere that file names
and/or directory names have embedded blanks.

find <dir> -type f -print0 | xargs -0 md5sum

-Joe
 
nicogroen

Thanks for your replies. My problem in Windows is solved by updating
the ActivePerl version (from 5.6.1 to 5.8.3).
 