Create MD5 of files in directories and subdirectories

Discussion in 'Perl Misc' started by nicogroen, Apr 16, 2004.

  1. nicogroen

    nicogroen Guest

    Can somebody help me out with the following problem? I tried to use
    the following script by Ron Savage to create MD5 checksums of files in
    a directory and all its subdirectories, posted here:

    http://groups.google.nl/groups?hl=n...m=7dbo9s$&rnum=2

    On OpenBSD:
    It takes a long time to create MD5 checksums of large files (about 4
    seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
    10.5 MB file).

    On Windows:
    Files with the same file size (all 14.5 MB) produce the same MD5
    checksum. The process runs very fast (perhaps too fast).

    On Red Hat and FreeBSD:
    The script returns the following error message:

    can't open (#path#
    ): No such file or directory at md5.pl line 39.

    The script should work on all operating systems.

    Thanks in advance,
    Nico
     
    nicogroen, Apr 16, 2004
    #1

  2. On Fri, 16 Apr 2004 03:06:17 -0700, nicogroen wrote:

    > Can somebody help me out with the following problem? I tried to use the
    > following script by Ron Savage to create MD5 checksums of files in a
    > directory and all its subdirectories, posted here:
    >
    > http://groups.google.nl/groups?hl=n...m=7dbo9s$&rnum=2
    >
    > On OpenBSD:
    > It takes a long time to create MD5 checksums of large files (about 4
    > seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
    > 10.5 MB file).
    >
    > On Windows:
    > Files with the same file size (all 14.5 MB) produce the same MD5
    > checksum. The process runs very fast (perhaps too fast).
    >
    > On Red Hat and FreeBSD:
    > The script returns the following error message:
    >
    > can't open (#path#
    > ): No such file or directory at md5.pl line 39.
    >
    > The script should work on all operating systems.


    Yes, it should, and it appears to have worked on almost all of those
    platforms, since you were able to notice a difference in the execution
    times :)

    Posting your code would be helpful :)

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    I'll defend to the death your right to say that, but I never
    said I'd listen to it! -- Tom Galloway with apologies to
    Voltaire
     
    James Willmore, Apr 16, 2004
    #2

  3. On Fri, 16 Apr 2004 09:24:38 -0400, James Willmore wrote:

    > On Fri, 16 Apr 2004 03:06:17 -0700, nicogroen wrote:
    >
    >> Can somebody help me out with the following problem? I tried to use the
    >> following script by Ron Savage to create MD5 checksums of files in a
    >> directory and all its subdirectories, posted here:
    >>
    >> http://groups.google.nl/groups?hl=n...m=7dbo9s$&rnum=2

    [ ... ]

    > Posting your code would be helpful :)


    My bad, you did post code :)

    I ran it on ye olde Linux box and it worked up until it ran into a
    directory that I had no permission to access ... bummer :-(

    Your execution time will depend greatly on the OS and the filesystem
    being accessed. That's not the script (in most cases).

    IMHO, you might be able to speed up the script by using File::Find instead
    of Cwd.

    Another option is to use this script as a filter and use a command native
    to the OS to feed it files. Meaning, use `find` (on *nix) and pipe the
    output to the script you're working with. Now your only concern is checking
    the MD5 digest of each file the script is being fed :) An added plus
    to this idea is that you can check one -or- many files with your script
    without the script having to figure out *how* to find the files (using Cwd
    or File::Find).

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    It is very difficult to prophesy, especially when it pertains to
    the future.
     
    James Willmore, Apr 16, 2004
    #3
  4. nicogroen wrote:
    > Can somebody help me out with the following problem? I tried to use
    > the following script by Ron Savage to create MD5 checksums of files in
    > a directory and all its subdirectories, posted here:
    >
    > http://groups.google.nl/groups?hl=n...m=7dbo9s$&rnum=2
    >
    > On OpenBSD:
    > It takes a long time to create MD5 checksums of large files (about 4
    > seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
    > 10.5 MB file).
    >
    > On Windows:
    > Files with the same file size (all 14.5 MB) produce the same MD5
    > checksum. The process runs very fast (perhaps too fast).
    >
    > On Red Hat and FreeBSD:
    > The script returns the following error message:
    >
    > can't open (#path#
    > ): No such file or directory at md5.pl line 39.
    >
    > The script should work on all operating systems.


    I figured I'd post an example of what I meant by 'filter'. You'll
    notice that I used 'warn' instead of 'die' if the script can't digest
    a file. This keeps the script from bombing out on the first unreadable
    file if it's run from cron or by an ordinary user. The output format
    of the script is, again, your call, as is whether to sort it.

    I tested it on a Linux box with the following command:
    find /home/jim | perl news.pl | sort

    However, I'd refine the find command to avoid picking up files from other
    filesystems, NFS mounts, symlinks, etc. And if you use it on a
    Windows box, you'll have to find the right switches for `dir`. I
    don't really do Windows :)

    I didn't benchmark it. There are going to be variations in timing anyway,
    because each OS and filesystem is different.

    Enjoy :)

    ==start (what I called news.pl)==
    #!/usr/gnu/bin/perl -w
    #
    # Name:
    # MD5.pl.
    #
    # Purpose:
    # Calculate the MD5 digest of all files in a directory and its
    # subdirectories.
    #
    # Parameter:
    # File(s) provided to script from STDIN.
    #
    # Output:
    # Digest of each file from STDIN.
    #
    # Output format:
    # <Dirname/File name>: <MD5>\n
    # <Dirname/File name>: <MD5>\n
    # ...

    use integer;
    use strict;

    use Digest::MD5;

    # -------------------------------------------------------------------
    my $md5 = Digest::MD5->new();

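    # Read one file name per line from STDIN.
    while (<>) {
        chomp;
        newprocess($md5, $_);
    }

    sub newprocess {
        my ($md5, $file) = @_;
        return unless -f $file;    # skip directories and other non-files
        # Three-argument open with a lexical handle; warn and carry on if it fails.
        open(my $fh, '<', $file)
            or warn "FAILED TO DIGEST $file: $!\n" and return;
        binmode($fh);              # raw bytes, so the digest is right on Windows too
        # addfile() reads in chunks; hexdigest() also resets $md5 for the next file.
        print "$file: " . $md5->addfile($fh)->hexdigest() . "\n";
        close $fh;
    }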

    ==end==
     
    James Willmore, Apr 17, 2004
    #4
  5. On 16 Apr 2004 03:06:17 -0700, (nicogroen) wrote:

    >Can somebody help me out with the following problem? I tried to use
    >the following script by Ron Savage to create MD5 checksums of files in
    >a directory and all its subdirectories, posted here:
    >
    >http://groups.google.nl/groups?hl=n...m=7dbo9s$&rnum=2


    I didn't see that: well, on *nix (linux), I'd just do

    find <dir> -type f | xargs md5sums

    but if you want it in Perl and running on virtually any system perl
    runs on, then see if something like this is fine for you/can be
    adapted to your needs:

    #!/usr/bin/perl -l

    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    @ARGV=grep { -d or !warn "`$_': not a directory!\n" } @ARGV;
    die "Usage: $0 <dir> [<dirs>]" unless @ARGV;

    find { no_chdir => 1,
           wanted => sub {
               return unless -f;
               open my $fh, '<:raw', $_ or
                   warn "Can't open `$_': $!\n" and return;
               print Digest::MD5->new->addfile($fh)->hexdigest,
                   ' ', $_;
           } }, @ARGV;

    __END__


    [Tested to work correctly in Linux (2.6.5) and W98...]

    >On Windows:
    >Files with the same file size (all 14.5 MB) produce the same MD5
    >checksum. The process runs very fast (perhaps too fast).


    Huh?!? Are you *really* sure that those files don't just have the same
    size but are, by any chance, actually identical?


    HTH,
    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
     
    Michele Dondi, Apr 17, 2004
    #5
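To make the point in the post above concrete: two files of equal size only share an MD5 digest if their contents are byte-for-byte identical. Below is a minimal sketch, assuming a stock Digest::MD5 and File::Temp; the helper names `file_md5` and `make_temp` and the two temporary files are made up for illustration and are not part of anyone's script in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Return the hex MD5 digest of a file, read as raw bytes.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open '$path': $!\n";
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Write $content to a fresh temporary file and return its path.
sub make_temp {
    my ($content) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode $fh;
    print {$fh} $content;
    close $fh or die "Can't close '$path': $!\n";
    return $path;
}

# Two hypothetical files: same size, different last byte.
my $file_a = make_temp(('x' x 1024) . 'A');
my $file_b = make_temp(('x' x 1024) . 'B');

for my $path ($file_a, $file_b) {
    my $size = -s $path;
    print file_md5($path), "  $size bytes\n";
}
```

If a script prints identical digests for files you know to differ, it is worth checking that it really opens and reads each file (and in binary mode on Windows) rather than digesting something else, such as an empty string.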
  6. On Sat, 17 Apr 2004 10:25:22 +0200, Michele Dondi
    <> wrote:

    >I didn't see that: well, on *nix (linux), I'd just do
    >
    > find <dir> -type f | xargs md5sums


    find <dir> -type f | xargs md5sum

    actually! (sorry: a typo!)


    Michele
    --
    you'll see that it shouldn't be so. AND, the writting as usuall is
    fantastic incompetent. To illustrate, i quote:
    - Xah Lee trolling on clpmisc,
    "perl bug File::Basename and Perl's nature"
     
    Michele Dondi, Apr 17, 2004
    #6
  7. Joe Smith

    Joe Smith Guest

    nicogroen wrote:

    > On OpenBSD:
    > It takes a long time to create MD5 checksums of large files (about 4
    > seconds for a 3 MB file, 12 seconds for a 5.5 MB file, 43 seconds for a
    > 10.5 MB file).


    That is expected if you are stuck with the pure-Perl implementation of
    MD5 as opposed to the compiled XS module.

    > On Windows:
    > Files with the same file size (all 14.5 MB) produce the same MD5
    > checksum. The process runs very fast (perhaps too fast).
    >
    > On Red Hat and FreeBSD:
    > The script returns the following error message:
    >
    > can't open (#path#
    > ): No such file or directory at md5.pl line 39.
    >
    > The script should work on all operating systems.


    It works on all systems where Digest::MD5 is properly installed.

    Looks like you're running into the slow method that is invoked
    whenever the MD5.so loadable object cannot be found.

    eval {
        Digest::MD5->bootstrap($VERSION); # Load the fast MD5.so object
    };
    if ($@) {
        eval {
            # Try to load the pure perl version if bootstrap fails
            require Digest::Perl::MD5;
            Digest::Perl::MD5->import(qw(md5 md5_hex md5_base64));
            push(@ISA, "Digest::Perl::MD5"); # make OO interface work
        };
    }

    -Joe
     
    Joe Smith, Apr 19, 2004
    #7
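One quick way to see which implementation you ended up with, sketched under the assumption that your installed Digest::MD5 uses the fallback logic quoted above: the pure-Perl module (Digest::Perl::MD5 on CPAN) only appears in %INC if the compiled part failed to load. The helper name `md5_backend` is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Report which MD5 implementation is in use. This assumes Digest::MD5
# only pulls in Digest::Perl::MD5 when its compiled XS part is missing.
sub md5_backend {
    return exists $INC{'Digest/Perl/MD5.pm'}
        ? 'pure-Perl fallback'
        : 'compiled XS';
}

print "Digest::MD5 $Digest::MD5::VERSION backend: ", md5_backend(), "\n";

# Sanity check against the well-known RFC 1321 test vector for "abc".
print md5_hex('abc'), "\n";    # 900150983cd24fb0d6963f7d28e17f72
```

If this reports the pure-Perl fallback, reinstalling Digest::MD5 with a working C compiler (or from your OS package system) is the likely fix for the slow digests.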
  8. Joe Smith

    Joe Smith Guest

    Michele Dondi wrote:

    > I didn't see that: well, on *nix (linux), I'd just do
    >
    > find <dir> -type f | xargs md5sum


    Not recommended for Samba shares or anywhere that file names
    and/or directory names have imbedded blanks.

    find <dir> -type f -print0 | xargs -0 md5sum

    -Joe
     
    Joe Smith, Apr 19, 2004
    #8
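The same NUL-separator trick works if you would rather feed the names to a Perl filter than to md5sum. This is only a sketch: the script name `md5filter.pl` and the helpers `read_nul_names` and `file_md5` are hypothetical, and the output format is modelled loosely on md5sum's.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Read NUL-terminated names from a filehandle, as written by find -print0.
# Newlines and blanks inside a name are preserved.
sub read_nul_names {
    my ($in) = @_;
    local $/ = "\0";    # record separator is NUL instead of newline
    my @names;
    while (defined(my $name = <$in>)) {
        chomp $name;    # strips the trailing NUL, because $/ is "\0" here
        push @names, $name if length $name;
    }
    return @names;
}

# Return the hex MD5 of a file, or undef (with a warning) if it can't be read.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<:raw', $path
        or warn "Can't open '$path': $!\n" and return;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Only read STDIN when something is piped in, not when run interactively.
unless (-t STDIN) {
    binmode STDIN;
    for my $name (read_nul_names(\*STDIN)) {
        my $hex = file_md5($name);
        print "$hex  $name\n" if defined $hex;
    }
}
```

Usage would be along the lines of:

    find <dir> -type f -print0 | perl md5filter.pl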
  9. nicogroen

    nicogroen Guest

    Thanks for your replies. My problem in Windows is solved by updating
    the ActivePerl version (from 5.6.1 to 5.8.3).
     
    nicogroen, Apr 21, 2004
    #9
